Kernel Control Groups (commonly abbreviated as “cgroups”) are a kernel feature that allows aggregating or partitioning tasks (processes) and all their children into hierarchically organized groups. These hierarchical groups can be configured to show a specialized behavior that helps with tuning the system to make best use of available hardware and network resources.
In the following sections, we often reference kernel documentation such as /usr/src/linux/Documentation/cgroups/. These files are part of the kernel-source package.
This chapter is an overview. To use cgroups properly and to avoid performance implications, you must study the provided references.
The following terms are used in this chapter:
“cgroup” is another name for Control Groups.
In a cgroup there is a set of tasks (processes) associated with a set of subsystems that act as parameters constituting an environment for the tasks.
Subsystems provide the parameters that can be assigned and define CPU sets, freezer, or, more generally, “resource controllers” for memory, disk I/O, network traffic, etc.
cgroups are organized in a tree-structured hierarchy. There can be more than one hierarchy in the system. You use a different or alternate hierarchy to cope with specific situations.
Every task running in the system is in exactly one of the cgroups in the hierarchy.
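To see which cgroup a task belongs to in each hierarchy, you can read its entry under /proc. The following is a minimal sketch, assuming the usual three-field format of /proc/PID/cgroup (hierarchy ID, subsystem list, cgroup path, separated by colons). The function name show_cgroups is hypothetical and only used for illustration.

```shell
#!/bin/sh
# Print the cgroup membership of a task, one hierarchy per line.
# Each line of /proc/<pid>/cgroup has the form
#   hierarchy-ID:subsystem-list:cgroup-path
# show_cgroups is a hypothetical helper; the optional argument selects
# the file to read and defaults to the entry of the current process.
show_cgroups() {
    file=${1:-/proc/self/cgroup}
    while IFS=: read -r id subsystems path; do
        # The subsystem list can be empty for hierarchies without controllers.
        printf 'hierarchy %s  subsystems=%s  cgroup=%s\n' \
            "$id" "${subsystems:-none}" "$path"
    done < "$file"
}
```

Call it as show_cgroups for the current shell or as show_cgroups /proc/PID/cgroup for another task. Each output line corresponds to one mounted hierarchy.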
See the following resource planning scenario for a better understanding (source: /usr/src/linux/Documentation/cgroups/cgroups.txt):
Web browsers such as Firefox will be part of the Web network class, while the NFS daemons such as (k)nfsd will be part of the NFS network class. On the other side, Firefox will share appropriate CPU and memory classes depending on whether a professor or student started it.
The following subsystems are available: cpuset, cpu, cpuacct, memory, devices, freezer, net_cls, net_prio, blkio, perf_event, and hugetlb.
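Which of these subsystems the running kernel actually provides can be checked in /proc/cgroups. The sketch below assumes the four-column layout of that file (subsys_name, hierarchy, num_cgroups, enabled) and prints the names of all enabled subsystems. The function name enabled_subsystems is hypothetical.

```shell
#!/bin/sh
# List the subsystems that the running kernel has enabled.
# /proc/cgroups columns: subsys_name hierarchy num_cgroups enabled
# enabled_subsystems is a hypothetical helper; the optional argument
# selects the file to read.
enabled_subsystems() {
    file=${1:-/proc/cgroups}
    # Skip the header line (starts with '#') and keep rows with enabled == 1.
    awk '!/^#/ && $4 == 1 { print $1 }' "$file"
}
```

A subsystem that is missing from the output cannot be mounted until it is enabled, for example with a kernel boot parameter.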
Either mount each subsystem separately, for example:
mkdir /cpuset /cpu
mount -t cgroup -o cpuset none /cpuset
mount -t cgroup -o cpu,cpuacct none /cpu
or mount all subsystems in one go. You can use an arbitrary device name (for example none), which will appear in /proc/mounts, for example:
mount -t cgroup none /sys/fs/cgroup
Some additional information on available subsystems:
net_cls (Identification)
The Network classifier cgroup helps with providing identification for controlling processes such as Traffic Controller (tc) or Netfilter (iptables). These controller tools can act on tagged network packets. For more information, see /usr/src/linux/Documentation/cgroups/net_cls.txt.
net_prio (Identification)
The Network priority cgroup helps with setting the priority of network packets. For more information, see /usr/src/linux/Documentation/cgroups/net_prio.txt.
devices (Isolation)
A system administrator can provide a list of devices that can be accessed by processes under cgroups. The subsystem limits access to a device, or to a file system on a device, to tasks that belong to the specified cgroup. For more information, see /usr/src/linux/Documentation/cgroups/devices.txt.
freezer (Control)
The freezer subsystem is useful for high-performance computing clusters (HPC clusters). Use it to freeze (stop) all tasks in a group or to stop tasks when they reach a defined checkpoint. For more information, see /usr/src/linux/Documentation/cgroups/freezer-subsystem.txt.
Here are basic commands to use the freezer subsystem:
# Create the mount point and mount the freezer hierarchy:
mkdir /freezer
mount -t cgroup -o freezer freezer /freezer
# Create a child cgroup:
mkdir /freezer/0
# Put a task into this cgroup:
echo $task_pid > /freezer/0/tasks
# Freeze it:
echo FROZEN > /freezer/0/freezer.state
# Unfreeze (thaw) it:
echo THAWED > /freezer/0/freezer.state
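Freezing is not necessarily instantaneous: according to the kernel documentation, freezer.state can report FREEZING while tasks are still being stopped. The following sketch requests a freeze and polls until the group reports FROZEN. The function name freeze_and_wait, the default of 10 attempts, and the one-second interval are assumptions made for illustration.

```shell
#!/bin/sh
# freeze_and_wait DIR [TRIES]
# Request a freeze for the cgroup directory DIR and poll freezer.state
# until it reads FROZEN. Returns 0 on success and 1 if the group is
# still not frozen after TRIES checks (hypothetical default: 10).
freeze_and_wait() {
    dir=$1
    tries=${2:-10}
    echo FROZEN > "$dir/freezer.state" || return 1
    while [ "$tries" -gt 0 ]; do
        # The state may still read FREEZING at this point.
        [ "$(cat "$dir/freezer.state")" = FROZEN ] && return 0
        tries=$((tries - 1))
        sleep 1
    done
    return 1
}
```

With the hierarchy from the commands above, freeze_and_wait /freezer/0 returns once all tasks in the group are stopped.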
perf_event (Control)
perf_event collects performance data.
cpuset (Isolation)
Use cpuset to tie processes to subsets of the system's CPUs and memory (“memory nodes”). For an example, see Section 9.4.2, “Example: Cpusets”.
cpuacct (Accounting)
The CPU accounting controller groups tasks using cgroups and accounts the CPU usage of these groups. For more information, see /usr/src/linux/Documentation/cgroups/cpuacct.txt.
memory (Resource Control)
Tracks or limits the memory usage of user space processes.
Control swap usage by setting swapaccount=1 as a kernel boot parameter.
Limits apply to LRU (Least Recently Used) pages, both anonymous memory and file cache.
Kernel memory is not limited. This may be handled in another subsystem if needed.
The memory cgroup now offers a mechanism that makes opt-in workload isolation easier. A memory cgroup can define a so-called low limit (memory.low_limit_in_bytes), which works as a protection from memory pressure. For workloads that need to be isolated from outside memory management activity, set the value to the expected Resident Set Size (RSS) plus some headroom. If a memory pressure condition triggers on the system and the particular group is still under its low limit, its memory is protected from reclaim. As a result, workloads outside of the cgroup do not need the aforementioned capping.
For more information, see /usr/src/linux/Documentation/cgroups/memory.txt.
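A value of "expected RSS plus some headroom" can be estimated from a running process. The following sketch reads the VmRSS field (reported in kB) from a /proc/PID/status file and adds a percentage on top. It is a rough estimate for a single-process workload only. The function name low_limit_bytes and the default headroom of 20 percent are assumptions, not recommendations from the kernel documentation.

```shell
#!/bin/sh
# low_limit_bytes STATUS_FILE [HEADROOM_PERCENT]
# Read VmRSS (in kB) from a /proc/<pid>/status file and print
# RSS plus headroom in bytes. The default headroom of 20 percent
# is an arbitrary assumption; choose a value that fits the workload.
low_limit_bytes() {
    status=$1
    headroom=${2:-20}
    rss_kb=$(awk '/^VmRSS:/ { print $2 }' "$status")
    # Kernel threads have no VmRSS line; report failure in that case.
    [ -n "$rss_kb" ] || return 1
    echo $(( rss_kb * 1024 * (100 + headroom) / 100 ))
}
```

The result can then be written to the memory.low_limit_in_bytes file of the cgroup that contains the workload.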
hugetlb (Resource Control)
The HugeTLB controller manages the memory allocated to huge pages. For more information, see /usr/src/linux/Documentation/cgroups/hugetlb.txt.
cpu (Control)
Shares CPU bandwidth between groups with the group scheduling function of CFS (the scheduler). The mechanism is complicated.
The Block IO controller is available as a disk I/O controller. With the blkio controller you can currently set policies for proportional bandwidth and for throttling.
These are the basic commands to configure proportional weight division of bandwidth by setting weight values in blkio.weight:
# Setup in /sys/fs/cgroup
mkdir /sys/fs/cgroup/blkio
mount -t cgroup -o blkio none /sys/fs/cgroup/blkio
# Start two cgroups
mkdir -p /sys/fs/cgroup/blkio/group1 /sys/fs/cgroup/blkio/group2
# Set weights
echo 1000 > /sys/fs/cgroup/blkio/group1/blkio.weight
echo 500 > /sys/fs/cgroup/blkio/group2/blkio.weight
# Write the PIDs of the processes to be controlled to the
# appropriate groups
COMMAND1 &
echo $! > /sys/fs/cgroup/blkio/group1/tasks
COMMAND2 &
echo $! > /sys/fs/cgroup/blkio/group2/tasks
These are the basic commands to configure a throttling (upper limit) policy by setting values in blkio.throttle.read_bps_device for reads and blkio.throttle.write_bps_device for writes:
# Setup in /sys/fs/cgroup
mkdir /sys/fs/cgroup/blkio
mount -t cgroup -o blkio none /sys/fs/cgroup/blkio
# Bandwidth rate of a device for the root group; format:
# <major>:<minor> <bytes_per_second>
echo "8:16 1048576" > /sys/fs/cgroup/blkio/blkio.throttle.read_bps_device
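The major and minor numbers in a throttle rule do not need to be looked up by hand. The sketch below builds the rule from the dev attribute that sysfs exposes for each block device (it contains the device number as major:minor) and converts a rate given in MiB per second to bytes per second. The function name throttle_rule and the device sdb in the usage note are assumptions for illustration.

```shell
#!/bin/sh
# throttle_rule DEV_FILE MIB_PER_SEC
# DEV_FILE is a sysfs "dev" attribute such as /sys/block/sdb/dev,
# which holds the device number as <major>:<minor>.
# Prints a rule in the format "<major>:<minor> <bytes_per_second>".
throttle_rule() {
    devnum=$(cat "$1") || return 1
    echo "$devnum $(( $2 * 1024 * 1024 ))"
}
```

For example, throttle_rule /sys/block/sdb/dev 1 prints a rule that limits the device to 1 MiB per second. Redirect the output to blkio.throttle.read_bps_device or blkio.throttle.write_bps_device of the group to be limited.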
For more information about caveats, usage scenarios, and additional parameters, see /usr/src/linux/Documentation/cgroups/blkio-controller.txt.
To conveniently use cgroups, install the following additional packages:
libcgroup-tools: basic user space tools to simplify resource management
libcgroup1: control groups management library
cpuset: contains the cset command to manipulate cpusets
libcpuset1: C API to cpusets
kernel-source: only needed for documentation purposes
With the command line proceed as follows:
To determine the number of CPUs and memory nodes, see /proc/cpuinfo and /proc/zoneinfo.
Create the cpuset hierarchy as a virtual file system (source: /usr/src/linux/Documentation/cgroups/cpusets.txt):
mount -t cgroup -ocpuset cpuset /sys/fs/cgroup/cpuset
cd /sys/fs/cgroup/cpuset
mkdir Charlie
cd Charlie
# List of CPUs in this cpuset:
echo 2-3 > cpuset.cpus
# List of memory nodes in this cpuset:
echo 1 > cpuset.mems
# Move the current shell into the cpuset:
echo $$ > tasks
# The current shell is now running in cpuset Charlie
# The next line should display '/Charlie'
cat /proc/self/cpuset
Remove the cpuset using shell commands:
rmdir /sys/fs/cgroup/cpuset/Charlie
This fails as long as this cpuset is in use. First, you must remove any cpusets nested inside it and any tasks (processes) that belong to it. Check for remaining tasks with:
cat /sys/fs/cgroup/cpuset/Charlie/tasks
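If tasks are still listed, one way to empty the cpuset is to move them to the parent cpuset before calling rmdir. The following sketch does this one PID per write. The function name drain_cpuset is hypothetical, and the approach assumes that the tasks are allowed to run in the parent cpuset.

```shell
#!/bin/sh
# drain_cpuset DIR
# Move every task listed in DIR/tasks to the parent cpuset.
# The tasks file accepts one PID per write, so loop over the list.
drain_cpuset() {
    dir=$1
    parent=$(dirname "$dir")
    for pid in $(cat "$dir/tasks"); do
        # A write can fail, for example if the task has already exited.
        echo "$pid" > "$parent/tasks" || echo "could not move $pid" >&2
    done
}
```

After drain_cpuset /sys/fs/cgroup/cpuset/Charlie, the tasks file should be empty and the rmdir command should succeed, provided that no nested cpusets remain.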
For background information and additional configuration flags, see /usr/src/linux/Documentation/cgroups/cpusets.txt.
With the cset tool, proceed as follows:
# Determine the number of CPUs and memory nodes
cset set --list
# Creating the cpuset hierarchy
cset set --cpu=2-3 --mem=1 --set=Charlie
# Starting processes in a cpuset
cset proc --set Charlie --exec -- stress -c 1 &
# Moving existing processes to a cpuset
cset proc --move --pid PID --toset=Charlie
# List tasks in a cpuset
cset proc --list --set Charlie
# Removing a cpuset
cset set --destroy Charlie
Using shell commands, proceed as follows:
Create the cgroups hierarchy:
mount -t cgroup cgroup /sys/fs/cgroup
cd /sys/fs/cgroup/cpuset/cgroup
mkdir priority
cd priority
cat cpu.shares
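Understanding cpu.shares:
1024 is the default (for more information, see /Documentation/scheduler/sched-design-CFS.txt). The percentages below are the CPU share a busy group receives when it competes with one other busy group that keeps the default of 1024 shares:
1024 = 50% usage
1524 = 60% usage
2048 = 67% usage
512 = 33% usage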
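The relation between a cpu.shares value and the resulting CPU percentage can be sketched as a small calculation. The sketch assumes exactly two busy groups that compete for the same CPUs: the share of a group is then its own value divided by the sum of both values. The function name share_percent is hypothetical.

```shell
#!/bin/sh
# share_percent OWN [OTHER]
# Print the rounded CPU percentage a group with OWN shares receives
# when competing with one other busy group that has OTHER shares
# (OTHER defaults to 1024, the default value of cpu.shares).
share_percent() {
    own=$1
    other=${2:-1024}
    awk -v a="$own" -v b="$other" \
        'BEGIN { printf "%.0f\n", 100 * a / (a + b) }'
}

# Example: a group with 2048 shares against a default group.
share_percent 2048    # prints 67
```

With more than two busy groups, the denominator is the sum of the shares of all groups that compete at the same level of the hierarchy, so the percentages shrink accordingly.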
Changing cpu.shares
echo 1024 > cpu.shares
This is a simple example. Use the following in /etc/cgconfig.conf:
group foo {
    perm {
        task {
            uid = root;
            gid = users;
            fperm = 660;
        }
        admin {
            uid = root;
            gid = root;
            fperm = 600;
            dperm = 750;
        }
    }
}

mount {
    cpu = /mnt/cgroups/cpu;
}
Then start the cgconfig service and run stat on /mnt/cgroups/cpu/foo/tasks, which should show the permissions mask 660 with root as the owner and users as the group. Running stat on /mnt/cgroups/cpu/foo/ should show 750, and all files (except tasks) should have the mask 600. Note that fperm is applied on top of existing file permissions as a mask.
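The checks described above can be scripted. The following sketch compares the mode, owner, and group reported by stat (GNU coreutils syntax) with the expected values. The function name check_perms is hypothetical.

```shell
#!/bin/sh
# check_perms PATH MODE OWNER GROUP
# Return 0 if PATH has the expected octal mode, owner, and group.
# Otherwise print the difference to stderr and return 1.
check_perms() {
    actual=$(stat -c '%a %U %G' "$1") || return 1
    [ "$actual" = "$2 $3 $4" ] && return 0
    echo "$1: expected '$2 $3 $4', found '$actual'" >&2
    return 1
}
```

For the configuration above, check_perms /mnt/cgroups/cpu/foo/tasks 660 root users and check_perms /mnt/cgroups/cpu/foo 750 root root should both succeed after the cgconfig service has started.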
For more information, see the cgconfig.conf man page.
Kernel documentation (package kernel-source): files in /usr/src/linux/Documentation/cgroups.
Brown, Neil: Control Groups Series (2014, 7 parts). http://lwn.net/Articles/604609/
Corbet, Jonathan: Controlling memory use in containers (2007). http://lwn.net/Articles/243795/
Corbet, Jonathan: Process containers (2007). http://lwn.net/Articles/236038/