Cgroups explained: Limiting Linux Processes

How do you limit a group of processes on Linux? The classic answer is: you don’t. The reason is simple. Linux’s historical resource limiting solution, ulimit, works on a per-process level. Every new process gets limits of its own, which is exactly why the fork bomb is so effective.

Unbeknownst to a lot of sysadmins, however, Linux has a new toy to play with: control groups. As you might have guessed, these are groups of processes to which limits can be applied as a whole. How do they work? Bear with me and I’ll show you…

Theory

Before we dive into playing Bastard Operator From Hell, let’s look at the theory of how this neat little tinker toy actually works.

When talking about cgroups, don’t just think of a bunch of groups that processes can be jailed in. These groups are hierarchical, beginning with the top group, which all processes are located in unless set otherwise. You can then go crazy defining groups, subgroups, sub-subgroups, etc. to suit your limitation needs.

When we talk about limits, there are actually two related terms you need to distinguish. One of them is the actual limit, like how many bytes per second a group can write to the disk.

The other term is prioritization, which means how many shares of the given resource the group gets from the whole bucket. Take two processes, for example, one with a weight of 700 and the other with 300. If both want to write data to a disk that can do 10 MB/s, the first process gets roughly 7 MB/s, whereas the second is limited to about 3 MB/s.
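The arithmetic behind that example can be sketched in a few lines of shell. The function name share_of is made up for illustration and is not part of any cgroup tool; it simply splits a total in proportion to the weights, which is only an approximation of what the kernel's schedulers do.

```shell
#!/bin/sh
# share_of WEIGHT TOTAL_WEIGHT CAPACITY
# Prints the portion of CAPACITY a group with WEIGHT receives when all
# competing groups together have TOTAL_WEIGHT (integer arithmetic).
# Hypothetical helper, for illustration only.
share_of() {
    echo $(( $1 * $3 / $2 ))
}

share_of 700 1000 10   # first process on a 10 MB/s disk
share_of 300 1000 10   # second process
```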

Cgroups are also useful for a bunch of other reasons. Using the accounting subsystems (cpuacct, for example) one can count the amount of resources a cgroup has consumed, and using the freezer functionality a group of processes can be put on ice and resurrected at will.

On a side note, cgroups are the very technology LXC uses to limit its containers’ resource usage. The same tweaks documented here can be used in an LXC environment as well, which comes in quite handy when dealing with runaway containers.

Setting up cgroups

Depending on your Linux distribution, you may or may not have cgroups already set up. The easiest way to check is to run the mount command and look for lines starting with cgroup.
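If you would rather script the check, here is a minimal sketch that reads /proc/mounts instead of the mount output. The function name cgroup_mounts is my own invention, and it assumes the usual /proc/mounts field order (device, mount point, filesystem type).

```shell
#!/bin/sh
# cgroup_mounts [FILE]
# Prints the mount point of every filesystem of type "cgroup" listed in
# FILE, which defaults to /proc/mounts. Hypothetical helper name.
cgroup_mounts() {
    awk '$3 == "cgroup" { print $2 }' "${1:-/proc/mounts}"
}

# Show the cgroup mount points of the running system, if any.
if [ -r /proc/mounts ]; then
    cgroup_mounts
fi
```

No output means no cgroup filesystem of that type is mounted yet.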

In the event that you do not have cgroup mount points, you need to mount them by adding the following line to your /etc/fstab file:

cgroup  /sys/fs/cgroup  cgroup  defaults  0   0

Then mount /sys/fs/cgroup and there you go, you have cgroup support. Head into the /sys/fs/cgroup directory and look at the files there. Some distributions (like Ubuntu) mount the separate subsystems (cpu, memory, blkio, etc.) into separate directories, some just use a flat structure. It doesn’t matter much, you just have to go into the subdirectories for the different subsystems if your system works like that.

Managing cgroups

To manage cgroups I recommend using libcgroup, as it contains a bunch of utilities that make managing cgroups easier. Nevertheless, I will demonstrate the manual way as well for you hackers out there. If you are configuring cgroups manually, you need to save values into files within the /sys/fs/cgroup directory. Don’t mistake them for regular files though: just like /proc, they are actually configuration interfaces for the Linux kernel’s internal workings. Therefore I recommend against using a regular editor, just use the echo and cat commands to set the desired values. (If you don’t know how to use them, you probably shouldn’t be playing around with cgroups anyway.)

Creating a cgroup

To create a cgroup, simply create a directory in /sys/fs/cgroup or, if you have a per-subsystem setup, in the appropriate directory for the subsystem. The kernel automatically fills the cgroup’s directory with the settings file nodes. If you want to do it the toolkit way, use cgcreate and provide the subsystems you wish to add as a parameter:

cgcreate -g cpu,cpuacct,...:/my_group
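For the manual way, a minimal sketch could look like the following. CGROUP_ROOT and create_cgroup are names I made up for this example; point CGROUP_ROOT at whatever directory your system actually uses.

```shell
#!/bin/sh
# CGROUP_ROOT is an assumption: set it to your cgroup mount point, or to
# the subsystem directory (e.g. /sys/fs/cgroup/cpu) on per-subsystem setups.
CGROUP_ROOT="${CGROUP_ROOT:-/sys/fs/cgroup}"

# create_cgroup NAME
# Creates the group directory; on a real cgroup mount the kernel then
# populates it with the settings files. Hypothetical helper name.
create_cgroup() {
    mkdir -p "$CGROUP_ROOT/$1"
}

# Usage (as root):
#   create_cgroup my_group
```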

Attaching processes

To attach a process, just echo the process’s ID into the tasks file. Note that you can only inject one task at a time:

echo 1234 >/sys/fs/cgroup/my_group/tasks

Alternatively you can of course use the cgclassify tool to classify multiple processes:

cgclassify -g cpu,cpuset,...:/my_group 1234 1235 ...
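If you stick to the manual way and have several processes to move, a small loop does the trick. This is only a sketch; attach_pids is a hypothetical helper, not a libcgroup command.

```shell
#!/bin/sh
# attach_pids TASKS_FILE PID...
# Writes each PID to the tasks file with a separate write, since only
# one task can be injected at a time. Hypothetical helper name.
attach_pids() {
    tasks_file=$1
    shift
    for pid in "$@"; do
        echo "$pid" > "$tasks_file" || return 1
    done
}

# Usage (as root), with the path from the example above:
#   attach_pids /sys/fs/cgroup/my_group/tasks 1234 1235
```

On a regular file each write would replace the previous one, but the tasks file is a kernel interface, so every write moves one more process into the group.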

Deleting a cgroup

Deleting a cgroup is a little more tricky, as you can’t just use rm -rf: you cannot really remove the “files” in that directory. Instead, use a little trick I found, which removes only the directories, deepest first, so nested subgroups go before their parents:

find my_group -depth -type d -print -exec rmdir {} \;

Again, there is a utility for that:

cgdelete cpu,cpuset,...:/my_group

Resource limiting

Now that you have all your cgroups set up and some processes in them, time to do some limiting. In the beginning all prioritization-type switches are balanced, that is resources are distributed equally among cgroups on the same level. In most cases this alone is enough to stop excessive usage of shared resources like CPU and disk IO. There are times however, when equal sharing is not enough: we want to increase or decrease resource usage of certain groups for certain subsystems.

Limiting CPU usage

CPU limits come in two flavors: binding cgroups to certain CPU cores and limiting the actual usage. To set the CPU affinity of a group we have to use the cpuset subsystem. As discussed before, we will use command line tools to adjust the settings. Let’s look at our process group’s CPU affinity before setting it:

my_cgroup # cat cpuset.cpus
0-7

As you can see the CPU affinity for this group is set to all 8 cores. (If you have more cores, the affinity will be set accordingly.) To adjust the affinity, I simply echo the desired core list into the file:

my_cgroup # echo "0-2,4" >cpuset.cpus
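In case the list syntax is unfamiliar, here is a small sketch that expands such a list into individual core numbers. expand_cpu_list is a made-up helper and only handles the simple "N" and "N-M" forms used above.

```shell
#!/bin/sh
# expand_cpu_list LIST
# Expands a cpuset-style list such as "0-2,4" into "0 1 2 4".
# Runs in a subshell so the IFS change does not leak to the caller.
# Hypothetical helper name.
expand_cpu_list() (
    IFS=,
    out=""
    for part in $1; do
        case $part in
            *-*) first=${part%-*}; last=${part#*-} ;;
            *)   first=$part; last=$part ;;
        esac
        i=$first
        while [ "$i" -le "$last" ]; do
            out="$out $i"
            i=$((i + 1))
        done
    done
    echo "${out# }"
)

expand_cpu_list "0-2,4"
```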

The limit is then applied immediately, as you can observe using the top command. Be careful though, setting the CPU affinity of the root cgroup will affect all processes.

If CPU affinity isn’t enough, you can also adjust CPU bandwidth. This means that you set the weight of a group with the process scheduler. The group can still use all free CPU, but when several groups compete, CPU time is divided in proportion to their weights. Setting this is done via the cpu.shares option, which defaults to 1024.

echo 2048 >cpu.shares

The final, strictest setting is the realtime CPU quota a group gets. This only works on realtime scheduling groups and you should not use it unless you know what you are doing!

There are two configuration options, cpu.rt_runtime_us and cpu.rt_period_us. The former limits how much CPU time the group’s realtime tasks can use within one period, whereas the latter sets the length of that period, both in microseconds. In other words, if you want a process to access the CPU 4 seconds out of 5, you need to set cpu.rt_runtime_us to 4000000 and cpu.rt_period_us to 5000000. There are a few non-trivial issues in the way, however:

  • Setting very small values in either option can result in an unstable system. For details see the Real-Time group scheduling documentation.
  • RT groups have reserved CPU time and are accounted separately from non-RT processes.
  • Did I mention it’s easy to destabilize your system?
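The 4-out-of-5-seconds example boils down to a unit conversion, sketched below. seconds_to_us is a hypothetical helper. The commented-out writes assume you are inside the group’s directory; writing the period first is my assumption, so that the runtime never exceeds the period.

```shell
#!/bin/sh
# seconds_to_us SECONDS
# Converts whole seconds to the microseconds the cpu.rt_*_us files expect.
# Hypothetical helper name.
seconds_to_us() {
    echo $(( $1 * 1000000 ))
}

runtime=$(seconds_to_us 4)
period=$(seconds_to_us 5)
echo "cpu.rt_runtime_us=$runtime cpu.rt_period_us=$period"

# Applying them (as root, inside the group's directory) would look like:
#   echo "$period"  >cpu.rt_period_us
#   echo "$runtime" >cpu.rt_runtime_us
```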

Limiting memory usage

Limiting memory usage is easy compared to the CPU. You will basically want to limit how much memory a certain group can use, and that’s what memory.limit_in_bytes and memory.memsw.limit_in_bytes are for. memory.limit_in_bytes limits the total memory usage of a group including file cache, whereas memory.memsw.limit_in_bytes limits the combined amount of memory and swap a group can use. Pay attention, however: memory.limit_in_bytes should be set first, otherwise you’ll receive an error. Also note that you don’t need to specify the amount in bytes, you can use the shorthand multipliers k or K for kilobytes, m or M for megabytes, and g or G for gigabytes.
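To see what a shorthand value works out to, here is a small sketch. to_bytes is a made-up helper that treats the multipliers as powers of 1024; the limits in the comments are example values only.

```shell
#!/bin/sh
# to_bytes SIZE
# Converts a size with an optional k/K, m/M or g/G suffix to bytes,
# treating the multipliers as powers of 1024. Hypothetical helper name.
to_bytes() {
    num=${1%[kKmMgG]}
    case $1 in
        *[kK]) echo $(( num * 1024 )) ;;
        *[mM]) echo $(( num * 1024 * 1024 )) ;;
        *[gG]) echo $(( num * 1024 * 1024 * 1024 )) ;;
        *)     echo "$num" ;;
    esac
}

to_bytes 512M

# Applying limits (as root, inside the group's directory), memory first:
#   echo 512M >memory.limit_in_bytes
#   echo 1G   >memory.memsw.limit_in_bytes
```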

Limiting (disk) IO

Last but not least among the major limiting features is IO. Disk IO was a pain in the neck for a long time, since no really reliable methods existed for limiting it. With cgroups, however, we have a couple of switches available. Again, we have blkio.weight, which behaves just like the CPU shares and deserves no further explanation.

When weights are not enough, fixed limits come into play. You can either limit by bytes per second (BPS) or by IOPS (IO Operations Per Second). The configuration options are aptly named:

  • blkio.throttle.read_bps_device for read limits in BPS.
  • blkio.throttle.read_iops_device for read limits in IOPS.
  • blkio.throttle.write_bps_device for write limits in BPS.
  • blkio.throttle.write_iops_device for write limits in IOPS.

To adjust them you need to figure out the major and minor number of the device you wish to put a limit on. That’s easily done: just use ls -la /dev and look at the line with your device. The two comma-separated numbers just before the date are the ones you are looking for. To place a limit, run the following with your major and minor numbers and byte limit substituted:

echo "252:2 10485760" >blkio.throttle.write_bps_device
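If you don’t feel like reading the numbers off by hand, something along these lines could build the string for you. This is a sketch only: throttle_line is a hypothetical helper, it assumes the usual ls -l layout for device nodes (major and minor in the fifth and sixth columns), and /dev/sdX is a placeholder for your own device.

```shell
#!/bin/sh
# throttle_line MIB_PER_SEC
# Reads one "ls -l" line for a block device on stdin and prints the
# "major:minor bytes" string the blkio.throttle.*_bps_device files expect.
# Hypothetical helper name.
throttle_line() {
    awk -v mib="$1" '{
        sub(/,/, "", $5)    # the major number is printed as "252,"
        printf "%s:%s %d\n", $5, $6, mib * 1024 * 1024
    }'
}

# /dev/sdX is a placeholder; substitute your own device:
#   ls -l /dev/sdX | throttle_line 10 >blkio.throttle.write_bps_device
```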

Permanent configuration

Now that all limits are configured, you might want to make sure they are applied upon restart. Like all contents of /sys, cgroup configurations are volatile and a reboot deletes them. To persist the configuration one can use the cgconfigparser utility, which is available in the libcgroup toolkit. The config parser takes a configuration file, builds the whole cgroup structure from scratch, and also allows for resource and process assignment. For more details see the cgconfig.conf man page.
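To give you an idea of what such a file looks like, the sketch below prints a minimal configuration for a single group. The group name and values are examples only, print_cgconfig is a made-up helper, and the man page remains the authority on the exact syntax your libcgroup version accepts.

```shell
#!/bin/sh
# print_cgconfig
# Prints a minimal cgconfig.conf that recreates one group with two
# settings. Group name and values are examples, not recommendations.
print_cgconfig() {
    cat <<'EOF'
group my_group {
    cpu {
        cpu.shares = "2048";
    }
    memory {
        memory.limit_in_bytes = "512M";
    }
}
EOF
}

print_cgconfig

# To use it (as root):
#   print_cgconfig >/etc/cgconfig.conf
#   cgconfigparser -l /etc/cgconfig.conf
```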

Further reading and closing thoughts

Of course there are a lot of other switches that I couldn’t go into, but plenty of material is readily available to help you with that. If you want to read more detailed documentation, go over to kernel.org and look at the kernel’s documentation for this feature. Be aware, however, that it’s no light literature for lazy evenings. For slightly easier reading and more information about cgroups I also recommend the Red Hat Resource Management Guide.

If you found an error or have anything to add, let me know what you think in the comments below.

11 thoughts on “Cgroups explained: Limiting Linux Processes”

  1. brian mullan

    Janos.. great post for summarizing cgroup. I am assuming Network utilization can be managed with cgroup as well. Do you have any examples that you could add for doing that as well?

    1. János Pásztor Post author

      Hi Brian, apologies for my late reply. I’m assuming you want to limit bandwidth usage. I don’t know if it can be done by using cgroups only, but you can always use tc (Traffic Control) and netfilter (iptables) to do that. You could for example mark all packets belonging to a certain user with a connmark and then have tc limit the connection that has been marked.

      1. Pieter Ennes

        Hi János, are you sure you can CONNMARK packets coming from a certain cgroup? I would be very interested in an example!

        1. János Pásztor Post author

          If you want to CONNMARK packets independently from UIDs, I don’t know if that can be done. I’m looking into it, but you’re probably better off using LXC containers because they have a separate IP address and network interface, which makes deploying bandwidth filters a lot easier.

          As for the cgroups solution, I’m researching that at the moment. If you want, you can subscribe to this blog so you’ll see when the article comes out.

          Update: it looks like the net_cls cgroup module can be used to assign a TC handle to a specific cgroup. I’m not sure about CONNMARK though.

  2. Peeyush

    Hey,

    Thanks for such a good tutorial. I am trying to create a cgroup and I used cgcreate. But there are no files inside. I mean as you said, kernel automatically fills the directory. I am not able to find cpuset.cpus file in the directory my_group.

    Btw, I am using Fedora 18.

    1. János Pásztor Post author

      I’ve tested the procedure on Fedora 18, the following works for me on a freshly installed system running as root:

      [root@localhost ~]# cgcreate -g cpuset:/my_group
      [root@localhost ~]# cd /sys/fs/cgroup/cpuset/my_group
      [root@localhost my_group]# ls
      cgroup.clone_children  cpuset.memory_pressure
      cgroup.event_control   cpuset.memory_spread_page
      cgroup.procs           cpuset.memory_spread_slab
      cpuset.cpu_exclusive   cpuset.mems
      cpuset.cpus            cpuset.sched_load_balance
      cpuset.mem_exclusive   cpuset.sched_relax_domain_level
      cpuset.memory_migrate  tasks

      However, be aware that the official resource management guide for Fedora recommends using the /etc/cgconfig.conf file. The documentation is actually quite amazing, so I recommend at least a good look at it.

  3. gkout

    Is it possible to completely block a process from running with a specific message?

    For example:
    /usr/local/bin/bad_file should be blocked from executing and print a message to stderr (or file) like “This program is not allowed to run on this server”
    while
    /usr/local/bin/good_file should be allowed execution and print a message to stdout (or file) that would look like: “Program was allowed execution”

    I am not sure if I could include the program name in the message.

    1. János Pásztor Post author

      Yes and no. There is a filesystem flag called noexec in Linux, which basically disables executing files directly. However, since most Linux systems have some kind of interpreter like Python or Perl installed, these can be used to execute code pretty easily: python /usr/local/bin/bad_file.py

      If you really want to lock down your system tightly, you will need to put a lot of effort into it, potentially use a hardened kernel like grsecurity, etc.
