CGROUPS for Jobs

A cgroup (control group) is a Linux mechanism for managing the resources used by a group of processes.

A cgroup can be used to limit, throttle, and account for resource usage per control group. Each resource interface is provided by a controller. For example, cgroups can be used to isolate core workloads from background resource needs, which prevents one workload from overpowering the others. On Linux taskers, a job can be requested to run in one or more cgroups. Support for cgroup v2 is now enabled.

cgroup v2

The fundamental differences between cgroup v1 and cgroup v2 are:

  • Unified hierarchy - resources apply to cgroups now
  • Granularity at TGID (PID), not TID level
  • Focus on simplicity/clarity over ultimate flexibility
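
To see which version a given host is running, the filesystem type mounted at the cgroup root can be inspected. The following is a minimal sketch, assuming GNU stat and the conventional mount point /sys/fs/cgroup (an assumption; a system may mount the hierarchy elsewhere). The function name cgroup_version is hypothetical.

```shell
# Report which cgroup version is mounted at a given root.
# Assumption: the cgroup root is /sys/fs/cgroup unless a path is passed in.
cgroup_version() {
    root="${1:-/sys/fs/cgroup}"
    # GNU stat: -f queries the filesystem, %T prints its type name.
    fstype=$(stat -fc %T "$root" 2>/dev/null || true)
    case "$fstype" in
        cgroup2fs) echo "v2" ;;      # unified hierarchy
        tmpfs)     echo "v1" ;;      # v1 or hybrid: per-controller mounts under a tmpfs
        *)         echo "unknown" ;; # not mounted, or not a cgroup root
    esac
}

cgroup_version
```

A result of "v1" only means the root is a tmpfs, which is the usual v1 layout; it does not rule out a hybrid setup.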

Other improvements are shown in the table below.

| Description | v1 | v2 |
|---|---|---|
| Tracking on non-immediate/multi-source charges | No tracking of non-immediate charges. Charged to root cgroup, essentially unlimited. | Page cache writebacks and network are charged to the responsible cgroup. Can be considered as part of cgroup limits. |
| Communication with backing subsystems | Most actions for non-share based resources reacted crudely to hitting thresholds. For example, in the memory cgroup, the only option was to OOM kill or freeze. | Many cgroup controllers negotiate with subsystems before real problems occur. Subsystems can take remediative action (e.g. direct reclaim with memory.high). Easier to deal with temporary spikes in a resource's usage. |
| Saner notifications | One clone() per event for cgroup release, expensive. eventfd() support for others. | inotify support everywhere. eventfd() support still exists. One process to monitor everything, if you like. |
| Utility controllers make sense now | Utility controllers have their own hierarchies. We usually want to use processes from another hierarchy. As such, we end up manually synchronising. | We have a single unified hierarchy, so no sync needed. |
| Consistency between controllers | Inconsistencies in controller APIs. Some controllers don't inherit values. | Better consistency between controllers. |
| Unified limits | Some limitations could not be fixed due to backwards compatibility. memory.{,kmem.,kmem.tcp.,memsw.,[...]}limit_in_bytes | Less iterative, more designed up front. We now have universal thresholds (e.g. memory.{high,max}). |
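
As an illustration of the universal thresholds mentioned above: in cgroup v2, memory.high is the point at which the kernel starts throttling and reclaiming, and memory.max is the hard limit. The sketch below writes both files for one cgroup directory. The function name set_memory_thresholds and the path in the usage comment are hypothetical, and writing to a real cgroup normally requires root.

```shell
# Set the cgroup v2 memory thresholds for one cgroup directory.
# $1 = cgroup directory, $2 = memory.high in bytes, $3 = memory.max in bytes
set_memory_thresholds() {
    cg_dir="$1"
    # Above memory.high the kernel throttles the group and reclaims memory.
    echo "$2" > "$cg_dir/memory.high" || return 1
    # memory.max is the hard limit; exceeding it can trigger the OOM killer.
    echo "$3" > "$cg_dir/memory.max" || return 1
}

# Hypothetical usage (1.5 GiB soft threshold, 2 GiB hard limit):
# set_memory_thresholds /sys/fs/cgroup/my_cgroup 1610612736 2147483648
```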

Syntax

The syntax is similar to requesting any other resource, with the resource name consisting of the prefix CGROUP: followed by the path to the cgroup on the filesystem relative to the root of the cgroup hierarchy.

In the following example, the job "sleep 120" is assigned to the cgroups /cpuset/my_cgroup1 and /memory/my_cgroup2:
nc run -r CGROUP:/cpuset/my_cgroup1 -r CGROUP:/memory/my_cgroup2 -- sleep 120 
Note: The cgroups must exist and be constrained by the cgroup setup on the taskers. Otherwise, the job will not launch on the tasker, because the requested resource must exist there.
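
One way to confirm on a tasker host that the requested cgroups exist is to test for their directories under the cgroup root. This is a minimal sketch: the function name cgroups_exist is hypothetical, and the root /sys/fs/cgroup in the usage comment is an assumption that depends on how cgroups are mounted on the tasker.

```shell
# Return success only if every listed cgroup path exists under the cgroup root.
# $1 = cgroup root directory; remaining args = paths as given after "CGROUP:".
cgroups_exist() {
    cg_root="$1"
    shift
    for cg_path in "$@"; do
        if [ ! -d "$cg_root$cg_path" ]; then
            echo "missing cgroup: $cg_root$cg_path" >&2
            return 1
        fi
    done
    return 0
}

# Hypothetical usage on a tasker host (assumed cgroup root):
# cgroups_exist /sys/fs/cgroup /cpuset/my_cgroup1 /memory/my_cgroup2
```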

If multiple conflicting cgroups are assigned, such as two cgroups under the /memory hierarchy, the cgroup that is specified last is the cgroup that is assigned to the process.

The special resource CGROUP:RAM can be used to limit the memory usage of a job within a cgroup. The job is assigned to a default cgroup and, because only one job is placed in each default cgroup, RAM usage can be limited on a per-job basis. The path to this default cgroup is:

<path to cgroup root directory>/memory/<queue name>_<tasker name>_<job slot number>

For example, the following job is assigned to a default cgroup that is limited to 2000 megabytes of RAM:
nc run -r CGROUP:RAM -r RAM/2000 -- sleep 120 

There is a special case with RAM: specifying CGROUP:RAM together with RAM/200 results in the job being placed in /cgroup/memory/vncCG_buffalo_3, with /cgroup/memory/vncCG_buffalo_3/memory.limit_in_bytes set to 209715200 (200 × 1024 × 1024 bytes).
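
The numbers in this example can be reproduced with a short sketch. The helper names ram_mb_to_bytes and default_cgroup_path are hypothetical. Reading the example path as queue vncCG, tasker buffalo, and job slot 3 is an inference from the path template above, not something the scheduler output confirms.

```shell
# Convert a RAM/<n> request in megabytes to the byte value written to
# memory.limit_in_bytes (1 MB taken as 1024 * 1024 bytes, which matches 209715200).
ram_mb_to_bytes() {
    echo $(( $1 * 1024 * 1024 ))
}

# Build the default cgroup path: <root>/memory/<queue>_<tasker>_<slot>
default_cgroup_path() {
    echo "$1/memory/${2}_${3}_${4}"
}

ram_mb_to_bytes 200                          # prints 209715200
default_cgroup_path /cgroup vncCG buffalo 3  # prints /cgroup/memory/vncCG_buffalo_3
```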

Note: CGROUP:RAM cannot be used with a non-default cgroup. If both CGROUP:RAM and a non-default cgroup are specified, the job will be placed in the specified cgroup without changing that cgroup's RAM usage limit. We strongly recommend specifying a RAM resource when using CGROUP:RAM, as the default value is low (20 megabytes).

To see the cgroup resources available on the taskers, use nc hosts -r and look for CGROUP: entries. If they are not present, ensure that they are set up on the taskers with LSCGROUPS and check the tasker logs for errors.
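
Picking the CGROUP: entries out of a long resource listing can be scripted. The sketch below makes no assumption about the layout of the nc hosts -r output beyond entries being separated by spaces, tabs, or newlines; the function name list_cgroup_resources is hypothetical.

```shell
# Print the unique CGROUP: tokens found in text read from stdin.
# Only whitespace-delimited tokens that start with "CGROUP:" are matched.
list_cgroup_resources() {
    tr -s ' \t' '\n\n' | grep '^CGROUP:' | LC_ALL=C sort -u
}

# Usage on a host with the scheduler CLI available:
# nc hosts -r | list_cgroup_resources
```

If the listing separates resources with some other character, the tr step would need to be adjusted accordingly.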

Enable cgroups

Many CentOS, SLES, and Ubuntu systems do not install with cgroups available by default. Use the steps below to configure Linux for cgroups.

On CentOS 6, enable cgroups by installing the libcgroup RPM, then start the cgconfig service and enable it at boot:
% sudo service cgconfig start
% sudo chkconfig cgconfig on