NUMA Control and CPU Affinity

The logical CPUs and system memory of a Linux host are often grouped into two or more Non-Uniform Memory Access (NUMA) nodes.

NUMA nodes are localities for CPUs and memory, and it is often best to schedule a Linux process on a CPU that is attached to the NUMA node where its system memory is located. To determine if a Linux host has more than one NUMA node, view the output of the lscpu command:
% lscpu | grep NUMA 
NUMA node(s):          2 
NUMA node0 CPU(s):     0-3 
NUMA node1 CPU(s):     4-7 
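
If the numactl package is installed, the numactl --hardware command provides a similar per-node view that also includes the memory attached to each node. On a host like the one above, the output resembles the following (memory sizes are illustrative):
% numactl --hardware 
available: 2 nodes (0-1) 
node 0 cpus: 0 1 2 3 
node 0 size: 32089 MB 
node 1 cpus: 4 5 6 7 
node 1 size: 32089 MB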

To restrict the CPUs on which the processes of a job are permitted to run, use Linux methods to set the "CPUs allowed list" for the job, also called a "CPU affinity list". This type of control is called "CPU placement".
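
For example, the standard Linux taskset command displays or changes the affinity list of a running process; the PID and CPU list below are placeholders:
% taskset -cp 12345 
pid 12345's current affinity list: 0-7 
% taskset -cp 0-3 12345 
pid 12345's current affinity list: 0-7 
pid 12345's new affinity list: 0-3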

The performance of accessing RAM from a CPU depends on whether the RAM is physically attached to that CPU or to another CPU. Application performance can therefore be enhanced by constraining the application to a single physical CPU. The Linux numactl command supports this control; for more information, see man numactl.
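
For example, a hypothetical application my_app can be constrained to both the CPUs and the memory of NUMA node 0 as follows:
% numactl --cpunodebind=0 --membind=0 my_app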

Accelerator can also automatically set the CPU affinity of an application in order to maximize its execution performance. The vovtasker automatically computes the socket, core, and memory layout of the machine on which it is executing. For each job that requests NUMA control, both the CORES and RAM resources for the job are used to determine where in the layout the job should be placed, depending on the placement type requested.
Note: NUMA is supported only on Linux machines.

CPU Placement Control in Accelerator

CPU placement control is requested by the nc run options -jpp pack or -jpp spread. This is further controlled by the optional CGROUP:CORES quasi-resource as follows:

  • -jpp pack: assign the job to the NUMA node with the fewest available CPUs. Packing jobs that need few CPUs helps avoid fragmentation, so that jobs needing more CPUs can still find NUMA nodes with enough free capacity.
  • -jpp spread: assign the job to the NUMA node(s) with the most available CPUs within a host.
  • -r CGROUP:CORES: supported only when -jpp pack or -jpp spread is specified, and only on systems with cgroups v2 enabled. Without CGROUP:CORES, the CPU affinity list assigned with -jpp pack or -jpp spread is rounded up to an integral multiple of the NUMA node size. If you specify -r CGROUP:CORES, the CPU affinity list assigned to the job exactly matches the number of cores/cpus/slots requested.
To determine if a Linux host has cgroups v2 active, invoke the following shell command to see if a “cgroup2” filesystem is mounted.
% mount | grep ^cgroup2 
cgroup2 on /sys/fs/cgroup type cgroup2 (rw,nosuid,nodev,noexec,relatime)
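
Alternatively, query the filesystem type of /sys/fs/cgroup directly; it is reported as cgroup2fs when cgroups v2 is active:
% stat -fc %T /sys/fs/cgroup 
cgroup2fs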

CPU Placement Examples

Here are some example jobs with CPU placement specified. The nc list command displays the job property NUMA_AFFINITY, which indicates which CPUs are in the job's allowed list.
% nc run -r CORES/2 -jpp pack -- my_job 
% nc run -r CORES/4 -jpp spread -- my_job 
% nc run -r CORES/1 CGROUP:CORES -jpp pack -- my_job 
% nc list -O "@ID@ @PROP.NUMA_AFFINITY@ @STATUSNC@" 
283569646 NUMA pack:    0   1   2   3   *   *   *   *   Running 
283569654 NUMA spread:  *   *   *   *   4   5   6   7   Running 
283569662 NUMA pack:    *   *   2   *   *   *   *   *   Running

Check Tasker NUMA Status

Each tasker sets and maintains a property on itself named "NUMA_LAYOUT". The property is updated each time a job with a NUMA request begins or ends. To check the current NUMA status for a tasker named "foo":
% nc cmd vovselect prop.NUMA_LAYOUT from taskers where name==foo
Original:           Used/Total    _________+__
Socket:  0   RAM=    512/32089    ****oooooooo
Socket:  1   RAM=   1024/32089    ********oooo