GPU Support

Available GPU-related resources are shown as tasker resources. For example:

>> nc cmd vovselect name,resources from taskers

>> localhost CORES/8 CORESTOTAL#8 CORESUSED#0 GPUS/1 GPUSTOTAL#1 
[(GPU-ab3207e3-6dff-fcbe-d93b-0f91cb2d45c3,Quadro K3100M,RAM:4028MB)] GPUSUSED#0 
MAXNUMACORES#8 MAXNUMACORESFREE#8 PERCENT/100 RAM/32031 RAMFREE#25319 RAMTOTAL#32031 
SLOTS/8 SLOTSSUSP#0 SLOTSTOTAL#8 SLOTSUSED#0 SWAP/979 SWAPFREE#935 SWAPTOTAL#979 
TMP#61842 arch:linux64 clock:3300MHz clockturbo:3700MHz hostname:sys-lap214 machine:x86_64 
osname:Ubuntu osversion:20.04 os:Linux osclass:unix power:1200000 taskergroup:g1 
taskername:localhost timeleft:unlimited uptime:8d00h user:root L1=0.37 L5=2.42 L15=3.50 
localhost CGROUP:RAM GPU:Quadro_K3100M/1 GPU:GPU-ab3207e3-6dff-fcbe-d93b-0f91cb2d45c3/1 
GPU:RAM/402

GPU resources differ from other consumable hardware resources because all GPU consumable resource variant names refer to the same underlying resource (there is only one GPUS resource). In other words, VOV links several resources: GPUS/n, GPU:Quadro_K3100M/n, GPU:GPU-ab3207e3-6dff-fcbe-d93b-0f91cb2d45c3/n, and GPU:RAM/m. A GPU device may be reserved by using any one of these resource names, and all of the related consumable GPU resources will be consumed.
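To see which of a tasker's resources belong to this linked GPU group, the resource string can be filtered for the GPU names. The following is a minimal sketch, not part of Accelerator: the helper name `gpu_resources` is hypothetical, and it assumes that consumable resources appear as `NAME/quantity` tokens separated by whitespace, as in the sample output above.

```python
import re

def gpu_resources(resource_text):
    """Return {name: quantity} for GPU consumable tokens of the form NAME/N.

    Assumption: consumables are whitespace-separated NAME/N tokens, and the
    GPU-related ones are named either GPUS or GPU:<something>.
    """
    found = {}
    for token in resource_text.split():
        match = re.fullmatch(r"(GPUS|GPU:[^/\s]+)/(\d+)", token)
        if match:
            found[match.group(1)] = int(match.group(2))
    return found

# Shortened sample modeled on the vovselect output shown earlier.
sample = (
    "CORES/8 CORESTOTAL#8 GPUS/1 GPUSTOTAL#1 GPUSUSED#0 RAM/32031 "
    "GPU:Quadro_K3100M/1 GPU:GPU-ab3207e3-6dff-fcbe-d93b-0f91cb2d45c3/1"
)

for name, quantity in gpu_resources(sample).items():
    print(name, quantity)
```

Counters such as GPUSTOTAL#1 and GPUSUSED#0 are skipped because they use `#` rather than `/`.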

The CUDA_VISIBLE_DEVICES environment variable will be set by Accelerator in the user job. This will direct CUDA programs invoked in the job to run on the assigned GPU device.
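A job can inspect the variable to confirm which device it was given. The sketch below is illustrative only: the helper name `assigned_gpus` is hypothetical, and it assumes the standard CUDA convention of a comma-separated list of device identifiers. Whether Accelerator writes indices or UUIDs into the variable is not stated here, so the code treats each entry as an opaque string.

```python
import os

def assigned_gpus(environ=None):
    """Return the device identifiers listed in CUDA_VISIBLE_DEVICES.

    Assumption: the value is a comma-separated list, which is the standard
    CUDA convention. An unset or empty variable yields an empty list.
    """
    env = os.environ if environ is None else environ
    raw = env.get("CUDA_VISIBLE_DEVICES", "")
    return [item.strip() for item in raw.split(",") if item.strip()]

# Inside a job, this prints the device(s) the job may use.
print("Assigned GPU devices:", assigned_gpus())
```

Running this as the job command, in place of `sleep 10`, is one way to check the assignment before launching a real CUDA workload.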

How to Request GPU Resources

GPU resources can be requested in four ways:


Request GPUs by count:	`nc run -v 1 -r GPUS/1 -- sleep 10` This consumes the specified number of GPUS resources and, in addition, the associated GPUs' GPU:<UUID>, GPU:<Model_Name>, and GPU:RAM resources.
Request GPUs by UUID:	`nc run -v 1 -r GPU:GPU-ab3207e3-6dff-fcbe-d93b-0f91cb2d45c3/1 -- sleep 10` Requests a GPU by device name (UUID). This consumes one GPUS resource and, in addition, the associated GPU's GPU:<UUID>, GPU:<Model_Name>, and GPU:RAM resources.
Request GPUs by model name:	`nc run -v 1 -r GPU:Quadro_K3100M/1 -- sleep 10` Requests GPU(s) by GPU model name. This consumes the specified number of GPU:<Model_Name> resources and, in addition, the same number of GPUS resources, the corresponding GPU:<UUID> resources, and the corresponding GPU:RAM resource quantity.
Request GPUs by RAM:	`nc run -v 1 -r GPU:RAM/1024 -- sleep 10` Requests a GPU and the specified amount of GPU:RAM from it. This consumes one GPUS resource and, in addition, the associated GPU's GPU:<UUID> and GPU:<Model_Name> resources. Note that even if less than the GPU's entire GPU:RAM is requested, the entire GPUS resource is consumed.
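
The four request forms differ only in the resource expression passed to `-r`. As a rough sketch, a submission script could build the command lines as follows. The helper name `gpu_request_command` is hypothetical, and the code uses only the `nc run` options that appear in the examples above. It prints the commands without running them.

```python
import shlex

def gpu_request_command(resource, job=("sleep", "10")):
    """Build an `nc run` argument list that requests one GPU resource.

    Only the options shown in the examples above (-v, -r, --) are used.
    """
    return ["nc", "run", "-v", "1", "-r", resource, "--", *job]

# Resource expressions copied from the examples above.
requests = {
    "count": "GPUS/1",
    "UUID": "GPU:GPU-ab3207e3-6dff-fcbe-d93b-0f91cb2d45c3/1",
    "model name": "GPU:Quadro_K3100M/1",
    "RAM": "GPU:RAM/1024",
}

for label, resource in requests.items():
    # shlex.join quotes arguments where needed so the line can be pasted into a shell.
    print(f"by {label}: {shlex.join(gpu_request_command(resource))}")
```

Passing the argument list to `subprocess.run` would submit the job. That step is left out so the sketch can run on a machine without Accelerator installed.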