2021.2.1-p1 Release Notes

New Features

Internal Number Products Case Number Description
VOV-14477 All CS0257852 Taskers running as a non-root user are no longer sent jobs unless the job's user matches the tasker's user ID. This addresses a situation where a job running on a non-root tasker could gain access to the tasker user's data on the filesystem. This policy can be disabled by setting allowForeignJobsOnUserTaskers to 1.
VOV-14419 All None Multiphase support is provided by two additional command arguments to nc run: -multiphase [1|0] and -mpres "resource string".

-multiphase 1 enables multiphase jobs.

-mpres sets the resources used for each phase. The '%' character is the delimiter between the resources of each phase, for example, -mpres "linux64 foo%linux64 bar%linux64 baz".
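To make the delimiter concrete, here is a minimal sketch that splits an -mpres value into its per-phase resource strings. The helper name phase_resources is hypothetical and is not part of nc; it only illustrates how the '%' separator divides the string.

```shell
#!/bin/sh
# Illustrative only: phase_resources is a hypothetical helper, not an nc command.
# It prints one line per phase for a '%'-delimited -mpres value.
# The function body runs in a subshell so the IFS change does not leak out.
phase_resources() (
    phase=1
    IFS='%'      # split on the phase delimiter only
    set -f       # disable glob expansion while splitting
    for res in $1; do
        printf 'phase %d: %s\n' "$phase" "$res"
        phase=$((phase + 1))
    done
)

phase_resources "linux64 foo%linux64 bar%linux64 baz"
```

Each output line pairs a phase number with the resources that phase would request.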

By specifying the resources of each phase and designating that certain resources are only allocated to certain taskers, you can run different phases of a job on different taskers.

For example, I have two taskers named tasker1 and tasker2. I want to run phases 1 and 3 on tasker1, and phase 2 on tasker2. My resources may look like:
vtk_resourcemap_set License:blue UNLIMITED License:blue_tasker1
vtk_resourcemap_set License:red UNLIMITED License:red_tasker2
vtk_resourcemap_set License:blue_tasker1 1 tasker1
vtk_resourcemap_set License:red_tasker2 1 tasker2
I could then run a multiphase job as:
nc run -multiphase 1 -mpres "linux64 License:blue%linux64 License:red%linux64 License:blue" -- -e BASE -D /home/jjmcwill/testDir/testMultistage.sh
A multiphase job will have two new Job Properties set:
  • MPRESOURCES: Contains the same resources passed in -mpres, and is used to reset the job resources for each phase.
  • MPCURRENTPHASE: Contains an integer indicating the current job phase. It starts at 1 and has a maximum value of 9.
The running job script will see an environment variable named VOV_JOB_PHASE which is set to the current phase. The script writer will need to use that to decide what work to do for that phase.

If the script exits with an exit code of 216, nc will increment the job phase, change the job resources, and reschedule the job to run again. If the script exits with an exit code of 0, the job is considered "Done", and MPCURRENTPHASE is reset to 1.
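A job script that follows these rules might be structured as in the sketch below. The three phase bodies are placeholders, and the function name run_phase is an assumption made for illustration; only VOV_JOB_PHASE and the exit codes 0 and 216 come from the description above.

```shell
#!/bin/sh
# Sketch of a multiphase job script; the work done in each phase is a placeholder.
# VOV_JOB_PHASE is provided in the job's environment (see above).

NEXT_PHASE=216   # exit code that requests the next phase

# run_phase PHASE: do the work for one phase and return the exit code to use.
run_phase() {
    case "$1" in
        1) echo "phase 1: prepare inputs";  return "$NEXT_PHASE" ;;
        2) echo "phase 2: run main step";   return "$NEXT_PHASE" ;;
        3) echo "phase 3: collect results"; return 0 ;;  # last phase: job is Done
        *) echo "unexpected phase: $1" >&2; return 1 ;;  # any other code fails the job
    esac
}

# When used as the actual job script, finish with:
#   run_phase "$VOV_JOB_PHASE"; exit $?
```

Keeping the per-phase logic in one function makes it easy to exercise each phase by hand before submitting the job.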

Failed jobs:

If a job fails during a phase with a code other than 0 or 216, it is considered FAILED and MPCURRENTPHASE will not increment. If the job is invalidated and re-run (for example, nc rerun -f JOBID), the job will re-run starting at MPCURRENTPHASE, and further phases will run if the job exits with code 216, as described above.
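The phase bookkeeping described so far can be summarized as a small model. This is a sketch of the documented behavior, not nc's implementation: the function next_state and the label RESCHEDULED are invented for illustration, and what happens when a job exits with 216 at the maximum phase is not covered by these notes, so it is not modeled.

```shell
#!/bin/sh
# Model of the documented behavior only; next_state is a hypothetical helper.
# Usage: next_state CURRENT_PHASE EXIT_CODE
# Prints the resulting status followed by the new value of MPCURRENTPHASE.
next_state() {
    phase=$1
    code=$2
    case "$code" in
        0)   echo "DONE 1" ;;                      # success: phase resets to 1
        216) echo "RESCHEDULED $((phase + 1))" ;;  # advance to the next phase
        *)   echo "FAILED $phase" ;;               # phase kept, so a rerun resumes here
    esac
}

next_state 2 216
next_state 2 1
next_state 3 0
```

The second call shows why a rerun of a failed job resumes at the failed phase: the phase number is left unchanged.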

Logging:

After the first phase has run, the command for subsequent phases of the job is rewritten so that the wrappers are passed "-a -A", which tells the wrappers to append to the job log. As a result, all phases of the job log their stdout and stderr to the same file. If this were not done, each phase would overwrite the log, and the user would see only the output from the last phase that ran. If nc does not detect one of the standard VOV wrappers at the beginning of the command line, it assumes the command is not using a wrapper. In that case, it looks for the standard ">" redirect symbol in the command and replaces it with ">>".
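The effect of the ">" to ">>" fallback can be approximated as follows. This is only an illustration of the described outcome, assuming a single truncating redirect in the command; the helper append_redirect is hypothetical, and nc's own matching rules may differ.

```shell
#!/bin/sh
# Approximation of the described fallback; append_redirect is a hypothetical helper.
# Rewrites the first lone ">" in a command string to ">>" and leaves an
# existing ">>" untouched.
append_redirect() {
    printf '%s\n' "$1" | sed 's/\([^>]\)>\([^>]\)/\1>>\2/'
}

append_redirect "myjob.sh > job.log"
```

With this rewrite, a later phase adds to job.log instead of truncating it.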

REST Support:

In the payload for submitting a job via REST, two new fields are allowed: multiphase and mpres. Setting multiphase = True enables multiphase job support. The mpres field behaves the same as the -mpres command line argument described above. Re-running a failed multiphase job via the REST re-run API behaves similarly to re-running a failed multiphase job from the command line, as described above.

Resolved Issues

Internal Number Products Case Number Description
VOV-14465 All CS0257852 Taskers running as a non-root user are no longer sent jobs unless the job's user matches the tasker's user ID. This addresses a situation where a job running on a non-root tasker could gain access to the tasker user's data on the filesystem. This policy can be disabled by setting allowForeignJobsOnUserTaskers to 1.
VOV-14217 Accelerator Plus CS0237217 An issue that prevented DP jobs from running successfully via Accelerator Plus has been resolved.
VOV-13910 Accelerator Plus CSO213212, CSO303516 If a job is dispatched to a tasker that is in the process of exiting, the job will be refused by the tasker and automatically rescheduled for execution up to the maximum number of times allowed by the autoRescheduleCount server configuration parameter.