Job Cohorts

Note: This is a new experimental concept in the 2017 release.

A job cohort is a collection of jobs that require special FIFO scheduling. These jobs are always given a FairShare rank of 0 and are therefore scheduled "as soon as possible".

This type of job typically occurs in conjunction with large distributed parallel (DP) jobs. If a wide DP job has been partially dispatched, it is important that the remaining partial jobs get dispatched right away; otherwise, a lot of time is wasted waiting for the rendezvous. In the current implementation of partialTool, once at least 25% of the DP job has been dispatched, all the remaining DP job components become a cohort and are dispatched "as soon as possible," bypassing other scheduling rules.
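The 25% rule can be illustrated with a small Tcl sketch. This is not partialTool's actual code; the proc name should_form_cohort and its parameters are hypothetical, and the sketch assumes the threshold is measured as the fraction of DP job components already dispatched.

```tcl
# Hypothetical model of the 25% rule described above (not partialTool's code).
#   dispatched : number of DP job components already dispatched
#   total      : total number of components in the DP job
#   threshold  : dispatched fraction at which the rest becomes a cohort
proc should_form_cohort {dispatched total {threshold 0.25}} {
    # An empty job can never form a cohort; also avoids division by zero.
    if {$total <= 0} {
        return 0
    }
    # double() forces floating-point division; the comparison yields 1 or 0.
    return [expr {double($dispatched) / $total >= $threshold}]
}

puts [should_form_cohort 3 16]   ;# 3 of 16 is below the threshold
puts [should_form_cohort 4 16]   ;# 4 of 16 is exactly 25%
```

With a 16-component DP job, the remaining components would become a cohort once the fourth component has been dispatched.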

Another application of job cohorts is scheduling sets of relatively short jobs that only have value once all of them have been executed. An example is a "smoke regression test", the short regression that many organizations require of their engineers before a change to the input data can be checked into the repository. If the smoke test has begun and is, say, 50% dispatched, it becomes very valuable to make sure that the remaining jobs in the smoke test also get executed, bypassing the FairShare rules.

It is possible to abuse this concept by declaring all of one's jobs to be "cohort" jobs. Abuses will be detected and warnings will be issued.

How to Turn On/Off Cohorts

There is only one method to activate cohorts, and it is low-level: the vtk_set_cohort API.
vtk_set_cohort $setId 1 ;# Set the cohort flag on all jobs contained in the set.
vtk_set_cohort $setId 0 ;# Reset the cohort flag on all jobs contained in the set.
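
A thin wrapper around these two calls can make scripts easier to read. The following is only a sketch: the helper set_cohort_enabled is hypothetical, the set ID 12345 is a placeholder, and the stand-in definition of vtk_set_cohort exists solely so the example can be run outside the scheduler's Tcl shell, where the real command is assumed to be available already.

```tcl
# Stand-in used only when the real vtk_set_cohort is not defined
# (for example in a plain tclsh). It simply records each call.
if {[llength [info commands vtk_set_cohort]] == 0} {
    set ::cohortCalls {}
    proc vtk_set_cohort {setId flag} {
        lappend ::cohortCalls [list $setId $flag]
    }
}

# Hypothetical helper: accepts any Tcl boolean (yes/no, on/off, 1/0, ...)
# and passes the 1 or 0 flag shown above to vtk_set_cohort.
proc set_cohort_enabled {setId enabled} {
    if {![string is boolean -strict $enabled]} {
        error "expected a boolean value, got \"$enabled\""
    }
    vtk_set_cohort $setId [string is true -strict $enabled]
}

set setId 12345                  ;# placeholder; use the ID of a real set
set_cohort_enabled $setId yes    ;# set the cohort flag on the set's jobs
set_cohort_enabled $setId no     ;# reset the cohort flag
```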