Schedule Jobs

The scheduling process controls the order of job execution.

The scheduler in Accelerator is event-driven. The scheduler is called whenever an event occurs that can cause a job to be placed onto a tasker. The types of events include:
  • job submission
  • job termination
  • increased availability in a resource map
  • expiration of a reservation
  • and many others

The queued jobs, that is, the jobs that are scheduled to be run, are organized into buckets. Each bucket contains jobs that have identical scheduling parameters.
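
The grouping described above can be pictured with a minimal Python sketch. It is an illustration only, not the vovserver implementation: the fields used as the bucket key (fairshare_group, user, resources, priority) are assumptions chosen for the example, since the exact set of scheduling parameters is not listed here.

```python
from collections import defaultdict


def group_into_buckets(jobs):
    """Group queued jobs so that each bucket holds jobs with identical scheduling parameters."""
    buckets = defaultdict(list)
    for job in jobs:
        # Hypothetical bucket key: these four fields stand in for the real
        # scheduling parameters, which this sketch does not know.
        key = (job["fairshare_group"], job["user"], job["resources"], job["priority"])
        buckets[key].append(job["id"])
    return dict(buckets)


# Illustrative queued jobs (all values are made up for the example).
queued = [
    {"id": 1, "fairshare_group": "/chip/sim", "user": "ann", "resources": "linux64", "priority": 4},
    {"id": 2, "fairshare_group": "/chip/sim", "user": "ann", "resources": "linux64", "priority": 4},
    {"id": 3, "fairshare_group": "/chip/synth", "user": "bob", "resources": "linux64", "priority": 4},
]

buckets = group_into_buckets(queued)
for key, job_ids in buckets.items():
    print(key, job_ids)
```

Jobs 1 and 2 share every parameter and land in the same bucket, while job 3 differs in group and user and gets a bucket of its own.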

The buckets are assigned a rank based on the FairShare statistics. Starting from the buckets of rank 0 (zero), the scheduler attempts to dispatch the top job in the bucket to the best available tasker. Then the scheduler looks at buckets with rank 1 and so on.
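
As a rough sketch of this rank-ordered traversal, the following Python fragment visits buckets by ascending rank and tries to place the top job of each one. The data layout is hypothetical, and "best available tasker" is reduced to "first free tasker that offers the needed resource", which is a simplifying assumption rather than the actual selection rule.

```python
def dispatch_pass(buckets, free_taskers):
    """One simplified scheduler pass: visit buckets by ascending rank and
    try to place the top job of each bucket on a free tasker."""
    dispatched = []
    for bucket in sorted(buckets, key=lambda b: b["rank"]):
        if not bucket["jobs"]:
            continue
        # Stand-in for "best available tasker": the first free tasker
        # that offers the resource the bucket needs (assumption).
        tasker = next((t for t in free_taskers if bucket["needs"] in t["offers"]), None)
        if tasker is None:
            continue  # no suitable tasker, move on to the next bucket
        top_job = bucket["jobs"].pop(0)
        free_taskers.remove(tasker)
        dispatched.append((top_job, tasker["name"]))
    return dispatched


# Illustrative data: bucket and tasker names are made up for the example.
buckets = [
    {"rank": 1, "needs": "linux", "jobs": ["j10"]},
    {"rank": 0, "needs": "linux", "jobs": ["j1", "j2"]},
    {"rank": 2, "needs": "gpu", "jobs": ["j20"]},
]
taskers = [
    {"name": "t1", "offers": {"linux"}},
    {"name": "t2", "offers": {"linux"}},
]

result = dispatch_pass(buckets, taskers)
print(result)
```

In this example the rank 0 bucket places j1 first, the rank 1 bucket places j10 next, and the rank 2 bucket places nothing because no tasker offers the resource it needs.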

Exceptions to FairShare Scheduling

FairShare scheduling can be bypassed by dispatching jobs manually or by using Job Cohorts.

Seamless Transition to a Cycle-based Scheduler

The scheduler typically executes in a few milliseconds. The scheduler effort required in each cycle increases with the number of buckets, the number of taskers, and the complexity of the resource expressions. It is always possible to overload the scheduler, meaning that the scheduler requires a large fraction of the total CPU time used by vovserver.

The vovserver parameter that controls scheduler cycle behavior is called schedSkip, and is expressed in seconds. If the time required to run the scheduler exceeds the schedSkip threshold, vovserver reduces the duty cycle allocated to scheduling. It does so by skipping scheduler calls, which frees up computing power to service other requests such as listings and job terminations. While calls are being skipped, the scheduler is effectively bypassed, regardless of bucket priority, until the target "scheduler duty cycle" percentage is attained.

The fraction of the computing power allocated to scheduling, which we call the "scheduler duty cycle", is controlled by the parameter schedMaxEffort, an integer that represents the percentage of time to allocate to the scheduler.

The default value of schedSkip is 0.1 seconds, and the default duty cycle reserved for scheduling is 20%, represented by a schedMaxEffort value of 20.
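
To make the interaction between schedSkip and schedMaxEffort concrete, here is a small Python model of the skipping behavior. It is a sketch under stated assumptions, not the vovserver algorithm: the class SchedulerThrottle and its methods are hypothetical, time is tracked in whole milliseconds, the duty cycle is computed over all time recorded so far, and the sketch never leaves the cycle-based mode once it has entered it.

```python
class SchedulerThrottle:
    """Toy model of skipping scheduler calls to honor a duty cycle (hypothetical)."""

    def __init__(self, sched_skip_ms=100, sched_max_effort=20):
        self.sched_skip_ms = sched_skip_ms        # schedSkip default: 0.1 s
        self.sched_max_effort = sched_max_effort  # schedMaxEffort default: 20 (%)
        self.sched_ms = 0      # time spent inside the scheduler
        self.total_ms = 0      # all server time accounted for so far
        self.throttled = False

    def record_other_work(self, ms):
        """Account for time spent on non-scheduler requests."""
        self.total_ms += ms

    def record_scheduler_run(self, ms):
        """Account for one scheduler run; a slow run switches on skipping."""
        self.sched_ms += ms
        self.total_ms += ms
        if ms > self.sched_skip_ms:
            self.throttled = True

    def should_run(self):
        """Allow a scheduler call only while the duty cycle is within the target."""
        if not self.throttled:
            return True
        # Integer form of: sched_ms / total_ms <= sched_max_effort / 100
        return self.sched_ms * 100 <= self.sched_max_effort * self.total_ms


throttle = SchedulerThrottle()
throttle.record_scheduler_run(300)   # one slow pass: 0.3 s exceeds the 0.1 s threshold

skipped = 0
while True:
    throttle.record_other_work(100)  # each event costs 0.1 s of other work
    if throttle.should_run():
        break
    skipped += 1

print(skipped, throttle.total_ms)
```

With the default values, a single 0.3 second pass must be followed by 1.2 seconds of other work before scheduling accounts for no more than 20% of the total, so this model skips 11 scheduler calls and allows the 12th.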

Other Parameters that Control the Scheduler

There are a few more parameters that control the scheduler behavior and could impact performance on large workloads.
Note: There is a built-in dynamic server tuning feature that adjusts the maximum number of jobs dispatched per queue bucket when the server is under heavy load.
sched.maxpostponedjobs
Controls the exit from the scheduler when it is hard to dispatch jobs to taskers, that is, when the scheduler has to postpone many jobs because it cannot find suitable taskers for them. Normally this parameter is set to be much larger than the maximum number of buckets in the system. The default value is 10,000; this value can be decreased for very homogeneous farms in which all taskers are identical.
Note: This parameter will disappear in future implementations of the scheduler.
fairshare.maxjobsperbucket
Controls how many jobs are allowed to be dispatched from the same bucket. The default value for this is 20 jobs. After 20 jobs have been dispatched from a bucket, the scheduler moves to the next bucket. Smaller values give a more accurate FairShare accounting. Larger values give a faster dispatch.
Note: This parameter will disappear in future implementations of the scheduler.
fairshare.maxjobsperloop
Controls how many jobs can be dispatched in a single scheduler loop (this includes the scanning of possibly all the buckets). The default value is 20 jobs per loop (note: this could mean that all 20 jobs are from the same bucket). Smaller values give a more accurate FairShare accounting. Larger values give a faster dispatch.
Note: This parameter will disappear in future implementations of the scheduler.
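
The effect of the two FairShare limits can be illustrated with a short Python sketch. It is a simplification, not the scheduler's code: the function dispatch_loop is hypothetical, buckets are given as (name, queued job count) pairs already in rank order, and every job is assumed to find a suitable tasker.

```python
def dispatch_loop(buckets, max_jobs_per_bucket=20, max_jobs_per_loop=20):
    """Count the jobs dispatched from each bucket in one simplified scheduler loop.

    buckets: list of (name, queued_count) pairs, already in rank order.
    """
    dispatched = {}
    remaining = max_jobs_per_loop  # budget for the whole loop
    for name, queued in buckets:
        if remaining <= 0:
            break  # per-loop limit reached, later buckets wait for the next loop
        count = min(queued, max_jobs_per_bucket, remaining)
        dispatched[name] = count
        remaining -= count
    return dispatched


# Defaults (20 per bucket, 20 per loop): the first bucket uses the whole budget.
defaults = dispatch_loop([("A", 50), ("B", 50)])

# A smaller per-bucket limit spreads the same loop budget across more buckets.
spread = dispatch_loop([("A", 50), ("B", 50), ("C", 50)],
                       max_jobs_per_bucket=5, max_jobs_per_loop=12)

print(defaults)
print(spread)
```

With the defaults, all 20 jobs of the loop come from bucket A, which is the case mentioned in the description of fairshare.maxjobsperloop. Lowering the per-bucket limit to 5 with a loop limit of 12 yields 5 jobs from A, 5 from B, and 2 from C.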
To control the scheduler, an administrator can use the command line or the policy.tcl file. For example, to raise the threshold for switching to a cycle-based scheduler from the default 0.1 seconds to 0.3 seconds, and to increase the duty cycle from 20% to 40%, the following commands could be used. The last three commands set the remaining parameters explicitly to their default values:
% vovsh -x "vtk_server_config schedSkip 0.3"
% vovsh -x "vtk_server_config schedMaxEffort 40"
% vovsh -x "vtk_server_config sched.maxpostponedjobs 10000"
% vovsh -x "vtk_server_config fairshare.maxjobsperloop 20"
% vovsh -x "vtk_server_config fairshare.maxjobsperbucket 20"
Alternatively, the policy.tcl file can be modified:
# This is a fragment of policy.tcl
set config(schedSkip) 0.3
set config(schedMaxEffort) 40
set config(sched.maxpostponedjobs) 10000
set config(fairshare.maxjobsperloop) 20
set config(fairshare.maxjobsperbucket) 20