Autokill of Jobs by vovtasker

If a job has the autokill field set to a positive value, the job will be killed by vovtasker if its duration exceeds the autokill value. The autokill field can be applied to any job whose status is not Failed or Done.

The check for this condition is done by vovtasker every minute. This frequency is controlled by the "tasker update" value.
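The condition tested on each check can be sketched as a small shell function. This is only an illustration of the rule described above, not vovtasker's actual code; the function name should_autokill and the use of whole seconds are assumptions.

```shell
# Hypothetical helper illustrating the autokill rule; not part of vovtasker.
# $1 = elapsed run time of the job, in seconds
# $2 = autokill value, in seconds (0 or negative means autokill is not set)
should_autokill() {
    duration=$1
    limit=$2
    # Kill only when autokill is positive and the duration exceeds it.
    [ "$limit" -gt 0 ] && [ "$duration" -gt "$limit" ]
}

# Example: a job that has run for 15s with autokill set to 10s.
if should_autokill 15 10; then
    echo "kill"
else
    echo "keep"
fi
```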

If a job is to be auto-killed, the vovtasker can use one of three methods:
  • Direct: the tasker sends the job the signals TERM, HUP, INT, KILL. These can be overridden by defaultStopSignalCascade and defaultStopSignalDelay, which can be set in policy.tcl, etc.

    If NC_STOP_SIGNALS and/or NC_STOP_SIG_DELAY are set, then these will be used instead. NC_STOP_SIGNALS is a comma-separated list in which each entry is either a signal name such as "USR1" or a specification of the form ":SIGNAL:includerx:excluderx:skiptop".

    For example,
    nc run -D autokill 10s -P
    NC_STOP_SIGNALS=USR1::python:1,TERM,KILL -P
    NC_STOP_SIG_DELAY=1

    sends USR1 to the processes that are part of the job, excluding any Python processes, then sends TERM and then KILL, with a delay of 1s between signals.

    VOV_STOP_SIGNALS and VOV_STOP_SIGNALS_DELAY work similarly to NC_STOP_SIGNALS and NC_STOP_SIG_DELAY.

  • NC STOP: the tasker calls nc stop JOBID; this only works for taskers that are running within Accelerator; this method honors the values of NC_STOP_SIGNALS and NC_STOP_SIG_DELAY.
  • VOVSTOP: the tasker calls vovstop -f JOBID to kill the job.
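To make the structure of an NC_STOP_SIGNALS value easier to see, the following sketch splits a value into its comma-separated entries and then splits each entry on colons. It follows the layout of the example value USR1::python:1,TERM,KILL (signal first, then includerx, excluderx, skiptop) and is not the parser used by the tasker; the function name describe_stop_signals and the "-" placeholder for empty fields are assumptions made for illustration.

```shell
# Hypothetical helper: prints one line per entry of an NC_STOP_SIGNALS value.
# Assumes each entry looks like SIGNAL or SIGNAL:includerx:excluderx:skiptop,
# as in the example value USR1::python:1,TERM,KILL.
describe_stop_signals() {
    rest=$1
    while [ -n "$rest" ]; do
        # Take the text up to the first comma as the current entry.
        entry=${rest%%,*}
        if [ "$entry" = "$rest" ]; then
            rest=
        else
            rest=${rest#*,}
        fi
        # Split the entry on ':'; missing fields are left empty.
        IFS=: read -r sig inc exc skip <<EOF
$entry
EOF
        # Empty fields are shown as '-'.
        printf 'signal=%s include=%s exclude=%s skiptop=%s\n' \
            "$sig" "${inc:--}" "${exc:--}" "${skip:--}"
    done
}

describe_stop_signals 'USR1::python:1,TERM,KILL'
```

For the example value, the first entry is reported with signal USR1, an empty include field, python as the exclude field and 1 as skiptop; TERM and KILL are reported as plain signal names.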

The method used to autokill the jobs can be controlled on each tasker, meaning that all jobs on that tasker, if they need to be autokilled, will be killed with the same method.

Currently there is one way to change the autokill method on a tasker, and that is with this magic incantation:
% vovsh -x 'vtk_tasker_config TASKERNAME_OR_ID autokillmethod VALUE' 
where VALUE can be one of the following keywords:
  • direct
  • ncstop
  • vovstop

In reality, only the first letter of the keyword is used, i.e. 'd', 'n', 'v'. Anything else maps silently to 'direct'.
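The first-letter rule can be pictured with a shell case statement. This is a sketch of the mapping as described above, not the tasker's implementation; the function name autokill_method_from_keyword is hypothetical, and treating only lowercase letters as significant is an assumption.

```shell
# Hypothetical helper: maps a keyword to the method its first letter selects.
# Assumption: only lowercase 'n' and 'v' are recognized; everything else,
# including 'd', falls through to 'direct'.
autokill_method_from_keyword() {
    case $1 in
        n*) echo ncstop ;;
        v*) echo vovstop ;;
        *)  echo direct ;;
    esac
}

autokill_method_from_keyword nope     # selects ncstop, because of the leading 'n'
autokill_method_from_keyword stop     # unrecognized first letter, selects direct
```

A practical consequence is that a misspelled keyword does not produce an error, so it is worth checking the spelling before applying it to a tasker.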

Examples:
% vovsh -x 'vtk_tasker_config lnx0123 autokillmethod ncstop'
% vovsh -x 'vtk_tasker_config lnx0123 autokillmethod n'
% vovsh -x 'vtk_tasker_config 00234567 autokillmethod direct'
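
Because the method is set per tasker, applying the same method to several taskers means repeating the command once per tasker. The sketch below only prints the commands so they can be reviewed first; the function name print_autokill_commands and the tasker names lnx0124 and lnx0125 are hypothetical.

```shell
# Hypothetical helper: prints one vtk_tasker_config command per tasker.
# $1 = autokill method keyword, remaining arguments = tasker names or IDs
print_autokill_commands() {
    method=$1
    shift
    for tasker in "$@"; do
        printf "vovsh -x 'vtk_tasker_config %s autokillmethod %s'\n" \
            "$tasker" "$method"
    done
}

# Dry run: nothing is executed, the commands are only printed.
print_autokill_commands ncstop lnx0123 lnx0124 lnx0125
```

Once the printed commands look right, they can be run by hand or by piping the output to sh.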