Sequence of Firing of a Job on a Tasker
This is the sequence to fire a job using vovtaskerroot running on UNIX.
- The tasker is running as root
- The tasker receives from vovserver a request to start a job. The request contains the information about the job and about all the properties attached to the job, including PRECMD and POSTCMD.
- If the flag -E of vovtasker is used, then change the command line of the command to execute to "vovfire $JOBID -l SOME_LABEL > /tmp/vovfire.$JOBID.log".
- If a PTY is requested, create the PTY and connect to the process on the submission host.
- Compute affinity mask if requested (NUMA control).
- Make sure all groups for the executing user have been cached.
- fork() a subtasker. The parent process goes back to the main loop. The child, that is, the subtasker, will be used to shepherd the job.
- The subtasker sets its own affinity mask (if required).
- The subtasker creates the PTY and connects it to the submission process.
- If VOV_DEBUG_TASKER is set, the subtasker sleeps 10 seconds (to allow connection of a debugger)
- The subtasker tries 3 times to switch user identity (uid and gid). In each attempt,
  - it switches gid with setgid()
  - it switches uid with setuid()
  - it rebuilds the environment for the user (HOME, USER, LOGNAME, SHELL)
- The subtasker sets signals XCPU XFSZ PIPE USR1 USR2 to their default behavior.
- The subtasker calls nice(8 - execPriority), based on the value of the execution priority for the job.
- If the switch of user identity fails, the system calls the diagnostics script vov_diagnostics_setuid (not as root, but as the owner of Accelerator).
- Now the subtasker is running as the user that owns the job.
- The subtasker tries to change directory with chdir(). If it fails the first time, it tries a few more times based on VOV_RETRY_CHDIR and VOV_RETRY_CHDIR_SLEEP.
- If the directory cannot be changed, the subtasker calls vov_diagnostics_chdir with arguments ID and DIR (the directory that could not be accessed).
- The subtasker tries to switch environment.
- The subtasker executes the .pre. scripts of the environment (obsolete, but still supported).
- The subtasker executes the precmd script with a system() call. If the precommand exits with a status that is not 0 (zero), the job is done and failed.
- The subtasker executes the job and waits for it to finish.
- The subtasker executes the postcmd script with another system() call. The exit status of the postcmd is used as exit status of the job.
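
The identity switch, the chdir() retry, and the precmd/job/postcmd exit-status rules described in the sequence can be sketched in Python. This is a minimal illustration under stated assumptions, not the vovtasker implementation. The function names (switch_identity, chdir_with_retry, run_with_hooks) and their parameters are hypothetical. The sketch also assumes that VOV_RETRY_CHDIR is a count of extra attempts, that VOV_RETRY_CHDIR_SLEEP is a pause in seconds between attempts, and that a failed precmd passes its own status on as the job status.

```python
import os
import subprocess
import time


def switch_identity(uid, gid, home, user, shell, attempts=3):
    """Try to become (uid, gid), up to `attempts` times.

    The group is switched first: once root privileges are dropped with
    setuid(), the process is no longer allowed to call setgid().
    """
    for _ in range(attempts):
        try:
            os.setgid(gid)
            os.setuid(uid)
        except OSError:
            continue  # this attempt failed, try again
        # Rebuild the user-specific part of the environment.
        os.environ.update(HOME=home, USER=user, LOGNAME=user, SHELL=shell)
        return True
    return False  # the caller would now run the setuid diagnostics script


def chdir_with_retry(path, retries=3, sleep_s=1.0):
    """Change directory, retrying on failure.

    `retries` and `sleep_s` stand in for VOV_RETRY_CHDIR and
    VOV_RETRY_CHDIR_SLEEP (assumed meaning, see the lead-in).
    """
    for attempt in range(retries + 1):
        try:
            os.chdir(path)
            return True
        except OSError:
            if attempt < retries:
                time.sleep(sleep_s)
    return False  # the caller would now run the chdir diagnostics script


def run_with_hooks(job, precmd=None, postcmd=None):
    """Run precmd, job, and postcmd; return the final exit status."""
    if precmd is not None:
        pre_status = subprocess.call(precmd, shell=True)
        if pre_status != 0:
            # The job is done and failed; the job command never runs.
            return pre_status
    status = subprocess.call(job, shell=True)
    if postcmd is not None:
        # The postcmd status replaces the status of the job itself.
        status = subprocess.call(postcmd, shell=True)
    return status
```

For example, run_with_hooks("exit 5", postcmd="exit 0") returns 0, because the status of the postcmd overrides the failing status of the job. switch_identity() succeeds only when the calling process is running as root, which is why the sequence starts with the tasker running as root.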