Sequence of Firing of a Job on a Tasker

This is the sequence to fire a job using vovtaskerroot running on UNIX.
  1. The tasker is running as root.
  2. The tasker receives from vovserver a request to start a job. The request contains the information about the job and about all the properties attached to the job, including PRECMD and POSTCMD.
  3. If the flag -E of vovtasker is used, the command line to execute is changed to "vovfire $JOBID -l SOME_LABEL > /tmp/vovfire.$JOBID.log".
  4. If a PTY is requested, create the PTY and connect to the process on the submission host.
  5. Compute affinity mask if requested (NUMA control).
  6. Make sure all groups for the executing user have been cached.
  7. fork() a subtasker. The parent process goes back to the main loop. The child, that is, the subtasker, will be used to shepherd the job.
  8. The subtasker sets its own affinity mask (if required).
  9. The subtasker creates the PTY and connects it to the submission process.
  10. If VOV_DEBUG_TASKER is set, the subtasker sleeps 10 seconds (to allow a debugger to be attached).
  11. The subtasker tries 3 times to switch user identity (uid and gid). In each attempt,
    1. it switches gid with setgid()
    2. it switches uid with setuid()
    3. it rebuilds the environment for the user (HOME, USER, LOGNAME, SHELL)
    If the switch fails, the subtasker waits 10 seconds before the next attempt.
  12. The subtasker sets signals XCPU XFSZ PIPE USR1 USR2 to their default behavior.
  13. The subtasker calls nice(8 - execPriority), based on the value of the execution priority for the job.
  14. If the switch of user identity fails, the system calls the diagnostics script vov_diagnostics_setuid (not as root, but as the owner of Accelerator).
  15. Now the subtasker is running as the user that owns the job.
  16. The subtasker tries to change directory with chdir(). If the first attempt fails, it retries a few more times, as controlled by VOV_RETRY_CHDIR and VOV_RETRY_CHDIR_SLEEP.
  17. If the directory cannot be changed, the subtasker calls vov_diagnostics_chdir with arguments ID and DIR (the directory that could not be accessed).
  18. The subtasker tries to switch environment.
  19. The subtasker executes the .pre. scripts of the environment (obsolete, but still executed).
  20. The subtasker executes the precmd script with a system() call. If the precommand exits with a status that is not 0 (zero), the job is done and failed.
  21. The subtasker executes the job and waits for it to finish.
  22. The subtasker executes the postcmd script with another system() call. The exit status of the postcmd is used as exit status of the job.
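
The core of the sequence above (fork, identity switch, signal reset, nice, chdir with retries, precmd, job, postcmd) can be sketched in Python. This is only an illustration under stated assumptions, not the vovtasker implementation: the function names (switch_identity, chdir_with_retry, run_commands, subtasker, fire) and all parameters are hypothetical, the retries and sleep_s arguments merely stand in for VOV_RETRY_CHDIR and VOV_RETRY_CHDIR_SLEEP (whose exact semantics are not given here), and the PTY, affinity, environment and diagnostics steps are omitted. Returning the job's own status when no postcmd is present is also an assumption.

```python
import os
import signal
import subprocess
import time


def switch_identity(uid, gid, attempts=3, wait_s=10.0):
    """Steps 11.1-11.2: switch gid, then uid; retry on failure."""
    for attempt in range(attempts):
        try:
            os.setgid(gid)  # gid first: after setuid() root rights are gone
            os.setuid(uid)
            return True
        except OSError:
            if attempt < attempts - 1:
                time.sleep(wait_s)
    return False


def chdir_with_retry(path, retries=3, sleep_s=1.0):
    """Step 16: try chdir(), then retry up to `retries` more times."""
    for attempt in range(retries + 1):
        try:
            os.chdir(path)
            return True
        except OSError:
            if attempt < retries:
                time.sleep(sleep_s)
    return False


def run_commands(precmd, command, postcmd):
    """Steps 20-22: precmd, job, postcmd; return the job's exit status."""
    if precmd and subprocess.call(precmd, shell=True) != 0:
        # A non-zero precmd means the job is done and failed.
        return subprocess.call(precmd, shell=True) or 1
    status = subprocess.call(command, shell=True)
    if postcmd:
        # The exit status of the postcmd replaces that of the job.
        status = subprocess.call(postcmd, shell=True)
    return status  # assumption: without a postcmd, the job's status is used


def subtasker(command, cwd, precmd, postcmd, uid, gid, exec_priority):
    """Body of the child process that shepherds one job."""
    if uid is not None and not switch_identity(uid, gid):
        return 1  # the real tasker would call vov_diagnostics_setuid here
    for sig in (signal.SIGXCPU, signal.SIGXFSZ, signal.SIGPIPE,
                signal.SIGUSR1, signal.SIGUSR2):
        signal.signal(sig, signal.SIG_DFL)  # step 12
    os.nice(8 - exec_priority)  # step 13
    if not chdir_with_retry(cwd):
        return 1  # the real tasker would call vov_diagnostics_chdir here
    return run_commands(precmd, command, postcmd)


def fire(command, cwd, precmd=None, postcmd=None, uid=None, gid=None,
         exec_priority=8):
    """Step 7: fork a subtasker; the parent returns immediately."""
    pid = os.fork()
    if pid != 0:
        return pid  # parent: back to the main loop
    status = 1
    try:
        status = subtasker(command, cwd, precmd, postcmd, uid, gid,
                           exec_priority)
    finally:
        os._exit(status & 0xFF)  # never fall back into the parent's code
```

A caller that wants the job's status can wait for the returned pid, for example with os.waitpid(pid, 0) followed by os.WEXITSTATUS(). Passing uid=None skips the identity switch, which lets the sketch run without root privileges; note that run_commands as written re-runs a failing precmd only to obtain its status, which a real implementation would avoid by keeping the first result.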