Interface with Altair Accelerator using vovelasticd

The daemon vovelasticd enables Altair Accelerator or Altair FlowTracer to allocate jobs on CPUs managed by Accelerator while avoiding per-job scheduling overhead.

Jobs that request an LSF queue resource, that is, a resource with the prefix "LSFqueue:", are ignored by vovelasticd. This allows vovlsfd and vovelasticd to work side by side in the same project. All jobs that do not request an LSF queue are candidates for vovelasticd.
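The selection rule above can be sketched as a small shell function. This is only an illustration of the rule as stated, not the daemon's actual implementation; the function name classify_for_vovelasticd and the queue name "normal" are hypothetical, and a Bourne-style shell is assumed.

```shell
# Hypothetical helper: report whether vovelasticd would consider a job,
# given its space-separated resource request.
classify_for_vovelasticd() {
    for res in $1; do
        case "$res" in
            # Any resource token starting with "LSFqueue:" excludes the job.
            LSFqueue:*) echo "ignored"; return 0 ;;
        esac
    done
    echo "candidate"
}

classify_for_vovelasticd "CORES/2 MAXidle:1m"        # prints: candidate
classify_for_vovelasticd "LSFqueue:normal CORES/2"   # prints: ignored
```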

The basic idea is that, on demand, vovelasticd submits to Accelerator a request to execute vov_elastic_agent. Once vov_elastic_agent connects, Accelerator or FlowTracer can use it to execute one or more jobs.

In addition, the vovtasker connection timeout for elastic vovtaskers can be configured via the config.tcl file.

Configure vovelasticd

In order to use vovelasticd, the configuration file PROJECT.swd/vovelasticd/config.tcl must exist.

You can start from the example shown here, by copying it from $VOVDIR/etc/config/vovelasticd/config.tcl.
# How often should the vovelasticd daemon cycle?
# The value is a VOV time spec and the default is two seconds.
set VOVELASTICD(refresh)                  2s

# Which NC instance should the vovelasticd daemon submit jobs to?
set VOVELASTICD(queue)                    "vnc"

# Environment to be used when submitting a vov_elastic_agent.
set VOVELASTICD(runenv)                   "BASE"

# How old should a job bucket be before vovelasticd considers it for servicing?
# The value is a VOV time spec and the default is 5 seconds.
set VOVELASTICD(bucketAgeThreshold)       0

# What should we call jobs that are submitted to NC?
set VOVELASTICD(elasticJobName)           "$daemonName"

# To which set in the NC instance should our jobs be appended?
set VOVELASTICD(elasticSetName)           "$daemonName:$projInfo(name)@$projInfo(host)@$projInfo(port)"

# How often should we check for failed job submissions?
# When a failed job submission is detected, it is removed
# from the internal data structures used for calculating
# whether or not to submit a new tasker.
set VOVELASTICD(failed,checkfreq)         1m

# How often should we check for sick taskers?
# The value is a VOV time spec and the default is one minute
set VOVELASTICD(sick,checkfreq)           1m

# Remove sick taskers that are older than?
# Value is a VOV time spec, and the default value is five minutes.
set VOVELASTICD(sick,older)               5m

# Should we dequeue any extra taskers?
# Setting this to "1" will cause a dequeue
# of all not yet running vovtasker submissions for a job bucket.
# This only happens after three consecutive refresh cycles
# have gone by with no work scheduled for that bucket.
set VOVELASTICD(dequeueExtraTaskersEnable) 0

# How long should we wait to dequeue any extra taskers?
# The number of refresh cycles
set VOVELASTICD(dequeueExtraTaskersDelay) 3

# What is the maximum number of taskers we should start?
# Should be set to a high value to enable lots of parallelism.
set VOVELASTICD(tasker,max)                99

# What is the maximum number of queued taskers per bucket that we should allow?
set VOVELASTICD(tasker,maxQueuedPerBucket) 99

# How many tasker submissions should be done for each
# resource bucket, during each refresh cycle?
# Setting this to "0" will disable blitz functionality.
set VOVELASTICD(tasker,blitz)              0

# What is the minimum number of taskers that will be grouped into an array
# for each resource bucket, during each refresh cycle?
# Setting this to "0" will disable array functionality.
set VOVELASTICD(tasker,jobArrayMin)        0

# What is the maximum number of taskers that will be grouped into an array
# for each resource bucket, during each refresh cycle? The absolute max number
# of taskers supersedes this value.
# Setting this to "0" will disable array functionality.
set VOVELASTICD(tasker,jobArrayMax)        0

# What is the longest a vovtasker should run before self-exiting?
# Ex: if you set it to 8 hours and queue four 3-hour jobs:
# the first tasker will run for nine hours (3 x 3-hr > 8-hr) and then exit
# the fourth job will only start when a second tasker has been requeued
# and started by the batch execution system.
# This controls the amount of reuse of a tasker while it processes jobs.
# To avoid the penalties of:
# noticing a tasker is needed
# + submitting to the batch system
# + the batch system to allocate a machine
# You should set this to a high value like a week.
# The value is a VOV time spec
# This is a default value. It can be overridden on a per job basis by putting
# a resource on the job that looks similar to the following.
# MAXlife:1w
set VOVELASTICD(tasker,maxlife)            1w

# How long should a tasker wait idle for a job to arrive?
# The shorter the time, the faster the slot is released to the batch system.
# The longer the time, the better the chance that the tasker will be reused.
# The default value is two minutes (usually takes a minute to allocate a
# slot through a batch system).  Value is a VOV time spec
# This is a default value. It can be overridden on a per job basis by putting
# a resource on the job that looks similar to the following.
# MAXidle:2m
set VOVELASTICD(tasker,maxidle)            2m

# Are there any extra resources you wish to pass along to the taskers?
# These resources will be passed directly along to the vovtasker. They
# are not processed in any way by vovelasticd. For example setting
# this to "MAXlife:1w" will not work as you might expect.
set VOVELASTICD(tasker,res)                ""

# What is the vovtasker update interval for resource calculation?
# Value is a VOV time spec, and the default value is 15 seconds.
set VOVELASTICD(tasker,update)             15s

# Do we want to enable debug messages in the vovtasker log files?
# 0=no; 1=yes; default=0
set VOVELASTICD(tasker,debug)              0

# What level of verbosity should the vovtasker use when writing to its log file?
# Valid values are 0-4; default=1
set VOVELASTICD(tasker,verbose)            1

# How long should the vovtasker try to establish the initial connection to the vovserver?
# Values are in seconds, default is 120 seconds.
set VOVELASTICD(tasker,timeout)            120

# How much buffer should we consider when adjusting tasker,max based on available client connections?
# This number will be subtracted from the available maxNormalClient connections
# Twice this number will be subtracted from the available file descriptors
# Set this number based on how many non-tasker client connections are anticipated for this session.
set VOVELASTICD(client,derate)            50

# What is the name of the launchers directory? (Default: "./launchers")
set VOVELASTICD(launchers,dirname)        "./launchers"

# How often should we check the launchers directory for a cleanup?
# The value is a VOV time spec and the default is 10 minutes
set VOVELASTICD(launchers,checkfreq)      10m

# Remove launchers that are older than?
# Value is a VOV time spec, and the default value is one hour.
set VOVELASTICD(launchers,older)          1h

# How often should we check for preempted jobs?
# Value is a VOV time spec, and the default value is one minute.
set VOVELASTICD(preempt,checkfreq)        1m

# What action should we take on a preempted vovtasker?
# Allowable values are STOP and RESERVE.
# STOP is faster than RESERVE but requires > 2013.09u5
set VOVELASTICD(preempt,taskerstop)        "STOP"

# Should we allow vovtasker to be preempted?
# Value is 1 or 0.
# The preemption request must come from an appropriately configured NC.
set VOVELASTICD(preemptTaskersEnable)      0

# In FT 2013.09u6 and possibly later, this setting should be 1,
# if the underlying batch system is NC and NUMA support is required by the taskers.
set VOVELASTICD(resources,numamap)        1

# The maximum number of launcher attempts that should be made in the event the job submission
# to the back-end queue fails. In most cases, this would be due to a misconfigured queue name.
# The counter is reset once the configuration file has been updated.
set VOVELASTICD(maxQueueErrors)           10
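The tasker,maxlife comment in the example above works through a case with an 8-hour limit and four 3-hour jobs. The arithmetic can be sketched as follows, assuming that a tasker accepts another job whenever its elapsed run time is still below maxlife and always finishes the job it has started. The function name jobs_per_tasker is hypothetical, and durations are given in whole hours for simplicity.

```shell
# Hypothetical model of tasker reuse under a maxlife limit.
# Arguments: maxlife in hours, job duration in hours, number of queued jobs.
# Prints: "<jobs run by the first tasker> <hours the tasker stays alive>"
jobs_per_tasker() {
    maxlife_h=$1
    job_h=$2
    queued=$3
    elapsed=0
    started=0
    # Keep taking jobs while some remain and maxlife has not been reached.
    while [ "$started" -lt "$queued" ] && [ "$elapsed" -lt "$maxlife_h" ]; do
        started=$((started + 1))
        elapsed=$((elapsed + job_h))
    done
    echo "$started $elapsed"
}

jobs_per_tasker 8 3 4   # prints: 3 9
```

Under these assumptions the first tasker runs three jobs over nine hours, and the fourth job waits for a second tasker, which matches the comment in the example file.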
This file does not exist initially, so you must create it manually, using the example config.tcl above as a template. The following sequence of commands sets up vovelasticd for a given project.
% vovproject enable <project>
% cd `vovserverdir -p .`
% mkdir vovelasticd
% cp $VOVDIR/etc/config/vovelasticd/config.tcl vovelasticd/config.tcl
% vi vovelasticd/config.tcl ; # Edit config file to suit your installation. 
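If you prefer to script the edit instead of using an editor, a helper along the following lines can change a single parameter in the copied file. This is a minimal sketch, not an Altair-provided tool: the function name set_elastic_param is hypothetical, a Bourne-style shell is assumed, and it only handles assignments written on one line in the form used by the example file.

```shell
# Hypothetical helper: set one VOVELASTICD parameter in a config.tcl file.
# Arguments: path to config.tcl, parameter key, new value.
set_elastic_param() {
    cfg=$1
    key=$2
    value=$3
    tmp="$cfg.tmp.$$"
    # Drop any existing one-line assignment of this key (comments are kept),
    # then append the new assignment at the end of the file.
    grep -v "^set VOVELASTICD($key)" "$cfg" > "$tmp" || true
    printf 'set VOVELASTICD(%s) %s\n' "$key" "$value" >> "$tmp"
    mv "$tmp" "$cfg"
}

# Example (commented out): raise the initial connection timeout to 300 seconds.
# set_elastic_param vovelasticd/config.tcl "tasker,timeout" 300
```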

Start vovelasticd Manually

Use vovdaemonmgr to start vovelasticd manually.
% vovproject enable <project>
% vovdaemonmgr start vovelasticd
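Because the daemon requires vovelasticd/config.tcl to exist, it can help to check for the file before starting. The wrapper below is a sketch under that assumption; the function name start_vovelasticd is hypothetical, and the project is assumed to be enabled already in the current shell.

```shell
# Hypothetical wrapper: start vovelasticd only if its config file is present.
# Argument: the project's server working directory (PROJECT.swd).
start_vovelasticd() {
    swd=$1
    if [ ! -f "$swd/vovelasticd/config.tcl" ]; then
        echo "missing $swd/vovelasticd/config.tcl" >&2
        return 1
    fi
    vovdaemonmgr start vovelasticd
}

# Example (commented out), using vovserverdir to locate the directory:
# start_vovelasticd "$(vovserverdir -p .)"
```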

Start vovelasticd Automatically

In the directory vnc.swd/autostart, create a script called vovelasticd.csh with the following content:
#!/bin/csh -f
vovdaemonmgr start vovelasticd
Don't forget to make the script executable.
% chmod 755 vovelasticd.csh
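The two steps above can also be combined into one scripted action. The following is a minimal sketch in a Bourne-style shell; the function name install_autostart is hypothetical, and the script content is exactly the two lines shown above.

```shell
# Hypothetical helper: create the autostart script and make it executable.
# Argument: path to the autostart directory.
install_autostart() {
    autostart_dir=$1
    mkdir -p "$autostart_dir"
    cat > "$autostart_dir/vovelasticd.csh" <<'EOF'
#!/bin/csh -f
vovdaemonmgr start vovelasticd
EOF
    chmod 755 "$autostart_dir/vovelasticd.csh"
}

# Example (commented out):
# install_autostart vnc.swd/autostart
```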

Debug vovelasticd on the Command Line

When first starting vovelasticd, it is helpful to run it in foreground, possibly with the -v and/or -d options to verify operation as expected.
% vovproject enable <project>
% cd `vovserverdir -p vovelasticd`
% vovelasticd ; # Launch vovelasticd

Specify Job Resources

To submit a job for execution on a CPU managed by Accelerator, assign the job resources as you normally would for Accelerator. The resource request is passed along to Accelerator without modification. The following resources are handled specially by vovelasticd.
Resource Name | Explanation | Notes
MAXlife:<TIME> | Overrides the default maxlife value given to the launched vovtasker. <TIME> is a VOV time spec. | optional
MAXidle:<TIME> | Overrides the default maxidle value given to the launched vovtasker. <TIME> is a VOV time spec. | optional
vovelasticd:<RESOURCE_NAME> | If a resource token vovelasticd:<RESOURCE_NAME> is declared in resources.tcl with a finite limit and the resource is used on a job, vovelasticd tracks the resource so that it does not oversubmit jobs to the batch queuing system. | optional
TAG:<TAG> | Passes attributes to the resulting vovtasker without their being considered by the batch system. This lets the user force jobs to execute only on vovtaskers that have the corresponding <TAG>. | optional
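To see which parts of a resource request fall under the special cases listed above, the prefixes can be matched as in the following sketch. It only illustrates the table and makes no claim about how vovelasticd parses requests internally; the function name classify_resources is hypothetical.

```shell
# Hypothetical helper: label each resource token as "special" (one of the
# prefixes listed above) or "regular" (any other resource).
classify_resources() {
    for res in $1; do
        case "$res" in
            MAXlife:*|MAXidle:*|vovelasticd:*|TAG:*) echo "special $res" ;;
            *)                                       echo "regular $res" ;;
        esac
    done
}

classify_resources "CORES/2 MAXidle:1m vovelasticd:dcThrottle/2"
# prints:
#   regular CORES/2
#   special MAXidle:1m
#   special vovelasticd:dcThrottle/2
```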

Submit Jobs to Accelerator using Accelerator

Examples of job submission with Accelerator:
% nc run -r MAXlife:1h -- spice abc.spi
% nc run -r CORES/2 MAXidle:1m -- dc_shell -f script.tcl
% nc run -clearcaseResource -r CORES/2 vovelasticd:dcThrottle/2 -- dc_shell -f script.tcl

Submit Jobs to Accelerator using FlowTracer (FDL)

Examples of assigning resources to jobs with FlowTracer:
N "spice"
R "MAXlife:1h"
J vw spice abc.spi

N "dc_shell"
R "CORES/2 MAXidle:1m"
J vw dc_shell -f script.tcl