Distributed Parallel Support
Accelerator supports jobs that require multiple CPUs, which may be located on different machines.
When a Distributed Parallel (DP) job is submitted to the scheduler, the job is divided into partial jobs. Each partial job can require a different set of resources; together, the partial jobs require all the resources of the Distributed Parallel job. Accelerator schedules each partial job separately. When all partial jobs have been dispatched, one designated partial job executes the actual Distributed Parallel job. Depending on the submission method, there are several ways to take advantage of the computing resources assigned to the other partial jobs.
Submission Methods
There are three common methods to submit Distributed Parallel jobs.
- Use -dp N to submit a Distributed Parallel job with N partial jobs
- This method creates a Distributed Parallel job with N partial jobs. When all partial jobs have been dispatched, the active partial job becomes the master and begins executing the submitted command (sleep 10 in the example after this list), while the other partial jobs wait for it to terminate. By default, the active partial job is the first one.
- Use -dp N with vovparallel
- When all N partial jobs have been dispatched, the active partial job executes vovparallel, which sets the environment variable LSB_HOSTS before invoking the master command. LSB_HOSTS contains the list of all hosts, possibly with repetitions, currently set aside to run all partial jobs of the Distributed Parallel job. The master command is expected to use rsh or ssh to reach those hosts and launch the appropriate software.
- Use -dp N together with vovparallel clone
- This method clones the specified wrapper script across the selected hosts. The wrapper script then executes the tool-specific commands on the appropriate hosts and has access to the Distributed Parallel environment variables and job properties.
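For example, the three methods can be submitted as follows (a sketch: mycommand and mywrapper are hypothetical placeholders; the first form matches the sleep 10 description above):
% nc run -dp 2 sleep 10
% nc run -dp 3 vovparallel mycommand
% nc run -dp 4 vovparallel clone mywrapper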
Partial Job Rank and Resources
The partial jobs are ranked from 1 to N. By default, the active partial job is the one with rank 1; use the -dpactive option to designate a different rank as active. For example, the following command makes the partial job with rank 8 the active one:
% nc run -dp 8 -dpactive 8 uname -a
To define the resource list for each component of the Distributed Parallel job, use the -dpres option. This option takes a comma-separated list of simple resource lists. The first element in the list is used for the first partial job, the second element for the second partial job, and so on. If there are more partial jobs than elements in the list, the last element is used for all remaining partial jobs.
% nc run -dp 10 -dpres "linux,macosx,unix" -dpactive 2 vovparallel clone mywrapper
In this example, the first partial job requires linux, the second requires macosx, and the remaining eight require unix; the partial job with rank 2 is the active one.
Distributed Parallel Properties
- DP_ACTIVERANK
- The rank of the active partial job, which is the partial job that launches the top-level job.
- DP_ATTEMPTS
- The number of times the partial jobs have failed to be allocated within the allowed time.
- DP_COHORTWAIT
- This property is created when -nocohortwait is passed with nc run. This instructs partialTool for each cohort task to finish when its subtask process has finished, rather than wait for the primary job to complete (the default behavior). Using -nocohortwait sets DP_COHORTWAIT to zero (0); otherwise, the default is 1. See the example after this list.
- DP_FAILURE
- Explanation of failures.
- DP_HOSTS
- List of hosts for the Distributed Parallel job. This property is set when the last partial job is launched. The list may contain duplicates.
- DP_PORT_X
- This is set by partialTool X to the pair host/dp_port.
- DP_SEMAPHORE
- Used by partial jobs to count the rank.
- DP_SEMAPHORE2
- A second synchronization counter used by partialTool.
- DP_SETID
- The ID of the set that contains the list of partial jobs.
- DP_WAIT
- Tells the partial job how long to wait before giving up the slot (default: 30 seconds).
- DP_WAIT_MAX
- The maximum allowed value of DP_WAIT for a job (default: 30 minutes).
- DP_PART_OF
- The top-level job ID.
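For example, a submission that disables cohort waiting might look like this (a sketch; mywrapper is a hypothetical wrapper script):
% nc run -dp 4 -nocohortwait vovparallel clone mywrapper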
Properties can be accessed via the vovprop command or via the Tcl API with vtk_prop_get. Once obtained, they can be used in conjunction with rsh/ssh, for example, to interact with the chosen hosts.
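For example, to retrieve the DP_HOSTS property of the top-level job (the job ID 012345678 is hypothetical, and the Tcl form assumes vtk_prop_get takes an object ID and a property name):
% vovprop GET 012345678 DP_HOSTS
From Tcl:
set hosts [vtk_prop_get 012345678 DP_HOSTS]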
Using Different Resources and Jobclasses
When specifying DP jobs, you can stack jobclasses so that the primary and component jobs have different resources and job classes.
This is done by setting VOV_JOB_DESC(dp,resources) and VOV_JOB_DESC(dp,jobclasses) to specify the resources and jobclass labels for the master and subcomponent DP jobs.
The jobclass files mycalibre_a.tcl and mycalibre_b.tcl are defined as follows:
::::::::::::::
mycalibre_a.tcl
::::::::::::::
# Copyright (c) 1995-2021, Altair Engineering
# All Rights Reserved.
# $Id: //vov/trunk/src/scripts/jobclass/short.tcl#3 $
set classDescription "My Calibre job resources"
puts "This is mycalibre jobclass"
if { [ info exists VOV_JOB_DESC(dp,resources) ] } {
set VOV_JOB_DESC(dp,resources) [ string cat $VOV_JOB_DESC(dp,resources) ",RAM/200 CORES/2" ]
set VOV_JOB_DESC(dp,jobclasses) [ string cat $VOV_JOB_DESC(dp,jobclasses) ",mycalibre_a" ]
} else {
set VOV_JOB_DESC(dp,resources) "RAM/200 CORES/2"
set VOV_JOB_DESC(dp,jobclasses) "mycalibre_a"
}
proc initJobClass {} {
}
::::::::::::::
mycalibre_b.tcl
::::::::::::::
# Copyright (c) 1995-2021, Altair Engineering
# All Rights Reserved.
# $Id: //vov/trunk/src/scripts/jobclass/short.tcl#3 $
set classDescription "My Calibre job resources"
puts "This is mycalibre2.tcl"
if { [info exists VOV_JOB_DESC(dp,res,*)] } {
puts "VOV_JOB_DESC(dp,res,*) already exists. This is good!"
} else {
puts "VOV_JOB_DESC(dp,res,*) does not appear to exist. This is bad."
}
if { [ info exists VOV_JOB_DESC(dp,resources) ] } {
set VOV_JOB_DESC(dp,resources) [ string cat $VOV_JOB_DESC(dp,resources) ",RAM/400 CORES/4" ]
set VOV_JOB_DESC(dp,jobclasses) [ string cat $VOV_JOB_DESC(dp,jobclasses) ",mycalibre_b" ]
} else {
set VOV_JOB_DESC(dp,resources) "RAM/400 CORES/4"
set VOV_JOB_DESC(dp,jobclasses) "mycalibre_b"
}
proc initJobClass {} {
}
Submit the job with both jobclasses stacked:
% nc run -v 5 -C mycalibre_a -C mycalibre_b -e BASE -J jeffjob -dp 4 vovparallel clone sleep 1
This results in the primary job having the jobclass "mycalibre_a" and the resources "RAM/200 CORES/2", while the secondary jobs have the jobclass "mycalibre_b" and the resources "RAM/400 CORES/4".
Distributed Parallel Slot Timeout
With all of the methods described above, each partial job waits a finite amount of time for all other components to show up. If the time elapses, the partial job gives up, fails, and returns the slot to the farm. The wait time is 30 seconds by default, but it may be larger if the farm is heavily loaded. The wait time can be specified with the -dpwait option; its upper bound is given by the DP_WAIT_MAX property, which can be set at submission time, for example:
-P DP_WAIT_MAX=12.0
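A sketch of a submission that lengthens the wait, assuming -dpwait takes a value in seconds and using a hypothetical wrapper script:
% nc run -dp 4 -dpwait 120 vovparallel clone mywrapper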
vovparallel Clone
#!/bin/csh -f
#
# Example of a script to be used with vovparallel clone.
#
# Example of usage:
# % nc run -dp 4 vovparallel clone simple_dp_script.csh
#
set sleepTime = 30
if ( $#argv >= 1 ) then
set sleepTime = $1
endif
# Each application has its own way to determine a rendezvous port.
# In this example, it is a fixed port.
set APPLICATION_PORT = 2345
if ( ! $?VOV_DP_RANK ) then
echo "ERROR: Variable VOV_DP_RANK not defined."
echo " This script needs to be run with vovparallel clone ..."
exit 1
endif
echo "Hello! I am component $VOV_DP_RANK"
if ( $VOV_DP_RANK == 1 ) then
echo "This is the master. "
echo "This should open a socket for communication e.g. $APPLICATION_PORT"
set DP_HOSTS = `vovprop GET $VOV_DP_TOPJOBID DP_HOSTS`
echo $DP_HOSTS
sleep $sleepTime
else
echo "This is the tasker"
set DP_HOSTS = `vovprop GET $VOV_DP_TOPJOBID DP_HOSTS`
set masterHost = $DP_HOSTS[1]
echo "This should communicate with master through ${masterHost}:$APPLICATION_PORT"
sleep $sleepTime
endif
exit 0
A more advanced script to use with vovparallel clone is available in $VOVDIR/eda/MentorGraphics/vovcalibremt.
OpenMPI Support
If you have an application that uses OpenMPI, you can submit it as a Distributed Parallel application (options -dp, -dpres, etc.) by using the wrapper vovmpirun.
% nc run -dp <N> vovmpirun ./path/to/application
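The usual DP options can be combined with the wrapper; for example, a sketch that requests the same resource list for every partial job (the resource string is illustrative):
% nc run -dp 4 -dpres "CORES/4" vovmpirun ./path/to/application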