RedHawk by Apache/Ansys

To run RedHawk in distributed parallel mode, you need to use the scripts in $VOVDIR/eda/Ansys/redhawk.
  • If you are running a version of Altair Accelerator before 2014.03, you need to apply the "redhawk_scripts" patch. Request the patch from https://www.pbsworks.com/ContactSupport.aspx
  • Create an environment called REDHAWK, using the file $VOVDIR/eda/Ansys/redhawk/REDHAWK.start.sh as a template:
    % cp $VOVDIR/eda/Ansys/redhawk/REDHAWK* $VOVDIR/local/environments/.
    % vi $VOVDIR/local/environments/REDHAWK.start.sh
  • Test the environment with this sequence:
    % ves BASE+REDHAWK
    % which redhawk              ## Do you have redhawk in the path?
    % which nc_redhawk           ## Do you have nc_redhawk?
    % lmstat -f redhawk          ## Is LM_LICENSE_FILE correct?
    
  • You may need to identify the taskers in your farm that can run RedHawk. If all taskers can run it, skip this step. Otherwise, add the resource "hasRedhawk" to the selected taskers in the taskerClass.table file:
    lnx0021:  hasRedhawk
    lnx0022:  hasRedhawk
    lnx0023:  
    lnx0024:  hasRedhawk
  • Create an nc.cfg configuration file for the -dmp option of redhawk:
    GRID_TYPE RTDA
    ## This number must match the value of the -dp option of nc run.
    NUMBER_OF_JOBS 4
    
    ## This assumes you have a jobclass called "redhawk"
    QUEUE_NAME redhawk
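
Because NUMBER_OF_JOBS in nc.cfg must match the -dp value passed to nc run, a small check before submission can catch a mismatch. The following is a minimal sketch in POSIX sh (the commands in this section use csh syntax), assuming nc.cfg keeps the simple "KEY VALUE" layout shown above. The function names nc_cfg_jobs and check_dp_matches are hypothetical helpers, not part of Altair Accelerator or RedHawk.

```shell
# Hypothetical helper: print the NUMBER_OF_JOBS value from an nc.cfg file.
# Assumes the "KEY VALUE" layout shown above. Comment lines are ignored
# because their first field is "##", not the key.
nc_cfg_jobs() {
    awk '$1 == "NUMBER_OF_JOBS" { print $2; exit }' "$1"
}

# Hypothetical helper: succeed only if the -dp value equals NUMBER_OF_JOBS.
# Returns 2 if the key is missing, 1 on a mismatch, 0 on a match.
check_dp_matches() {
    cfg_file=$1
    dp=$2
    cfg_jobs=$(nc_cfg_jobs "$cfg_file")
    if [ -z "$cfg_jobs" ]; then
        echo "NUMBER_OF_JOBS not found in $cfg_file" >&2
        return 2
    fi
    if [ "$cfg_jobs" != "$dp" ]; then
        echo "Mismatch: NUMBER_OF_JOBS=$cfg_jobs but -dp $dp" >&2
        return 1
    fi
    return 0
}
```

With the nc.cfg shown above, `check_dp_matches nc.cfg 4` succeeds, while `check_dp_matches nc.cfg 5` reports a mismatch and returns a non-zero status.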

Running Without License Management

To get started and understand how distributed processing works, let's run without worrying about licenses.

We want to run a RedHawk job using the script run.tcl. The job uses 4 parallel components: the master process plus 3 workers, matching the -dp 4 option and the NUMBER_OF_JOBS 4 setting in nc.cfg. For now, let's assume that all components have similar resource requirements.
% ves BASE+REDHAWK
% setenv DISPLAY "good_name_for_DISPLAY:XX"
% nc run -e SNAPSHOT+D,DISPLAY=$DISPLAY -profile -preemptable 0 \
    -dp 4 -dpres hasRedhawk -dpwait 3m \
    nc_redhawk \
    -lmwait -dmp nc.cfg -f run.tcl

Explanation of options

-e SNAPSHOT+D,DISPLAY=$DISPLAY
Take the current environment, including the DISPLAY variable. This may not be necessary if running in batch mode (option -b of redhawk).
-profile
Track RAM and CPU usage of each component of the job. For redhawk, the usage is visible only with Altair Accelerator 2014.03 or later, because the redhawk processes detach themselves from their parents.
-preemptable 0
Mark the job as not preemptable. In general, you do not want to preempt jobs as complex as these.
-dp 4
Request 4 processes: one becomes the "master" and the other 3 are the workers. The master runs in the first component of the Distributed Parallel job.
-dpres hasRedhawk
Run each component on machines that have the "hasRedhawk" resource.
-dpwait 3m
Wait up to 3 minutes for all components to start. If 3 minutes pass from the start of the first component and not all components have started, the dispatch is aborted and restarted soon after with a longer wait time.
nc_redhawk
This is the script that starts on the master component and activates all other components. Technical note: In Altair Accelerator, all components are already up and running when this script runs; the script directs OpenMPI to use a special ssh script to launch the appropriate command on each remote component.
-lmwait -dmp nc.cfg -f run.tcl
The actual redhawk command you want to run.
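
To keep the options explained above in one place, the submission can be wrapped in a small script. This is a sketch in POSIX sh, not an official wrapper: build_redhawk_cmd is a hypothetical function that only prints the nc run command line from this section, with the component count, wait time, and Tcl script filled in, so the result can be reviewed before it is executed.

```shell
# Hypothetical helper: print (do not run) the nc run command from this section.
# Arguments: number of components (-dp), wait time (-dpwait), RedHawk Tcl script.
# $DISPLAY is left unexpanded so the printed line matches what you would type.
build_redhawk_cmd() {
    dp=$1
    dpwait=$2
    script=$3
    printf '%s\n' "nc run -e SNAPSHOT+D,DISPLAY=\$DISPLAY -profile -preemptable 0 -dp $dp -dpres hasRedhawk -dpwait $dpwait nc_redhawk -lmwait -dmp nc.cfg -f $script"
}
```

For example, `build_redhawk_cmd 4 3m run.tcl` prints the command used in this section on a single line. The first argument still has to match NUMBER_OF_JOBS in nc.cfg.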