HPC Advice

This section provides recommendations to obtain the maximum performance from your Accelerator. As Accelerator is a fast system, fine-tuning performance may be needed only when running several hundred thousand jobs daily.

Use the Latest Altair Accelerator Release

The performance of the Accelerator scheduler is improved frequently. Using the most current version is recommended.

Use the vwn Wrapper

The wrapper vwn (alias for vw -d) is a faster wrapper because it avoids communication with vovserver. The regular vw checks the timestamp of the outputs after the job is done, whereas vwn does not. An example is shown below:
% nc run -wrapper vwn -array 100 sleep 0
To further push performance of the scheduler, you may want to use two options:
  • -nolog: this disables the creation of the log file
  • -nodb: this disables the logging of the job execution used for adding job info to the database
% nc run -wrapper vwn -nodb -nolog -array 100 sleep 0
  • The benefit of using vwn is speed.
  • The disadvantage is that jobs that require the -wl option cannot be run. However, this disadvantage may not be significant, as -wl adds a relatively high load for what it does: -wl requires an extra notify client to handle the event generated when the job terminates.

Reduce the FairShare Window

When running millions of jobs per day, it is not important to keep a long FairShare history. Typically, a window of 2 to 5 minutes tracks sufficient history. An example follows:
% nc cmd vovfsgroup modrec /some/fs/tree  window 2m

Reduce the autoForget Times

By forgetting jobs more quickly, the memory image of vovserver is kept smaller. An example is shown below:
# In policy.tcl
set config(autoForgetValid)  3m
set config(autoForgetFailed) 1h
set config(autoForgetOthers) 1h

Disable Wait Reasons

If you are not analyzing what causes wait time in the workload, the wait reason analysis can be disabled as shown below:
# In policy.tcl
set config(enableWaitReasons) 0
Wait time analysis can then be re-enabled as needed, as shown below:
% nc cmd vovsh -x 'vtk_server_config enableWaitReasons 1'

### collect some data for a few minutes, then

% nc cmd vovsh -x 'vtk_server_config enableWaitReasons 0'
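
The enable, collect, disable sequence above can be wrapped in a small helper so that the analysis is not left on by accident. The following is a minimal sketch in POSIX sh, not a supplied utility: the function name sample_wait_reasons and the 300-second default are assumptions chosen for illustration, while the two nc commands are the ones shown above.

```shell
#!/bin/sh
# Sketch: enable wait reason analysis for a limited time, then disable it.
# sample_wait_reasons is a hypothetical helper name, not part of Accelerator.

sample_wait_reasons() {
    # Seconds to keep the analysis enabled (assumed default: 5 minutes).
    duration="${1:-300}"

    # Enable the analysis; stop here if the server cannot be reached.
    nc cmd vovsh -x 'vtk_server_config enableWaitReasons 1' || return 1

    # Collect data while the analysis is enabled.
    sleep "$duration"

    # Disable the analysis again.
    nc cmd vovsh -x 'vtk_server_config enableWaitReasons 0'
}

# Example: collect wait reasons for two minutes.
#   sample_wait_reasons 120
```

If the helper is interrupted while it is sleeping, the analysis stays enabled; in that case, run the disable command manually.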

Disable File Access

Disabling file access is mostly a high-reliability option. By disabling file access, the vovserver never looks at any of the files in the user workspaces, which avoids the risk of disk slowness or disk unavailability. An example is shown below:
% nc cmd vovsh -x 'vtk_server_config disablefileaccess 2'

Reduce Update Rate of Notify Clients

Notify clients are clients that tap the event stream from vovserver (such as nc gui, voveventmon or nc run -wl). They are updated immediately in the inner loop of the scheduler. If the environment includes hundreds of such clients, it may be beneficial to slow down the update rate by setting the parameter notifySkip. The default value is 0: no skip. Typically, the more events that take place, the more events that can be skipped without notice. For example, if many events are taking place and notifySkip is set to 100, the reduced number of updates may not be noticed. If the number of events is small, a delay of up to one second may be noticed in some updates of the GUI.
Note: Regardless of the setting, the maximum time between updates is one second.
# In policy.tcl
set config(notifySkip) 100