HPC Advice
This section provides recommendations for obtaining maximum performance from Accelerator. Because Accelerator is a fast system, fine-tuning is typically needed only when running several hundred thousand jobs per day.
Use the Latest Altair Accelerator Release
The performance of the Accelerator scheduler is improved frequently. Using the most current release is recommended.
Use the vwn Wrapper
The wrapper vwn (an alias for vw -d) is a faster wrapper because it avoids communication with vovserver. The regular vw checks the timestamps of the outputs after the job is done, whereas vwn does not. An example is shown below:
% nc run -wrapper vwn -array 100 sleep 0
To further push performance of the scheduler, you may want to use two options:
- -nolog: this disables the creation of the log file
- -nodb: this disables the logging of the job execution used for adding job info to the database
% nc run -wrapper vwn -nodb -nolog -array 100 sleep 0
- The benefit of using vwn is speed.
- The disadvantage is that jobs that require the -wl option cannot be run. However, this disadvantage may not be significant, because -wl adds a relatively high load for what it does: it requires an extra notify client to handle the event generated when the job terminates.
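The options above can be combined in a small submission helper. The following is a minimal sketch and not part of Accelerator: the function name build_nc_run and the NODB and NOLOG variables are hypothetical, and the function only prints the command line so that it can be reviewed before it is run. It uses only the options shown above.

```shell
#!/bin/sh
# Hypothetical helper: print an "nc run" command line that uses the vwn wrapper.
# Usage: build_nc_run ARRAY_SIZE COMMAND [ARGS...]
# Set NODB=1 and/or NOLOG=1 to add the -nodb and -nolog options.
build_nc_run() {
    size=$1
    shift
    opts="-wrapper vwn"
    if [ "${NODB:-0}" = 1 ]; then
        opts="$opts -nodb"    # skip logging of the job execution to the database
    fi
    if [ "${NOLOG:-0}" = 1 ]; then
        opts="$opts -nolog"   # skip creation of the log file
    fi
    echo "nc run $opts -array $size $*"
}
```

Calling build_nc_run 100 sleep 0 prints the first command shown above; with NODB=1 and NOLOG=1 set, it prints the second.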
Reduce the FairShare Window
When running millions of jobs per day, it is not important to keep a long FairShare history. Typically, a window of 2 to 5 minutes tracks sufficient history. An example follows:
% nc cmd vovfsgroup modrec /some/fs/tree window 2m
Reduce the autoForget Times
By forgetting jobs more quickly, the memory image of vovserver is kept smaller. An example is shown below:
# In policy.tcl
set config(autoForgetValid) 3m
set config(autoForgetFailed) 1h
set config(autoForgetOthers) 1h
Disable Wait Reasons
If analyzing what causes wait time in the workload is not needed, the wait reason analysis can be disabled as shown below:
# In policy.tcl
set config(enableWaitReasons) 0
Wait time analysis can then be re-enabled as needed, as shown below:
% nc cmd vovsh -x 'vtk_server_config enableWaitReasons 1'
### collect some data for a few minutes, then
% nc cmd vovsh -x 'vtk_server_config enableWaitReasons 0'
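The enable, collect, disable sequence above can be wrapped in a small script so that wait reason analysis is switched off again even if the script is interrupted. This is a minimal sketch, assuming a POSIX shell; the function names wait_reasons and sample_wait_reasons, the NC variable, and the default 300-second window are illustrative choices and not part of Accelerator.

```shell
#!/bin/sh
# Hypothetical helpers: enable wait reason analysis for a limited time.
# NC names the command to run (default: nc). It is expanded unquoted, so
# NC="echo nc" gives a dry run that only prints the commands.

# wait_reasons 1 enables the analysis, wait_reasons 0 disables it.
wait_reasons() {
    ${NC:-nc} cmd vovsh -x "vtk_server_config enableWaitReasons $1"
}

# sample_wait_reasons [SECONDS]: enable, wait, then disable again.
sample_wait_reasons() {
    seconds=${1:-300}
    # If interrupted, disable the analysis before exiting.
    trap 'wait_reasons 0; exit 1' INT TERM
    wait_reasons 1
    sleep "$seconds"
    wait_reasons 0
    trap - INT TERM
}
```

For example, sample_wait_reasons 180 would collect wait reasons for three minutes. Setting NC="echo nc" first shows the commands without contacting vovserver.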
Disable File Access
Disabling file access is mostly a high-reliability option. By disabling file access, the vovserver never looks at any of the files in the user workspaces, which avoids the risk of disk slowness or disk unavailability. An example is shown below:
% nc cmd vovsh -x 'vtk_server_config disablefileaccess 2'
Reduce Update Rate of Notify Clients
Notify clients are clients that tap the event stream from vovserver, such as nc gui, voveventmon, or nc run -wl. They are updated immediately in the inner loop of the scheduler. If the environment includes hundreds of such clients, it may be beneficial to slow down the update rate by setting the parameter notifySkip. The default value is 0: no skip.
Typically, the more events that take place, the more events can be skipped without notice. For example, if many events are taking place, the reduced number of updates with notifySkip set to 100 may not be noticed. If the number of events is small, a one-second delay may be noticed in some updates of the GUI.
Note: Regardless of the setting, the maximum time between updates is one second.
# In policy.tcl
set config(notifySkip) 100
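The interplay between notifySkip and the one-second limit can be illustrated with a small model. This is only one plausible reading of the description above and not vovserver's actual implementation: the function should_update and its three arguments are hypothetical, and the model assumes that a client is updated once more events are pending than may be skipped, or once a second has passed with at least one event pending.

```shell
#!/bin/sh
# Illustrative model only, not vovserver code.
# Usage: should_update EVENTS_PENDING MS_SINCE_LAST_UPDATE NOTIFY_SKIP
# Returns 0 when a notify client would be updated under this model, 1 otherwise.
should_update() {
    pending=$1
    elapsed_ms=$2
    skip=$3
    # More events are pending than may be skipped: update now.
    if [ "$pending" -gt "$skip" ]; then
        return 0
    fi
    # Something is pending and one second has passed: update now.
    if [ "$pending" -gt 0 ] && [ "$elapsed_ms" -ge 1000 ]; then
        return 0
    fi
    return 1
}
```

Under this model, a notifySkip of 0 updates the client on every event, while a notifySkip of 100 on a quiet system delays an update by at most one second.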