Multiple Queues
Queue Name and Host Considerations
- Each host that runs a vovserver to manage an Accelerator queue needs access to the Altair Accelerator software. In most cases, this is done by automounting the software from a file server, but a host-local install may also be used.
- The name of the queue must begin with the letters 'vnc'. This indicates that it is an Accelerator vovserver, so that it will check out the correct license feature 'server_nc'.
It is helpful if the remainder of the queue name encodes something that identifies the queue. For example, 'vncsj' could represent a server running in San Jose, CA.
Queue names must also be unique among all queues running on the same, or a replicated, Altair Accelerator software hierarchy. Two queues that run on the same Altair Accelerator software installation may not have the same name, even if you try to start them on different hosts.
An Accelerator queue is a specialized case of a VOV 'project', and the ncmgr command calls the vovproject command to start the vovserver that manages the queue. The latter command uses the Altair Accelerator registry to store data about all the known projects.
- The host and TCP/IP port combination must be unique for all queues. The default is to compute the port number by hashing the queue name into the range 6200-6455. You may specify the port number using the -port option when first starting the queue.
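To make the idea of hashing a queue name into the range 6200-6455 concrete, here is a minimal shell sketch. The hash used here (the POSIX cksum utility) is an assumption chosen purely for illustration; it is not the algorithm Accelerator uses and will generally not reproduce the real default port. The function name queue_port is hypothetical. Only the 'vnc' prefix rule and the range arithmetic (256 ports starting at 6200) come from the text above.

```shell
#!/bin/sh
# Illustrative only: map a queue name to a port in the range 6200-6455.
# cksum is a stand-in hash, NOT the function Accelerator actually uses.
queue_port() {
    name=$1
    case $name in
        vnc*) ;;    # queue names must begin with 'vnc'
        *)
            echo "queue name must begin with 'vnc': $name" >&2
            return 1
            ;;
    esac
    # cksum prints "<crc> <byte-count>"; keep only the CRC.
    sum=$(printf '%s' "$name" | cksum | cut -d ' ' -f 1)
    # 6455 - 6200 + 1 = 256 possible ports.
    echo $(( 6200 + $sum % 256 ))
}

queue_port vncsj
queue_port vncma
```

Because any hash into 256 values can collide, two queue names started on the same host may map to the same port; that is the situation in which the -port option is needed.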
Create a New Queue
To create a queue with a given name, use the -queue option of the ncmgr start command.
% ncmgr start -q vnctest
To configure the new vnctest queue, you could copy the configuration files from vnc.swd, with the exception of setup.tcl. Do not copy setup.tcl: it will usually specify a different port number, and it must specify a different port if both queues run on the same host.
You would also want to edit the taskers.tcl file for the vnctest queue, so that it contains only a few hosts for testing.
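The copy step described above can be sketched as a small shell function. This is a sketch under stated assumptions: the function name copy_queue_config is hypothetical, both directories are supplied by the caller (the location of vnc.swd is site-specific), and the configuration files of interest are assumed to be the top-level *.tcl files.

```shell
#!/bin/sh
# Copy queue configuration files from one server working directory to
# another, skipping setup.tcl because it carries the source queue's port.
# Assumption: the files to copy are the top-level *.tcl files.
copy_queue_config() {
    src=$1    # e.g. the existing vnc.swd directory
    dst=$2    # e.g. the new vnctest.swd directory
    for f in "$src"/*.tcl; do
        # Skip an unmatched glob or anything that is not a regular file.
        [ -f "$f" ] || continue
        if [ "$(basename "$f")" = "setup.tcl" ]; then
            continue
        fi
        cp -p "$f" "$dst"/
    done
}

# Example invocation (paths are placeholders):
#   copy_queue_config /path/to/vnc.swd /path/to/vnctest.swd
```

After copying, taskers.tcl in the destination directory still lists the full set of hosts from the source queue, so it needs to be trimmed by hand for a test queue.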
How the Default Queue is Determined
When you have multiple queues, the Accelerator commands act on the default queue when no other is specified. The Accelerator administrator can control the default using files in the NC_CONFIG_DIR directory (usually $VOVDIR/local/vncConfig).
The files in that directory are in Tcl format, and set the environment variables used to determine the vovserver to which your Accelerator command sends its RPCs. Whatever is set by the file named vnc.tcl determines the default. The file for each queue has the form <queue-name>.tcl, and is created by the ncmgr command when the queue is first started. A useful trick is to make vnc.tcl a symbolic link to the queue-specific file, permitting the Accelerator admin to quickly and easily change the default.
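The symbolic-link trick can be sketched as follows. The function name set_default_queue and its arguments are hypothetical, and the sketch assumes the configuration directory already contains the queue-specific file. It refuses to overwrite a vnc.tcl that is a regular file, since that file may hold the real configuration of a queue named 'vnc'.

```shell
#!/bin/sh
# Make vnc.tcl a symbolic link to <queue>.tcl so that queue becomes the
# default. Function name and arguments are illustrative.
set_default_queue() {
    config_dir=$1    # e.g. "$VOVDIR/local/vncConfig"
    queue=$2         # e.g. vncsj
    if [ ! -f "$config_dir/$queue.tcl" ]; then
        echo "no such queue file: $config_dir/$queue.tcl" >&2
        return 1
    fi
    # Do not clobber a regular vnc.tcl; only replace an existing link.
    if [ -e "$config_dir/vnc.tcl" ] && [ ! -L "$config_dir/vnc.tcl" ]; then
        echo "vnc.tcl is a regular file; back it up first" >&2
        return 1
    fi
    # Use a relative target so the link survives a different mount point.
    ( cd "$config_dir" && ln -sf "$queue.tcl" vnc.tcl )
}

# Example invocation (directory is a placeholder):
#   set_default_queue "$VOVDIR/local/vncConfig" vncsj
```

Switching the default later is then a single call with a different queue name.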
Working with Multiple Queues
- The -queue command line option
- The NC_QUEUE environment variable
The Accelerator commands accept a -queue option to specify which queue to act on. This permits the queue to be selected on a command-by-command basis, but adds extra typing. You can abbreviate this to -q.
If you will be working primarily with a queue other than the default 'vnc', it is better to set the environment variable NC_QUEUE to the name of the queue.
For example, suppose you have two sites, in San Jose, CA, and Andover, MA, and the queues are named 'vncsj' and 'vncma' respectively. The Accelerator admin would set the default queue in San Jose to be 'vncsj', and in Andover to be 'vncma'. A user at either site can still address a specific queue with -q:
% nc -q vncma list -u carl
% nc -q vncsj list -u carl
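For comparison, here is a minimal sketch of the environment-variable approach, assuming a user who wants to work mostly with the Andover queue. The syntax shown is for Bourne-style shells; the nc invocation is left as a comment so the fragment runs on a machine without Accelerator installed.

```shell
# Bourne-style shells (sh, bash, ksh).
# In csh or tcsh the equivalent is: setenv NC_QUEUE vncma
NC_QUEUE=vncma
export NC_QUEUE

# Later commands in this shell can then omit -q, for example:
#   nc list -u carl
```

The setting lasts only for the current shell and its child processes, so it is typically placed in a shell startup file when the choice is permanent.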
Tradeoffs Separating Farm Hosts
- When you divide your compute farm hosts into separate queues, you limit the number of job slots available to users who do not specify a non-default queue.
- More importantly, once a job is submitted to a queue, it stays there. Jobs submitted to a loaded queue could therefore wait longer, even while there are open slots on a different queue.
- Separate queues permit maintenance shutdowns without completely stopping batch queue service. Since the addition of the -freeze option to ncmgr stop, you can even replace the vovserver binary without needing to stop running jobs, making this less of a concern.