Troubleshooting
This section covers common problems in Accelerator.
The nc summary command is a useful starting point: it reports how
many jobs have failed, are queued, or are idle, and what the queued jobs are waiting for. For
example:
% nc summary
NC Summary For User bkring
TOTAL JOBS 0 Duration: 0s
Done 0
Idle 0
Queued 0
Running 0
Failed 0
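When the summary is checked regularly, a small filter can flag the counts that need attention. The following is a minimal sketch, not part of Accelerator: check_summary is a hypothetical helper name, and it assumes each status line has the "Label count" layout shown in the sample above.

```shell
#!/bin/sh
# Read `nc summary` output on stdin and report non-zero Queued/Failed counts.
# Assumption: status lines look like "Failed 3" (label, then count).
# check_summary is a hypothetical helper, not an Accelerator command.
check_summary() {
    awk '
        $1 == "Queued" && $2 > 0 { printf "queued=%d\n", $2; bad = 1 }
        $1 == "Failed" && $2 > 0 { printf "failed=%d\n", $2; bad = 1 }
        END { exit bad + 0 }   # non-zero exit status if anything was flagged
    '
}
```

It would typically be used as nc summary | check_summary. The function prints nothing and returns 0 when no jobs are queued or failed.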
My job won't start!
Often, your job won't start because it is waiting for a resource, usually a license, CPU, or memory.
To diagnose this, use the command:
nc info jobid
A job starts only when every resource it requests is available. If any
resource is missing, the job will not start. To see whether any available
vovtasker can run the job, list the resources of all vovtaskers
with the command:
nc host
Additionally, a vovtasker must be either READY or WRKNG (that is, it has a free job slot) to accept jobs. Any other condition prevents the vovtasker from taking the job.
My job failed!
You can find the reason for job failure with the
command:
nc info jobid
Some common failure conditions include:
- The job failure has nothing to do with Accelerator. Run the job without Accelerator to verify this.
- The job command doesn't exist, possibly because of a typo.
- You are using a wrong, nonexistent, or incomplete environment with the -e option. In this case, nc info jobid will tell you that it cannot switch to the environment.
- You did not specify the architecture or memory requirement, or you specified
the wrong one. Set these with the -r option. For example,
-r linux64
for 64-bit Linux, or
-r RAM/2000
for 2 GB of RAM.
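To rule out the environment and resource mistakes listed above, it can help to print the full submission line before running it. The sketch below is a dry run only and submits nothing. It assumes jobs are submitted with nc run and that the job command follows a -- separator, neither of which is shown in this section. build_nc_cmd, BASE, and ./my_job.sh are hypothetical placeholders.

```shell
#!/bin/sh
# Dry run: print an nc submission line with an explicit environment and
# resource request. Nothing is submitted.
# Assumptions: `nc run` is the submit command and `--` precedes the job command.
# build_nc_cmd, BASE, and ./my_job.sh are hypothetical placeholders.
build_nc_cmd() {
    env_name=$1     # environment name passed to -e
    resource=$2     # resource request passed to -r, e.g. RAM/2000
    shift 2         # remaining arguments form the job command
    printf 'nc run -e %s -r "%s" -- %s\n' "$env_name" "$resource" "$*"
}

build_nc_cmd BASE RAM/2000 ./my_job.sh
```

Reviewing the printed line makes a misspelled environment name or a missing -r value easy to spot before the job is queued.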