Multi-node Jobs

Recommendations for running multi-node jobs.

  1. Check that a standard single-node job runs correctly to validate the installation, paths, and submission line.
  2. A host file must be defined and then passed to mpirun/mpiexec with --hostfile <hostfilename>.
    mpirun -np <numprocs> --hostfile <hostfilename> <nfx_exe> -i <casefile>
    Tip: Learn how to define a host file on the OpenMPI FAQ page.
    Note: Host files depend on the system topology. If PBS schedules the jobs, it is already aware of the topology, so you can pass its node file directly with --hostfile $PBS_NODEFILE.
  3. Use a PBS or an equivalent job scheduler for multi-node runs.
  4. If you launch directly from the command line without PBS or an equivalent job scheduler, ssh access between nodes without a password prompt is required.
    Tip: Learn how to get ssh access without a password on the OpenMPI FAQ page.
    Important: Only use this method if you are an advanced user. Consult your system administrator for more information or recommendations.
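
The host file and launch line from step 2 can be sketched as the dry run below. It is only an illustration: the node names, the slot counts, and the casefile.cfg name are hypothetical, nfx_exe stands in for the actual nanoFluidX executable, and the `node slots=N` layout is the OpenMPI host file convention. The script prints the mpirun line instead of executing it.

```shell
#!/bin/sh
# Dry run: write a host file, count its slots, and print the launch line.

# A temporary file keeps the sketch free of side effects; use a permanent
# path on a real system. Node names and slot counts are hypothetical.
HOSTFILE=$(mktemp)
printf '%s\n' 'node01 slots=4' 'node02 slots=4' > "$HOSTFILE"

# Sum the slots= values to get the total number of MPI processes.
# Lines without a slots= entry contribute 0 to this sum.
NUMPROCS=$(awk -F'slots=' '{ total += $2 } END { print total }' "$HOSTFILE")

# Placeholders for the executable and the case file.
NFX_EXE="nfx_exe"
CASEFILE="casefile.cfg"

# Assemble and print the command rather than running it.
LAUNCH_CMD="mpirun -np $NUMPROCS --hostfile $HOSTFILE $NFX_EXE -i $CASEFILE"
echo "$LAUNCH_CMD"
```

Inside a PBS job, the hand-written file is unnecessary: pass --hostfile $PBS_NODEFILE instead, as described in the note under step 2.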

2022.1 or Newer

Run general diagnostics on the system to make sure that InfiniBand (packages, connections, etc.) is working.
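
As a possible first step for such diagnostics, the sketch below only reports whether some common InfiniBand utilities are installed. The choice of tools is an assumption: ibstat ships with infiniband-diags and ibv_devinfo/ibv_devices with the libibverbs utilities on typical Linux systems, and your cluster may provide different ones.

```shell
#!/bin/sh
# Report which InfiniBand diagnostic tools are available on PATH.
# check_tool is a helper defined here for illustration only.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found: $1"
    else
        echo "missing: $1"
    fi
}

# Tool list is an assumption about a typical Linux cluster.
for tool in ibstat ibv_devinfo ibv_devices; do
    check_tool "$tool"
done
```

If the tools are present, running them on each node and checking that the adapter and its ports are listed and reported as up is a reasonable sanity check before a multi-node run. Your system administrator may have site-specific diagnostics that go further.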

2021 or Older

Run general diagnostics on the system to make sure that ibverbs (packages, connections, etc.) is working.

For recent versions of nanoFluidX, the ibverbs version of OpenMPI must be sourced. Starting from 2019.1, you can do this by running source set_nFX_environment.sh ibverbs.
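
A minimal sketch of that sourcing step, with a check of which mpirun the shell picks up afterwards, might look as follows. NFX_ENV_SCRIPT is a hypothetical variable introduced here; the actual location of set_nFX_environment.sh depends on your installation.

```shell
#!/bin/bash
# NFX_ENV_SCRIPT is hypothetical: point it at your copy of the script.
NFX_ENV_SCRIPT=${NFX_ENV_SCRIPT:-./set_nFX_environment.sh}

if [ -f "$NFX_ENV_SCRIPT" ]; then
    # Source (do not execute) so the settings apply to this shell.
    source "$NFX_ENV_SCRIPT" ibverbs
    ENV_STATUS="sourced"
else
    ENV_STATUS="not found"
fi
echo "environment script: $ENV_STATUS ($NFX_ENV_SCRIPT)"

# Show which mpirun is resolved after sourcing, if any.
command -v mpirun || echo "mpirun is not on PATH"
```

Because the script is sourced, the environment changes last only for the current shell session, so the step belongs in the same shell or job script that later calls mpirun.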