Slurm - Requesting features or GPUs


How to request node features

Feature constraints allow you to tell the batch system that a job must only run on a certain subset (e.g. hardware generation) of nodes.  See Slurm's documentation of the --constraint and --prefer options in the sbatch(1) and srun(1) man-pages for more detail.

Example: request a farm19 node

#!/bin/sh
#SBATCH --constraint=farm19
srun hostname

Feature List

Generally speaking, jobs that require features not listed below will never run. They may queue indefinitely, or be rejected outright by the batch system upon submission. If you absolutely must continue to run such jobs, please submit your plan and timeline for no longer needing to as a ServiceNow incident.

Feature | Nodes
el9 | with AlmaLinux 9 (redundant since June 18th, 2024, now that every available node is EL9)
farm16, farm18, farm19, farm23, sciml19, sciml21, sciml23, sciml24 | by generation of hardware (see descriptions at scicomp.jlab.org). farm are for CPU-only jobs and sciml for GPU jobs.
cx4ib | having ConnectX-4 IB HCAs (a subset of farm19)


It is intended that other requirements can be expressed in terms of the above. For example, Intel processors for CPU-only jobs would be farm16|farm18, and AMD would be farm19|farm23. Excluding the legacy mlx4 driver would be cx4ib|farm23. The most up-to-date information on which node features are available can be queried directly from Slurm with sinfo -o %b. Please submit a ServiceNow incident if you are unable to express your job's requirements to Slurm.
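
As a minimal sketch of the OR syntax described above, the following prints (rather than runs) an sbatch command line that accepts either AMD generation. The script name job.sh and the helper submit_cmd are hypothetical and exist only for illustration. The quoting matters on the command line, where an unquoted | would be read by the shell as a pipe.

```shell
#!/bin/sh
# Sketch only: job.sh is a hypothetical batch script name.
# In a Slurm constraint, "|" means OR, so either AMD generation is acceptable.
constraint='farm19|farm23'

# Print the submission command instead of running it, so the result can be
# inspected on a machine without Slurm. The single quotes in the output keep
# the shell from treating "|" as a pipe when the command is pasted.
submit_cmd() {
    printf "sbatch --constraint='%s' %s\n" "$1" "$2"
}

submit_cmd "$constraint" job.sh
```

Inside a batch script the same expression can be given as a directive, for example #SBATCH --constraint=farm19|farm23. Because #SBATCH lines are comments as far as the shell is concerned, the pipe character should not need quoting there.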

How to request GPUs

We separate GPUs from the production partition. To use GPUs, you must use the gpu partition.

How to request GPU(s)

If your jobs require GPU access, you must request it with the --gres option, as described in the Slurm documentation.

--gres=gpu:[type:]<number>

An optional type can be supplied along with the number of GPUs requested.

Examples requesting GPU type and count

Request any available GPU:

#SBATCH --partition=gpu
#SBATCH --gres=gpu:1

Request 2 TitanRTX GPUs:

#SBATCH --partition=gpu
#SBATCH --gres=gpu:TitanRTX:2

GPU Types

Type | Description
TitanRTX | NVIDIA TitanRTX (max 4 per node)
T4 | NVIDIA T4 (max 16 per node)
A100 | NVIDIA A100 (max 4 per node)
A800 | NVIDIA A800 (max 4 per node)
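
The following is a minimal sketch of a complete batch script that requests one GPU of a specific type from the list above and reports what the job was given. It assumes that Slurm exports CUDA_VISIBLE_DEVICES for jobs that request GPUs through --gres, which is common but depends on site configuration. The helper report_gpus is hypothetical.

```shell
#!/bin/sh
#SBATCH --partition=gpu
#SBATCH --gres=gpu:A100:1

# Sketch only. CUDA_VISIBLE_DEVICES is assumed to be set by Slurm for
# --gres=gpu jobs; fall back to a placeholder when it is unset or empty,
# for example when the script is run outside of a job.
report_gpus() {
    echo "Allocated GPU indices: ${CUDA_VISIBLE_DEVICES:-none reported}"
}

report_gpus
```

On a GPU node, the echo could be replaced or followed by nvidia-smi (assuming it is installed there) to confirm the GPU model that was actually allocated.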