Array jobs offer a mechanism for submitting and managing collections of similar jobs quickly and easily. Note, however, that every job in the array shares the same job setup, such as job size, memory, and time limit.
Job arrays are useful, for instance, when a user wants to run the same analysis on 100 separate files: instead of submitting 100 jobs individually, you can combine them into a single array job. This benefits both the user and the SLURM Workload Manager.
Submitting array jobs
Specify Array Job
To create an array job, use the --array (or -a) option to specify the tasks as a range of task IDs.
Job array indices for the number of tasks can be specified in several different ways.
- Submit a job array with index values between 0 and 31
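For example (submit.sh is a placeholder for your own batch script):

```shell
# Submit a job array with indices 0 through 31
sbatch --array=0-31 submit.sh
```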
SLURM's job array handling is very flexible: a comma-separated list of task numbers can be used instead of a task range, which is especially handy for rerunning a few unsuccessful tasks from a previously completed job array.
- Submit a job array with index values of 1, 3, 5 and 7
- Submit a job array with index values between 1 and 31 with a step size of 2 (i.e. 1, 3, 5, 7, 9 … 31)
- Submit a job array with index values ranging from 0 to 5000 and limit the number of concurrently running jobs to no more than 50.
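The three forms above can be sketched as follows (submit.sh is a placeholder for your own batch script):

```shell
# Comma-separated list of indices: 1, 3, 5 and 7
sbatch --array=1,3,5,7 submit.sh

# Indices 1 to 31 with a step size of 2: 1, 3, 5, ..., 31
sbatch --array=1-31:2 submit.sh

# Indices 0 to 5000, with at most 50 tasks running concurrently
sbatch --array=0-5000%50 submit.sh
```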
Job ID and Variables
SLURM creates two additional environment variables for job arrays: SLURM_ARRAY_JOB_ID and SLURM_ARRAY_TASK_ID.
| Environment variable | SBATCH field code | Description |
|---|---|---|
| $SLURM_ARRAY_JOB_ID | %A | Stores the job ID of the parent job submission. |
| $SLURM_ARRAY_TASK_ID | %a | Stores the value of the array index. |
Naming output files
In SLURM, the output files of an array job are named slurm-<jobid>_<taskid>.out by default. The field codes for the job ID (%A) and the task ID (%a) can be used when renaming the output or error files.
```shell
#SBATCH --output=array_test_%A_%a.out
#SBATCH --error=array_test_%A_%a.error
```
The above SBATCH directives will, for example, generate a file named array_test_654_10.out for task 10 of job 654.
Remember to pay special attention to naming output files. If you use only %A in the --output flag, all array tasks will attempt to write to a single file. Make sure the output file name includes both %A and %a.
Writing a Job Submission Script with Array Jobs
Finally, putting it all together, you can write a job submission script. The following script generates a job array with five sub-jobs:
```shell
#!/bin/bash
#SBATCH --job-name=testArray
#SBATCH --array=1-5
#SBATCH --output=array_%A_%a.out
#SBATCH --error=array_%A_%a.err
#SBATCH --gres=gpu:1
#SBATCH --time=00:10:00
#SBATCH --mem=400

# Print the task id.
echo "My SLURM_ARRAY_TASK_ID:" $SLURM_ARRAY_TASK_ID

# Add lines here to run your computations.
```
In this simple example, --array=1-5 requests five array tasks. In each task, the environment variable SLURM_ARRAY_TASK_ID is set to a unique value ranging from 1 to 5.
```shell
(base)[uw76577@ada ~]$ sbatch job_array_script.sh
Submitted batch job 36777
```
Submitting the script to SLURM returns the parent job ID, SLURM_ARRAY_JOB_ID (36777 in this example).
So, every sub-job in the above job array has a job ID that consists of both the parent SLURM_ARRAY_JOB_ID and its unique SLURM_ARRAY_TASK_ID, separated by an underscore:
```
36777_1
36777_2
36777_3
36777_4
36777_5
```
You might want to create a pipeline in which a new job starts only when all prior jobs have completed. SLURM supports such pipelines with its --dependency option.
For example, to launch a job only after the job with the job_id identifier has completed successfully in a non-error state, use the afterok dependency type.
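A sketch of such a submission (job_id and next_step.sh are placeholders):

```shell
# Run next_step.sh only after job <job_id> finishes without error
sbatch --dependency=afterok:<job_id> next_step.sh
```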
Below are some other dependency options:

| Dependency option | Description |
|---|---|
| afterany:job_id | The submitted job is launched after the job with the job_id identifier terminates, i.e. completes successfully or fails. |
| afternotok:job_id | The job is launched if and only if the job with the job_id identifier failed. If job_id is a job array, then at least one task in that array failed. |
| singleton | Jobs sharing the same job name and user run one at a time. |
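Sketches of these dependency types, assuming the standard SLURM options afterany, afternotok and singleton (job_id and the script names are placeholders):

```shell
# Start after job <job_id> terminates, whether it succeeded or failed
sbatch --dependency=afterany:<job_id> cleanup.sh

# Start only if job <job_id> failed
sbatch --dependency=afternotok:<job_id> rerun.sh

# Only one job with this name (per user) runs at a time
sbatch --job-name=pipeline --dependency=singleton step.sh
```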
Checking Job Status
You may check the status of all the tasks of an array job using squeue. Detailed information for the whole array can be viewed with:

```shell
scontrol show job <SLURM_ARRAY_JOB_ID>
```
Detailed information for an individual sub-job can be seen with:

```shell
scontrol show job <SLURM_ARRAY_JOB_ID>_<SLURM_ARRAY_TASK_ID>
```
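For instance, using the example parent job ID from above (36777):

```shell
# List all tasks of the array in the queue
squeue -j 36777

# Show details for task 3 of the array
scontrol show job 36777_3
```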
Deleting job arrays and tasks
To delete all of the tasks of an array job, use scancel with the parent job ID:
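For example, using the example parent job ID from above:

```shell
# Cancel every task of array job 36777
scancel 36777
```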
To delete a single task, specify the task ID along with the job ID:
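For example, to cancel only one task of the example job above:

```shell
# Cancel only task 5 of array job 36777
scancel 36777_5
```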