Array jobs offer a mechanism for submitting and managing collections of similar jobs quickly and easily. Note, however, that every job in the array shares the same job setup, such as job size, memory, and time limit.
Job arrays are useful, for instance, when a user wants to run the same analysis on 100 separate files: instead of submitting 100 jobs individually, you can combine them into a single array job. This benefits both the user and the SLURM Workload Manager.
Submitting array jobs
Specify Array Job
To create an array job, use the --array (or -a) option to specify the tasks as a range of task IDs.
Job array indices for the number of tasks can be specified in several different ways.
- Submit a job array with index values between 0 and 31
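For example (submit.sh is a placeholder for your own batch script):

```shell
# Submit a job array with indices 0 through 31
sbatch --array=0-31 submit.sh
```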
SLURM's job array handling is very flexible: a comma-separated list of task numbers can be used instead of a task range, which is especially handy for rerunning a few unsuccessful tasks from a previously completed job array.
- Submit a job array with index values of 1, 3, 5 and 7
- Submit a job array with index values between 1 and 31 with a step size of 2 (i.e. 1, 3, 5, 7, 9 … 31)
- Submit a job array with index values ranging from 0 to 5000 and limit the number of concurrently running jobs to no more than 50.
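The three forms above can be sketched as follows (submit.sh is a placeholder for your own batch script):

```shell
# Comma-separated list of indices: 1, 3, 5 and 7
sbatch --array=1,3,5,7 submit.sh

# Indices 1 to 31 with a step size of 2: 1, 3, 5, ..., 31
sbatch --array=1-31:2 submit.sh

# Indices 0 to 5000, with at most 50 tasks running concurrently
sbatch --array=0-5000%50 submit.sh
```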
Job ID and Variables
SLURM creates two additional environment variables for job arrays: SLURM_ARRAY_JOB_ID and SLURM_ARRAY_TASK_ID.
| Environment variable | SBATCH field code | Description |
|---|---|---|
| $SLURM_ARRAY_JOB_ID | %A | Stores the job ID of the parent job submission. |
| $SLURM_ARRAY_TASK_ID | %a | Stores the value of the array index. |
Naming output files
In SLURM, the output files of an array job are named slurm-<jobid>_<taskid>.out by default. The field codes for the job ID (%A) and the task ID (%a) can be used when renaming the output or error files.
```shell
#SBATCH --output=array_test_%A_%a.out
#SBATCH --error=array_test_%A_%a.error
```
The above SBATCH directives will, for example, generate a file named array_test_654_10.out for task 10 of job 654.
Remember to pay special attention to naming output files. If you use only %A in the --output flag, all array tasks will attempt to write to a single file. Make sure the output file name includes both %A and %a.
Writing a Job Submission Script with Array Jobs
Finally, putting it all together, you can write a job submission script. The following script generates a job array with five sub-jobs:
```shell
#!/bin/bash
#SBATCH --job-name=testArray
#SBATCH --array=1-5
#SBATCH --output=array_%A_%a.out
#SBATCH --error=array_%A_%a.err
#SBATCH --gres=gpu:1
#SBATCH --time=00:10:00
#SBATCH --mem=400

# Print the task id.
echo "My SLURM_ARRAY_TASK_ID:" $SLURM_ARRAY_TASK_ID

# Add lines here to run your computations.
```
In this simple example, --array=1-5 requests five array tasks. In each task, the environment variable SLURM_ARRAY_TASK_ID is set to a unique value ranging from 1 to 5.
```shell
(base)[uw76577@ada ~]$ sbatch job_array_script.sh
Submitted batch job 36777
```
Submitting the script to SLURM returns the parent job ID, SLURM_ARRAY_JOB_ID (36777 in this example).
So, every sub-job in the above job array has a job ID that consists of both the parent SLURM_ARRAY_JOB_ID and its unique SLURM_ARRAY_TASK_ID, separated by an underscore:
```
36777_1
36777_2
36777_3
36777_4
36777_5
```
You might want to create a pipeline in which a new job starts only when all prior jobs have completed. SLURM supports such pipelines with its --dependency option.
For example, to launch a job only after the job with the job_id identifier has completed successfully in a non-error state, use the afterok dependency type.
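A sketch of such a submission (job_id and next_step.sh are placeholders):

```shell
# Run next_step.sh only after job <job_id> finishes without error
sbatch --dependency=afterok:<job_id> next_step.sh
```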
Below are some other dependency options:

| Dependency option | Description |
|---|---|
| afterany:job_id | The submitted job is launched after the job with the job_id identifier terminates, i.e. completes successfully or fails. |
| afternotok:job_id | The job is launched if and only if the job with the job_id identifier failed. If job_id is a job array, then at least one task in that array failed. |
| singleton | Jobs sharing the same job name and user run one at a time. |
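Sketches of these dependency types, assuming the standard SLURM options afterany, afternotok and singleton (job_id and the script names are placeholders):

```shell
# Start after job <job_id> terminates, whether it succeeded or failed
sbatch --dependency=afterany:<job_id> cleanup.sh

# Start only if job <job_id> failed
sbatch --dependency=afternotok:<job_id> rerun.sh

# Only one job with this name (per user) runs at a time
sbatch --job-name=pipeline --dependency=singleton step.sh
```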
Checking Job Status
You may check the status of all the tasks of an array job using squeue. Detailed information for the whole array can be viewed with:

```shell
scontrol show job <SLURM_ARRAY_JOB_ID>
```
Detailed information for an individual sub-job can be seen with:

```shell
scontrol show job <SLURM_ARRAY_JOB_ID>_<SLURM_ARRAY_TASK_ID>
```
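For instance, using the example parent job ID from above (36777):

```shell
# List all tasks of the array in the queue
squeue -j 36777

# Show details for task 3 of the array
scontrol show job 36777_3
```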
Deleting job arrays and tasks
To delete all of the tasks of an array job, use scancel with the parent job ID:
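For example, using the example parent job ID from above:

```shell
# Cancel every task of array job 36777
scancel 36777
```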
To delete a single task, specify the task ID along with the job ID:
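For example, to cancel only one task of the example job above:

```shell
# Cancel only task 5 of array job 36777
scancel 36777_5
```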