Array Jobs

Job arrays offer a mechanism for submitting and managing collections of similar jobs quickly and easily. However, every job in the array must have the same job setup, such as job size, memory, and time limit.

For instance, job arrays are useful if you want to run the same analysis on 100 separate files. Instead of submitting 100 jobs individually, you can combine them into a single array job. This benefits both the user (one submission to manage instead of 100) and the SLURM Workload Manager (one submission to process instead of 100).

Submitting array jobs

Specify Array Job

To create an array job, use the --array (or -a) option to specify the tasks as a range of task IDs.

Job array indices for the number of tasks can be specified in several different ways.

Submit a job array with index values between 0 and 31

#SBATCH --array=0-31

Job array handling in SLURM is very flexible. Instead of a range, you can give a comma-separated list of task IDs, which is useful, for example, for rerunning a few failed tasks from a previously completed job array.

Submit a job array with index values of 1, 3, 5 and 7

#SBATCH --array=1,3,5,7

Submit a job array with index values between 1 and 31 with a step size of 2 (i.e. 1, 3, 5, 7, 9 … 31)

#SBATCH --array=1-31:2

Submit a job array with index values ranging from 0 to 5000 and limit the number of concurrently running jobs to no more than 50.

#SBATCH --array=0-5000%50
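To make the index syntax concrete, here is a small bash illustration. expand_array_spec is a hypothetical helper written for this page, not part of SLURM (SLURM parses --array itself); it only handles the forms shown above: single IDs, comma-separated lists, A-B ranges, A-B:S steps, and a trailing %M limit.

```shell
#!/bin/bash
# Illustration only: expand an --array value into the task IDs it denotes.
# expand_array_spec is a hypothetical helper that mirrors the forms above.
expand_array_spec() {
    local spec="${1%%[%]*}"   # drop a "%M" concurrency limit; it does not change the IDs
    local items item range step start end i
    local ids=()
    IFS=, read -ra items <<< "$spec"   # split comma-separated entries
    for item in "${items[@]}"; do
        range="${item%%:*}"            # "A-B" or a single "N"
        step=1
        if [[ "$item" == *:* ]]; then
            step="${item#*:}"          # step size after ":"
        fi
        start="${range%%-*}"
        end="${range#*-}"              # same as start when there is no "-"
        for ((i = start; i <= end; i += step)); do
            ids+=("$i")
        done
    done
    echo "${ids[*]}"                   # space-separated task IDs
}
```

For example, `expand_array_spec 1-9:2` prints `1 3 5 7 9`, and `0-5000%50` yields the same 5001 IDs as `0-5000`, because %50 only limits how many tasks run at the same time.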

Job ID and Variables

For job arrays, SLURM sets two additional environment variables in each task, SLURM_ARRAY_JOB_ID and SLURM_ARRAY_TASK_ID. Each has a matching pattern that can be used in sbatch file names:

Environment variable	Filename pattern	Description
$SLURM_ARRAY_JOB_ID	%A	Job ID of the parent array job (the ID returned by sbatch).
$SLURM_ARRAY_TASK_ID	%a	Index of this task within the array.
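A minimal bash sketch of how a job script might use these variables, for instance in the "same analysis on 100 files" case. It assumes a hypothetical list file (e.g. files.txt) with one input path per line and a hypothetical per-task directory naming scheme; input_for_task and task_dir are illustrative helpers, not SLURM commands.

```shell
#!/bin/bash
# Hypothetical helper: print line N (1-based) of a list file.
# In an array job, N would normally be $SLURM_ARRAY_TASK_ID.
input_for_task() {
    sed -n "${2}p" "$1"    # -n with "Np" prints only line N
}

# Hypothetical helper: per-task directory name built from both array variables.
task_dir() {
    echo "results_${SLURM_ARRAY_JOB_ID}_${SLURM_ARRAY_TASK_ID}"
}

# In a real job script (files.txt is an assumed input list):
#   input=$(input_for_task files.txt "$SLURM_ARRAY_TASK_ID")
#   mkdir -p "$(task_dir)"
```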

Naming output files

By default, SLURM names the output file of each array task slurm-<jobid>_<taskid>.out. The %A (job ID) and %a (task ID) patterns can be used to choose your own names for the output and error files.

For example:

#SBATCH --output=array_test_%A_%a.out
#SBATCH --error=array_test_%A_%a.error

With these directives, the task with index 10 of array job 654 writes its output to array_test_654_10.out and its errors to array_test_654_10.error.

Pay special attention to how you name output files. If the --output pattern contains only %A, all array tasks will try to write to the same file. Always include both %A and %a in the file name.
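The following bash sketch mimics how %A and %a are filled in, to show why a pattern without %a makes every task share one file. expand_output_pattern is a hypothetical helper for illustration only; SLURM performs this substitution itself.

```shell
#!/bin/bash
# Illustration only: replace %A with the array job ID and %a with the task ID.
expand_output_pattern() {
    local pattern="$1" job_id="$2" task_id="$3"
    pattern="${pattern//[%]A/$job_id}"    # every literal "%A"
    pattern="${pattern//[%]a/$task_id}"   # every literal "%a"
    echo "$pattern"
}

for task in 1 2 3; do
    expand_output_pattern "array_test_%A.out" 654 "$task"      # same name for every task
    expand_output_pattern "array_test_%A_%a.out" 654 "$task"   # unique name per task
done
```

Each pass of the loop prints array_test_654.out for the %A-only pattern, while the %A_%a pattern gives array_test_654_1.out, array_test_654_2.out and array_test_654_3.out.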

Writing a job submission Script with Array Jobs

Putting it all together, the following job submission script generates a job array with five tasks:

job_array_script.sh

#!/bin/bash

#SBATCH --job-name=testArray
#SBATCH --array=1-5
#SBATCH --output=array_%A_%a.out
#SBATCH --error=array_%A_%a.err
#SBATCH --gres=gpu:1
#SBATCH --time=00:10:00
# --mem is in megabytes by default, so this requests 400 MB per task.
#SBATCH --mem=400

# Print the task ID.
echo "My SLURM_ARRAY_TASK_ID: $SLURM_ARRAY_TASK_ID"

# Add lines here to run your computations.

In this simple example, --array=1-5 requests five array tasks, and in each task the environment variable SLURM_ARRAY_TASK_ID is set to a unique value from 1 to 5.
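Beyond printing the ID, a common use of SLURM_ARRAY_TASK_ID is to select a different parameter in each task. A minimal bash sketch, assuming a hypothetical list of five parameter values (one per task for --array=1-5) that you would replace with your own; param_for_task is an illustrative helper.

```shell
#!/bin/bash
# Hypothetical parameter values, one per array task (--array=1-5).
PARAMS=(0.1 0.2 0.5 1.0 2.0)

param_for_task() {
    # Task IDs start at 1 here, but bash arrays are 0-indexed.
    local index=$(( $1 - 1 ))
    echo "${PARAMS[$index]}"
}

# Inside job_array_script.sh this could be used as:
#   value=$(param_for_task "$SLURM_ARRAY_TASK_ID")
#   echo "Task $SLURM_ARRAY_TASK_ID uses parameter $value"
```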

(base)[uw76577@ada ~]$ sbatch job_array_script.sh
Submitted batch job 36777

Submitting the script with sbatch prints the parent job ID, which is the SLURM_ARRAY_JOB_ID shared by all tasks in the array.

Each task is then identified by the parent SLURM_ARRAY_JOB_ID and its own SLURM_ARRAY_TASK_ID joined by an underscore, in the form <SLURM_ARRAY_JOB_ID>_<SLURM_ARRAY_TASK_ID>. For the submission above, the tasks are 36777_1 through 36777_5.

Dependencies

You might want to create a pipeline in which a new job can only be started when all prior jobs have been completed. SLURM provides a way to implement such pipelines with its --dependency option.

For example, to launch a job only after the job with ID <job_id> has completed successfully (that is, without an error), use the following --dependency option:

#SBATCH --dependency=afterok:<job_id>
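In practice, the dependency is usually filled in from the ID that sbatch returns. The bash sketch below assumes the sbatch --parsable option, which prints only the job ID (followed by ;<cluster> on multi-cluster systems), and a hypothetical post-processing script postprocess.sh; submit_pipeline is an illustrative helper. For a job array, pointing afterok at the parent SLURM_ARRAY_JOB_ID makes the follow-up job wait for all tasks, and it runs only if every task succeeded.

```shell
#!/bin/bash
# Sketch: submit an array job, then a job that starts only if the whole array succeeds.
submit_pipeline() {
    local array_script="$1" post_script="$2" array_id
    # --parsable makes sbatch print just "<jobid>" or "<jobid>;<cluster>".
    array_id=$(sbatch --parsable "$array_script") || return 1
    array_id="${array_id%%;*}"    # drop an optional ";cluster" suffix
    # afterok on the parent array ID depends on every task in the array.
    sbatch --parsable --dependency=afterok:"$array_id" "$post_script"
}

# Usage on a login node (postprocess.sh is a placeholder):
#   submit_pipeline job_array_script.sh postprocess.sh
```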

Below are some other dependency options:

`--dependency=afterany:<job_id>`	The job starts after the job with ID <job_id> has terminated, whether it completed successfully or failed.
`--dependency=afternotok:<job_id>`	The job starts only if the job with ID <job_id> failed. If <job_id> is a job array, it starts if at least one task in that array failed.
`--dependency=singleton`	The job starts only after all previously submitted jobs with the same job name and user have terminated, so such jobs run one at a time.

Checking Job Status

Use squeue to check the status of array jobs. Detailed information about an array job can be viewed with the scontrol command:

scontrol show job <SLURM_ARRAY_JOB_ID>

Detailed information about a single task can be viewed with:

scontrol show job <SLURM_ARRAY_JOB_ID>_<SLURM_ARRAY_TASK_ID>
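For a quick per-state summary of an array, squeue can list one line per task. A bash sketch, assuming the standard squeue options -j (job ID), -r (one line per array task), -h (no header) and -o with the %i (job ID) and %T (state) format fields; count_task_states is a hypothetical helper. Tasks that have already finished usually drop out of squeue after a short time, so they are not counted.

```shell
#!/bin/bash
# Sketch: count the tasks of one array job by state.
count_task_states() {
    local array_id="$1"
    # Each line looks like "<jobid>_<taskid> <STATE>"; tally the second field.
    squeue -j "$array_id" -r -h -o "%i %T" \
        | awk '{ count[$2]++ } END { for (s in count) print s, count[s] }' \
        | sort
}

# Usage: count_task_states 36777
```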

Deleting job arrays and tasks

To delete all of the tasks of an array job, use scancel with the job ID:

scancel <SLURM_ARRAY_JOB_ID>

To delete a single task, give both the array job ID and the task ID:

scancel <SLURM_ARRAY_JOB_ID>_<SLURM_ARRAY_TASK_ID>
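scancel accepts several job IDs at once, so a few tasks of the same array can be cancelled in one call. A minimal bash sketch; cancel_tasks is a hypothetical helper, not a SLURM command.

```shell
#!/bin/bash
# Sketch: cancel several individual tasks of one array job with a single scancel.
cancel_tasks() {
    local array_id="$1"; shift
    local ids=() task
    for task in "$@"; do
        ids+=("${array_id}_${task}")   # <SLURM_ARRAY_JOB_ID>_<SLURM_ARRAY_TASK_ID>
    done
    scancel "${ids[@]}"
}

# Example: cancel tasks 2 and 4 of array job 36777
#   cancel_tasks 36777 2 4    # runs: scancel 36777_2 36777_4
```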

High Performance Computing Facility
