
SBATCH File

Understanding how to submit jobs with Slurm is the first step toward taking advantage of the ADA clusters. Most HPC jobs are run by creating and submitting batch scripts.

There are two aspects to a job: resource requests and job steps.
  • Resource requests describe the amount of computational resources (GPUs, RAM, run time, etc.) that the job needs in order to run successfully.
  • Job steps define the tasks that must be executed.

The best way to manage these two parts is within a single submission script that Slurm uses to allocate resources and process your job steps.

 

SBATCH FILE

A job submission script, or SBATCH file, is essentially a shell script (bash script) whose leading comment lines, when prefixed with #SBATCH, are interpreted by Slurm as parameters describing resource requests and submission options.

Creating a Batch Script

To create a batch script, use your favorite text editor (for example, nano or vim) and create a file that contains both Slurm instructions and job instructions. The script is divided into three sections: the hashbang, the directives, and the commands.

  1. The hashbang appears on the first line of the Slurm script and specifies the program that executes the script. This is generally #!/bin/bash.
  2. The directives are Slurm-specific lines that specify the resource requirements for the job. They must be placed before any other commands or job steps; otherwise they will be ignored.
  3. The commands are the applications you want to run as your job steps.

Below is a simple example of a submission script:

submit.sh
#!/bin/bash 
#SBATCH --job-name=test_job
#SBATCH --mem=2000
#SBATCH --gres=gpu:1
#SBATCH --time=1:00:00
#SBATCH --constraint=rtx_2080 

srun hostname
srun sleep 60

The “shebang” line must be the very first line; it tells the system that the file is a bash shell script. Following that are a number of #SBATCH directives, which specify resource requirements and other job-related data. Bash treats these lines as ordinary comments, while the #SBATCH prefix tells Slurm to read them as directives. All of them must appear at the top of the file, prior to any job steps.

The above script requests one GPU for 1 hour, along with 2000 MB of memory and a hardware constraint of ‘rtx_2080’ for the job. These are just a few of the many #SBATCH directives available; for a detailed list, run ‘man sbatch’. The last two lines are the job steps: the first launches the command hostname on the node on which the requested GPU was allocated, and the second runs the sleep command for 60 seconds.
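Because directives placed after the first command are ignored, it can help to check a script before submitting it. The following is a minimal sketch and not part of Slurm: the helper name active_directives is hypothetical, and it only approximates the rule that directive processing stops at the first line that is neither blank nor a comment.

```shell
#!/bin/bash
# Hypothetical helper (not a Slurm tool): print the #SBATCH lines that appear
# before the first command in a script, i.e. the ones expected to be honored.
active_directives() {
    awk '
        /^#SBATCH/       { print; next }  # directive in the header: keep it
        /^#/             { next }         # hashbang or ordinary comment: skip
        /^[[:space:]]*$/ { next }         # blank line: skip
        { exit }                          # first real command: stop reading
    ' "$1"
}

# Demo on a throwaway script that contains one misplaced directive.
demo=$(mktemp)
cat > "$demo" <<'EOF'
#!/bin/bash
#SBATCH --job-name=test_job
#SBATCH --time=1:00:00

srun hostname
#SBATCH --mem=2000
EOF

active_directives "$demo"   # prints the first two directives, not --mem
rm -f "$demo"
```

If a directive you expected is missing from the output, move it above the first command in the script.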

 

Submitting a job:

This script can now be submitted to Slurm using the sbatch command. Upon success, sbatch returns the ID it has assigned to the job (the jobid).

(base)[uw76577@ada ~]$ sbatch submit.sh
Submitted batch job 33172
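When submissions are scripted, it is often convenient to capture the jobid in a variable. The following is a minimal sketch, assuming the confirmation line has the form shown above; the helper name jobid_from_sbatch is hypothetical. sbatch also provides a --parsable option that prints the job ID without the surrounding text (followed by a semicolon and the cluster name, if one is present).

```shell
#!/bin/bash
# Hypothetical helper: pull the numeric job ID out of sbatch's confirmation
# line, e.g. "Submitted batch job 33172".
jobid_from_sbatch() {
    awk '/^Submitted batch job/ { print $4 }'
}

# Demo using the confirmation line from the session above (no cluster needed).
echo "Submitted batch job 33172" | jobid_from_sbatch

# On the cluster, the same helper could be used as:
#   jobid=$(sbatch submit.sh | jobid_from_sbatch)
# or, letting sbatch print the ID itself:
#   jobid=$(sbatch --parsable submit.sh | cut -d ';' -f 1)
```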

Once the job is submitted, it enters the queue in the PENDING state. When resources become available and the job has the highest priority, an allocation is created for it and it moves to the RUNNING state. If the job completes successfully, it is set to the COMPLETED state; if it terminates with an error, it is set to the FAILED state.
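These state changes can be followed from the shell. The following is a minimal sketch, assuming squeue is on the PATH and that its %T format field prints the state name; the helper names is_finished and wait_for_job are hypothetical, and only the four states described here are handled.

```shell
#!/bin/bash
# Hypothetical helper: succeed (exit status 0) if a state name means the job
# is done. Only the states described above are covered; Slurm defines others.
is_finished() {
    case "$1" in
        COMPLETED|FAILED) return 0 ;;
        *)                return 1 ;;   # PENDING, RUNNING, or anything else
    esac
}

# Hypothetical helper: poll squeue until the job finishes or leaves the queue.
# Assumes squeue is available; -h drops the header and %T prints the state.
wait_for_job() {
    local jobid=$1 state
    while state=$(squeue -h -j "$jobid" -o %T 2>/dev/null) && [ -n "$state" ]; do
        is_finished "$state" && break
        echo "job $jobid is $state"
        sleep 30
    done
}

# Demo of the classification only (no cluster needed).
for s in PENDING RUNNING COMPLETED FAILED; do
    if is_finished "$s"; then
        echo "$s: finished"
    else
        echo "$s: still active"
    fi
done
```

On the cluster, wait_for_job 33172 would print the state every 30 seconds until the job is done. Finished jobs drop out of the squeue listing after a short time; if job accounting is enabled, sacct -j with the jobid can report the final state afterwards.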