Introduction
Now we’ll see how to run a Bash script on the cluster. Before proceeding, make sure you’ve read the How To Run tutorial.
We now know that our jobs should run on the compute nodes rather than on the front end node. However, we need to be careful with scripting and make sure that the scheduler always has control over our job. We’ll see some examples of how to do this correctly, along with some counterexamples. Use of other scripting languages and shells should be very similar.
Simple Bash example
Let’s start with the following script. It sleeps for one minute so that the job runs for a little while. This example is so simple that we could have included it directly in the batch script. In practice though, we’ll usually want to keep our functional code separate from the code that runs the batch job. Make sure that the file pause.bash below has executable permissions.
Download: ..code/bash_pause/pause.bash
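The contents of pause.bash are not reproduced on this page. As a minimal sketch, assuming the script only prints a timestamp, sleeps for 60 seconds, and prints a second timestamp (which is consistent with the job output shown later in this tutorial), the commands below create a stand-in file and mark it executable. The script body is an illustration and may differ from the downloadable file.

```shell
# Create a stand-in pause.bash (assumed contents, inferred from the job output)
cat > pause.bash <<'EOF'
#!/bin/bash
# Report the start time, wait one minute, then report the end time.
echo "Script started at $(date)"
sleep 60
echo "Script ended at $(date)"
EOF

# Give the file's owner execute permission so the batch job can invoke ./pause.bash
chmod u+x pause.bash

# Show the resulting permissions; the owner field should now include "x"
ls -l pause.bash
```

The quoted 'EOF' delimiter keeps the shell from expanding $(date) while the file is being written, so the timestamps are evaluated only when the script itself runs.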
Here is the batch script we will submit with sbatch to launch it
Download: ..code/bash_pause/run.slurm
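The contents of run.slurm are likewise not shown here. The following is a rough sketch of what such a batch script might look like, assuming the job name (pause), partition (serial), and output file names (slurm.out, slurm.err) that appear in the terminal sessions in this tutorial. The actual file may use different settings.

```shell
# Create a stand-in run.slurm (assumed contents, inferred from the sessions in this tutorial)
cat > run.slurm <<'EOF'
#!/bin/bash
# Job name, output files, and partition are assumptions based on the squeue and cat output.
#SBATCH --job-name=pause
#SBATCH --output=slurm.out
#SBATCH --error=slurm.err
#SBATCH --partition=serial

# Run the script in the foreground so it stays under the scheduler's control
./pause.bash
EOF

# Check the batch script for shell syntax errors without running it
bash -n run.slurm
```

Because pause.bash is invoked in the foreground, the batch script does not finish until pause.bash does, so the scheduler tracks the work for its whole lifetime.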
Now we launch the job
[araim1@maya-usr1 bash_pause]$ sbatch run.slurm
sbatch: Submitted batch job 2618
[araim1@maya-usr1 bash_pause]$ squeue
  JOBID PARTITION     NAME     USER  ST       TIME  NODES QOS    NODELIST(REASON)
   2620    serial    pause   araim1   R       0:00      1 normal (Resources)
[araim1@maya-usr1 bash_pause]$
After about a minute, we get the following output
[araim1@maya-usr1 bash_pause]$ cat slurm.err
[araim1@maya-usr1 bash_pause]$ cat slurm.out
Script started at Thu Aug 20 18:12:36 EDT 2009
Script ended at Thu Aug 20 18:13:36 EDT 2009
[araim1@maya-usr1 bash_pause]$
If we had killed the job during its execution, the scheduler would have been able to stop it cleanly, and no pieces of it would continue to run on the compute node.
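For reference, a job is normally killed through the scheduler with SLURM's scancel command, given the job ID that squeue reports. The sketch below wraps this in a small shell function; stop_job is a hypothetical helper name introduced only for this illustration.

```shell
# stop_job is a hypothetical helper name used only in this sketch.
# Usage: stop_job JOBID   (for example: stop_job 2620)
stop_job() {
    local jobid="$1"
    if [ -z "$jobid" ]; then
        echo "usage: stop_job JOBID" >&2
        return 1
    fi
    # Ask the scheduler to cancel the job
    scancel "$jobid" || return 1
    # List our own jobs so we can see whether it has left the queue
    squeue -u "$(id -un)"
}
```

Running scancel with the job ID directly at the prompt does the same thing; the function only adds an argument check and a follow-up look at the queue.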
A note to users familiar with background jobs and nohup: it would not be a good idea to run the pause.bash script through either mechanism. Processes started this way could potentially run outside of the scheduler’s control. If this happens, you would lose control of your job and need to contact HPC Support to stop it. If such a job is left running, other users’ jobs could be scheduled on the processors it is still occupying, and it could interfere with their execution.
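As a rough illustration of the patterns to avoid, the function below scans a batch script for lines that invoke nohup or end with a single "&". The name find_detached_commands is hypothetical, and the check is only a heuristic: it can flag comment lines and does not understand shell syntax.

```shell
# find_detached_commands is a hypothetical helper name used only in this sketch.
# It prints (with line numbers) any line of the given file that uses nohup
# or ends with a single '&', the two patterns discussed above.
# The exit status is 0 if at least one such line is found and 1 if none are.
find_detached_commands() {
    grep -nE '(^|[[:space:];])nohup[[:space:]]|[^&]&[[:space:]]*$' "$1"
}
```

For example, a batch script whose second line reads "./pause.bash &" would be reported as "2:./pause.bash &", while a script that runs ./pause.bash in the foreground produces no output.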