Overview

This page is currently under construction.


What is SLURM?

On a local machine, the operating system decides when, and on which resources, each process runs. In a distributed compute environment, that coordination has to happen across machines, so it is handled by an authoritative workload manager (WLM). Here, that workload manager is SLURM. It coordinates all cluster resources, aiming for high resource utilization while preventing any single user from monopolizing the cluster. It’s quite a balancing act!

System administrators have tuned SLURM so that it is:

  1. Aware of all cluster compute resources and hardware usage;
  2. Able to prioritize new requests for compute resources according to the needs of the users of ada; and
  3. Able to allocate the requested compute resources and run each request on them.

All that the user is required to do is formulate well-defined requests for resources and submit those requests. The pages of this tutorial cover the important points of doing so, followed by a few advanced topics.
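As a rough illustration of what a well-defined request can look like, here is a minimal sketch of a batch script. The job name, the resource amounts, the output filename, and the `job_summary` helper are hypothetical values chosen for this example, not settings taken from ada; the `#SBATCH` flags and `SLURM_*` variables themselves are standard SLURM ones.

```shell
#!/bin/bash
# Resource request: sbatch reads these #SBATCH lines, while bash treats them
# as ordinary comments. They must appear before the first executable command.
# All values below are hypothetical placeholders.
#SBATCH --job-name=hello_slurm
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=2
#SBATCH --mem=1G
#SBATCH --time=00:05:00
#SBATCH --output=hello_slurm_%j.out

# Hypothetical helper: report the job ID and CPU count that SLURM assigned.
# Falls back to placeholder values when the script is run outside of SLURM.
job_summary() {
    echo "job=${SLURM_JOB_ID:-not-under-slurm} cpus=${SLURM_CPUS_PER_TASK:-1}"
}

job_summary
```

Saved under a name such as `hello_slurm.sh` (again hypothetical), a script like this would typically be submitted with `sbatch hello_slurm.sh`. Appropriate values for time, memory, and CPU counts depend on the cluster configuration and are covered in the pages that follow.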

Please step through the pages outlined below using the right arrows at the top and bottom of each page of this tutorial.

Throughout this document, please consider any mention of SLURM to be specific to the implementation of SLURM on the ada cluster environment.

  1. Requesting Resources (Flags)
  2. SBATCH File
  3. Monitoring Jobs
  4. Modifying Jobs
  5. Array Jobs
  6. Environment Variables
  7. Resource Contention
  8. Priority
  9. Preemption