Quick Summary of Preemption
- Jobs must specify how long they expect to run, but this specification can be arbitrarily long.
- Jobs that indicate they will run for more than 72 hours (3 days) are guaranteed 72 hours of run time.
- After 72 hours, a job may be preempted, depending on the resource needs of other pending jobs.
SLURM is an open-source workload manager for Linux clusters of any size. It performs three main functions:
1) It allocates users exclusive and/or non-exclusive access to resources for a period of time, allowing them to complete tasks.
2) It provides a framework for initiating, executing, and monitoring jobs (usually parallel jobs) across a group of allotted nodes.
3) Finally, it manages a queue of pending tasks and settles resource contention using preemption mechanisms.
Preemption is a scheduling mechanism that allows pending jobs (preemptors) to take over resources from running jobs (preempted jobs) by stopping them. SLURM uses the job priority field to determine which running job allocations can be preempted. This gives SLURM a way to deal with situations where the cluster becomes overloaded.
Resource Contention and Need for Preemption
The ADA cluster environment has a finite amount of resources to offer its users. As a result, when multiple SLURM jobs seek the same resource, the cluster is vulnerable to resource contention.
Resource contention is a conflict over access to a shared resource such as random access memory, disk storage, or network devices. In HPC clusters, job processes request shared resources frequently, and contention arises when multiple jobs want the same shared resource. Failing to resolve contention properly can cause a variety of problems, including deadlock, livelock, and thrashing, all of which degrade cluster performance. Hence, whenever contention may arise, some form of resolution is needed to determine which job gets access to the resource.
An effective way to address this resource conflict in HPC clusters is to preempt jobs that are already running, which SLURM supports.
ADA Cluster’s Preemption Policy
Every job is guaranteed at least 72 hours (3 days) of run time before it can be preempted. Once a RUNNING job has run for 72 hours, and if jobs in the PENDING queue are requesting the resources it holds, the RUNNING job is preempted to make room for the PENDING job.
The ADA cluster does not follow a strict preemption policy: not every job that runs longer than three days is preempted. The ‘3-days preemption policy’ applies only if other pending jobs are requesting the resources that are currently in use. If no other job requests these resources, the job should ideally be able to run for as long as the user wants.
To run a job on the cluster, you first submit it. If the requested resources are not available, the job waits in the pending queue. It eventually starts on the ADA cluster when resources become available, either because they are free or, if required, because jobs that have been running for more than 3 days are preempted.
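The rule described above can be written as a small predicate. This is only an illustrative sketch of the policy as stated on this page, not SLURM's actual scheduler logic; the function name `may_be_preempted` and its parameters are hypothetical.

```python
# Illustrative sketch of the policy described above (not SLURM internals).
GUARANTEED_HOURS = 72  # guaranteed run time stated in the policy


def may_be_preempted(elapsed_hours, pending_jobs_need_resources):
    """Return True if a running job is eligible for preemption.

    elapsed_hours: how long the job has been running (hypothetical input).
    pending_jobs_need_resources: True if pending jobs are requesting the
        resources this job currently holds (hypothetical input).
    """
    past_guarantee = elapsed_hours >= GUARANTEED_HOURS
    return past_guarantee and pending_jobs_need_resources


print(may_be_preempted(80, True))   # past 72 hours and resources are wanted
print(may_be_preempted(80, False))  # past 72 hours but nobody is waiting
print(may_be_preempted(10, True))   # still within the guaranteed 72 hours
```

Only the first case is eligible: both conditions must hold before a job can be preempted.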
Once a job is preempted, it is canceled, just as if you had used scancel to cancel it yourself, regardless of how much of its work has completed. It is therefore crucial to checkpoint your SLURM jobs to avoid losing your computational effort in the event of preemption.
SLURM does not automatically perform checkpointing, that is, create files from which your job can restart. Checkpointing differs by code type and must be implemented by the user as part of their codebase. Checkpointing your code is always recommended: it protects against preemption and job failure (due to code error or node failure), and it allows your job to be broken up into smaller chunks that can be resubmitted.
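As a minimal sketch of what user-level checkpointing can look like, the Python example below periodically saves its progress to a file and resumes from that file when restarted. The file name `checkpoint.json`, the function names, and the toy workload (summing integers) are all assumptions made for illustration; a real job would save whatever state its own code needs, such as model weights or iteration counters.

```python
import json
import os
import tempfile

CHECKPOINT_PATH = "checkpoint.json"  # hypothetical file name


def load_checkpoint(path=CHECKPOINT_PATH):
    """Return the saved state, or a fresh state if no checkpoint exists."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"next_step": 0, "total": 0}


def save_checkpoint(state, path=CHECKPOINT_PATH):
    """Write to a temporary file, then rename it over the checkpoint.

    The rename is atomic, so a job killed mid-write leaves the previous
    checkpoint intact instead of a half-written file.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
    os.replace(tmp_path, path)


def run(total_steps, path=CHECKPOINT_PATH, checkpoint_every=10, stop_after=None):
    """Sum the integers 0..total_steps-1, resuming from a checkpoint if present.

    stop_after simulates preemption: the function returns None after that
    many steps in this run without saving any further state.
    """
    state = load_checkpoint(path)
    done_this_run = 0
    while state["next_step"] < total_steps:
        if stop_after is not None and done_this_run >= stop_after:
            return None  # simulated preemption
        state["total"] += state["next_step"]
        state["next_step"] += 1
        done_this_run += 1
        if state["next_step"] % checkpoint_every == 0:
            save_checkpoint(state, path)
    save_checkpoint(state, path)
    return state["total"]
```

If the first run is interrupted, calling `run` again with the same path continues from the last saved step rather than from zero. Work done since the last checkpoint is repeated, so the checkpoint interval is a trade-off between I/O overhead and lost work.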
User’s next step after preemption
Resubmit the job: If your job is preempted and you want it to continue, you must resubmit it.
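Resubmission can also be scripted. The sketch below wraps SLURM's `sbatch` command in a small Python helper; the helper name `resubmit`, the `dry_run` flag, and the script name `train_job.sh` are hypothetical, and the helper assumes your batch script restarts from its own checkpoint files.

```python
import shutil
import subprocess


def resubmit(script_path, dry_run=False):
    """Resubmit a batch script with sbatch.

    Returns the command that was run, or that would be run when dry_run is
    True or sbatch is not available on this machine.
    """
    command = ["sbatch", script_path]
    if dry_run or shutil.which("sbatch") is None:
        # Dry run requested, or not on a SLURM login node: only report the command.
        return command
    subprocess.run(command, check=True)
    return command


# "train_job.sh" is a placeholder for your own batch script.
print(resubmit("train_job.sh", dry_run=True))
```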