
Preemption

Preemption is a scheduling mechanism that allows pending jobs (preemptors) to suspend running jobs (preemptees). In other words, one or more “low-priority” jobs are “stopped” so that a “high-priority” job can run. SLURM uses the job priority field to determine which running allocations can be preempted. Job preemption is implemented as a variation of SLURM’s Gang Scheduling logic.

Need for Preemption:

The ADA cluster has a finite amount of resources to offer its users. As a result, when multiple SLURM jobs request the same resource, the cluster is exposed to resource contention. Preemption is the approach the cluster uses to resolve such conflicts.

ADA Cluster’s Preemption Policy:

Every job is guaranteed a run time of up to 72 hours (3 days) before it can be preempted. Once a RUNNING job has executed for the designated 72 hours, and other jobs are waiting in the PENDING queue for the resources it holds, the RUNNING job is preempted to make room for the PENDING job.

The ADA cluster does not follow a strict preemption policy, under which every job that runs longer than three days would be preempted. The 3-day preemption policy applies only if pending jobs are requesting the resources. So, if you want to run jobs on the cluster, submit them first. If the requested resources are not available, the jobs will wait in the pending queue and will eventually be executed by preempting jobs that have been running for more than 3 days.
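The two conditions of the policy can be sketched as a small Python predicate. This is only an illustration of the rule as described in the text. The function name and parameters are hypothetical, they are not part of SLURM, and treating exactly 72 hours as already eligible is an assumption.

```python
from datetime import timedelta

# Guaranteed run time stated in the policy text (72 hours).
GUARANTEED_RUNTIME = timedelta(hours=72)


def is_preemption_candidate(elapsed: timedelta, pending_jobs_need_resources: bool) -> bool:
    """Return True if a running job may be preempted under the described policy.

    Hypothetical helper for illustration only; it does not query SLURM.
    Both conditions must hold: the guaranteed run time has been used up,
    and at least one pending job is requesting the resources this job holds.
    """
    return elapsed >= GUARANTEED_RUNTIME and pending_jobs_need_resources


# A job past 72 hours is only at risk when another job is waiting for its resources.
print(is_preemption_candidate(timedelta(hours=80), pending_jobs_need_resources=False))  # False
print(is_preemption_candidate(timedelta(hours=80), pending_jobs_need_resources=True))   # True
print(is_preemption_candidate(timedelta(hours=10), pending_jobs_need_resources=True))   # False
```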

Once a job is preempted, it is canceled, just as if you had used scancel to cancel it yourself, regardless of how far it has progressed. Hence, it is crucial to checkpoint your SLURM jobs to avoid losing your computational effort in the event of preemption.
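One possible way to checkpoint a long-running job is sketched below in Python. It is a minimal sketch under stated assumptions, not a recommended ADA or SLURM mechanism: the function names, the JSON file format, and the placeholder "work" (summing integers) are all hypothetical and stand in for your own computation. The idea is to save progress periodically and to resume from the last saved state when the job starts again.

```python
import json
import os


def save_checkpoint(path, state):
    """Write the state atomically so a cancellation mid-write cannot corrupt the file."""
    tmp_path = path + ".tmp"
    with open(tmp_path, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())
    # os.replace swaps the file in a single step on the same filesystem.
    os.replace(tmp_path, path)


def load_checkpoint(path):
    """Return the saved state, or a fresh state if no checkpoint exists yet."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"step": 0, "total": 0}


def run(path, num_steps, checkpoint_every=100):
    """Run the (placeholder) computation, resuming from the last checkpoint."""
    state = load_checkpoint(path)
    for step in range(state["step"], num_steps):
        state["total"] += step          # placeholder for real work
        state["step"] = step + 1        # next step to execute on resume
        if state["step"] % checkpoint_every == 0:
            save_checkpoint(path, state)
    save_checkpoint(path, state)        # final state after the last step
    return state
```

With this pattern, calling run("checkpoint.json", 1000) in the job script and resubmitting the same script after a preemption continues from the last saved step instead of starting over. At most checkpoint_every steps of work are repeated.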

User’s next step after preemption:

Resubmit the job: If your job is preempted and you want to run it again, you have to resubmit it to continue its execution.