Job Priority is an integer value that determines a job’s position in the pending queue relative to other jobs and the order in which the pending jobs will run.
Calculation of Job Priority:
On the ADA cluster, the job priority is calculated as a weighted sum of two factors:
- Age: Age refers to the length of time your job has been pending in the queue, eligible to be scheduled. The job priority increases linearly with the wait time (age) until it starts.
- Fairshare: Fairshare is based on the historical usage of all the members of your PI group. Essentially, it measures how much the members of your PI group have used the cluster in recent days. The more your group has used, the smaller the fairshare factor becomes and the lower the resulting job priority. This ensures that PI groups that have not fully utilized their resource allocation receive greater priority for jobs on the cluster, while groups that have recently used up their allocation do not overuse the cluster.
Job priority can be expressed as follows:
Job_priority = PriorityWeightAge * age_factor + PriorityWeightFairshare * fairshare_factor
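As a rough worked example of the formula above, the following shell sketch computes a priority from made-up numbers. The weights (1000 and 10000) and the factor values (0.5 and 0.25) are assumptions chosen for illustration, not ADA's actual settings, and job_priority is a hypothetical helper, not a SLURM command. It also assumes, as in SLURM's multifactor priority plugin, that each factor is a number between 0.0 and 1.0.

```shell
# Hypothetical helper: evaluates the weighted sum from the formula above.
# $1 = PriorityWeightAge        $2 = age_factor        (0.0 to 1.0)
# $3 = PriorityWeightFairshare  $4 = fairshare_factor  (0.0 to 1.0)
job_priority() {
    awk -v wa="$1" -v fa="$2" -v wf="$3" -v ff="$4" \
        'BEGIN { printf "%d\n", wa * fa + wf * ff }'
}

# Assumed values for illustration only:
# 1000 * 0.5 + 10000 * 0.25 = 500 + 2500
job_priority 1000 0.5 10000 0.25   # prints 3000
```

With these assumed values the age term contributes 500 and the fairshare term 2500, so the fairshare weight dominates the result. The weights actually configured on a cluster can be displayed with sprio -w.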
sprio and sshare are two useful commands to view the priority of pending jobs and fairshare.
Display the list of jobs sorted by priority
Use the squeue command to list your pending jobs starting from the highest priority.
$ squeue --Format=JobID,Jobname,User,userid,account,State,PriorityLong,\
tres-alloc:50,nodelist,feature -t PENDING
Determine the priority of your pending jobs
Use the sprio command to see the priority of your pending jobs.
sprio -j <job_id>
Determine the current fairshare value
To monitor the usage of the members of your group, use the sshare command, which reports your current fairshare value and that of your account.
sshare -A <account_name>
Request your job’s estimated start time
To get an estimate of the start time of your jobs, use the squeue command with the --start option. The result is only an estimate: SLURM derives it from the time limits requested by all jobs, which rarely match the actual run times of those jobs.
squeue -j <jobid> --start
Backfill
In addition to the standard scheduling cycle, in which jobs run in order of priority and resource availability, all jobs are considered for “backfill.” Backfill is a mechanism that allows lower-priority jobs to start earlier in order to fill idle slots, as long as they are expected to finish before the next higher-priority job is due to begin based on resource availability. In other words, if your job is small enough, it can be backfilled and scheduled alongside a larger, higher-priority job. For example, if a higher-priority job requires 2 nodes with 10 cores on each node and must wait 10 hours for those resources to become available, and a lower-priority job requires only a couple of cores for an hour, SLURM will run the lower-priority job in the meantime because its requested walltime fits in the gap.
Hence, it is essential that you request accurate walltime limits for your jobs. If your job only requires 2 hours to complete but you request 24 hours, the chances of it being backfilled are greatly reduced.
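The effect of the requested walltime on backfill can be sketched with a deliberately simplified check. This is not SLURM's actual backfill algorithm: can_backfill is a hypothetical helper, and the assumption that 4 cores sit idle while the higher-priority job waits is made up for illustration. It only tests whether the requested walltime and cores fit into the idle window.

```shell
# Simplified illustration only, not SLURM's real backfill logic.
# $1 = requested walltime (hours)   $2 = requested cores
# $3 = hours until the higher-priority job is expected to start
# $4 = cores idle during that window (assumed value in the calls below)
can_backfill() {
    if [ "$1" -le "$3" ] && [ "$2" -le "$4" ]; then
        echo "yes"
    else
        echo "no"
    fi
}

# The higher-priority job must wait 10 hours; assume 4 cores are idle meanwhile.
can_backfill 1 2 10 4    # 2 cores, 1 hour requested: prints yes
can_backfill 24 2 10 4   # same job, 24 hours requested: prints no
```

The second call describes the same small job as the first, yet it is rejected purely because of the inflated walltime request. This is why an accurate time limit matters.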
The figure below shows the difference between SLURM’s Backfill scheduler and a First In First Out (FIFO) scheduler. The blocks represent jobs in terms of time and resources. A FIFO scheduler runs the jobs in the queue strictly in order of their assigned priority score. A Backfill-based scheduler, however, makes better use of available resources by scheduling jobs out of priority order: a smaller, lower-priority job is backfilled in the shadow of a larger, higher-priority job.