slurm: chip-cpu

Users and Accounts

Within these slurm clusters, every user is associated with a slurm user and a slurm account, which mirror the user's UNIX user and group. For example, the user student belonging to the UNIX group pi_professor is associated with the slurm user student and the slurm account pi_professor.
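This mapping can be confirmed with the slurm accounting client. The sketch below assumes it is run on a chip login node; the guard lets it degrade gracefully on machines without the slurm client tools:

```shell
# Show which slurm account(s) your UNIX user maps to on chip-cpu.
# sacctmgr is only available where the slurm client tools are installed.
if command -v sacctmgr >/dev/null 2>&1; then
    assoc=$(sacctmgr show associations user="$USER" cluster=chip-cpu format=User,Account)
else
    assoc="sacctmgr not found; run this on a chip login node"
fi
printf '%s\n' "$assoc"
```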

Partitions and QOSs

Within chip-cpu, there exists a series of overlapping partitions of two kinds. The hardware partitions logically group cluster machines based on their high-level hardware differences. The access partitions logically group cluster machines based on who can access the machines with slurm.

Access Partition     Requestable By                    Can jobs be preempted?        Available QOS                 Machines
contrib              contributor accounts              Yes, by owners of hardware    shared                        n[001-051]
match                contributor accounts              No                            shared                        n[029-051]
general              all accounts                      No                            short, normal, medium, long   N/A
support              system administrators             No                            support                       ALL
pi_<professorName>   users within pi_<professorName>   No                            pi_<professorName>            Depends on contributor

Table 1: Overview of Access-level partitions configured in slurm on chip
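The partition layout in Table 1 can be inspected directly with sinfo. The sketch below assumes a chip login node; the format string selects partition, availability, time limit, node count, and node list:

```shell
# List chip-cpu partitions with their time limits and member nodes.
if command -v sinfo >/dev/null 2>&1; then
    parts=$(sinfo --clusters=chip-cpu --format="%P %a %l %D %N")
else
    parts="sinfo not found; run this on a chip login node"
fi
printf '%s\n' "$parts"
```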

QOS                  Requestable By       “Wall” Time   Cores per Account
shared               pi_<professorName>   14 days       2944 (23 machines)
short                all accounts         1 hour        2944
normal               all accounts         4 hours       1280
medium               all accounts        1 day         640
long                 all accounts         5 days        256
pi_<professorName>   pi_<professorName>   Unlimited     Unlimited

Table 2: Overview of QOS resource limitations
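The QOS limits in Table 2 can likewise be read back from slurm itself. A sketch, again guarded for off-cluster machines; MaxTRESPA is sacctmgr's per-account TRES limit field, which holds the core counts listed above:

```shell
# Show the wall-time and per-account resource limits for each QOS.
if command -v sacctmgr >/dev/null 2>&1; then
    qos=$(sacctmgr show qos format=Name,MaxWall,MaxTRESPA)
else
    qos="sacctmgr not found; run this on a chip login node"
fi
printf '%s\n' "$qos"
```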

Figure 1: Cartoon overview of hardware and access-level partitions.

Running a Job (More content soon)

There are no required slurm flags, but each user will need to compose a command like those listed below for various runtime scenarios. In each case, srun is used only for illustrative purposes; sbatch and salloc remain perfectly valid across chip.

`srun --account=pi_<professorName> --partition=pi_<professorName> --qos=pi_<professorName> --clusters=chip-cpu <bash executable>`

Note that only users belonging to the slurm account indicated by pi_<professorName> are able to run these commands.
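The same job can be submitted non-interactively with sbatch. The script below is a minimal sketch: the time limit and the program name my_program are placeholders, and pi_<professorName> must be replaced with your actual account:

```shell
# Write a minimal batch script mirroring the srun example above.
# The quoted heredoc delimiter prevents variable expansion in the script body.
cat > pi_job.slurm <<'EOF'
#!/bin/bash
#SBATCH --account=pi_<professorName>
#SBATCH --partition=pi_<professorName>
#SBATCH --qos=pi_<professorName>
#SBATCH --clusters=chip-cpu
#SBATCH --time=01:00:00

srun ./my_program
EOF

# Submit with: sbatch pi_job.slurm
```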

`srun --account=pi_<professorName> --partition=match --qos=shared --clusters=chip-cpu <bash executable>`

This job will run on the match partition, where jobs cannot be preempted.

`srun --account=pi_<professorName> --partition=contrib --qos=shared --clusters=chip-cpu <bash executable>`

This job will run on the contrib partition. If the job is assigned to a machine that also belongs to a partition owned by another slurm account, the job will be cancelled via slurm preemption whenever a user belonging to the owning slurm account requests the machine.

Note that only users belonging to any of the 2024 contributor slurm accounts are able to run these commands.

`srun --account=pi_<professorName> --partition=general --qos=short --clusters=chip-cpu <bash executable>`

This job just won’t run 🙂 None of the general hardware has been integrated from taki or ada yet.