Hardware Details

Compute Hardware

The Q2 2020 hardware acquisition that became the first generation of compute hardware for the ada cluster consists of 13 compute machines, grouped into three machine types that differ mainly in GPU model: the RTX 2080 Tis, the RTX 6000s, and the RTX 8000s.

RTX 2080 Tis

Four of the machines have 8x Nvidia RTX 2080 Ti GPUs each. Each GPU has 11GB of GPU memory.

Each machine has 12x 32GB DDR4 (Double Data Rate 4th Generation) DIMMs with a nominal transfer rate of 2933 MT/s. This gives each machine 384GB of CPU memory.
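As a rough sanity check on these figures, the sketch below computes the installed capacity and a theoretical peak memory bandwidth. It assumes a 64-bit (8-byte) data path per DDR4 channel and one DIMM per memory channel. Neither the DIMM layout nor any measured bandwidth is stated on this page, so treat the bandwidth figure as an upper bound, not a measurement.

```python
# Theoretical DDR4 figures for one compute machine.
# Assumptions (not stated on this page): 8 bytes per transfer per channel,
# and one DIMM per memory channel, so 12 DIMMs means 12 channels.
TRANSFER_RATE_MT_S = 2933   # nominal transfer rate from the spec above
BYTES_PER_TRANSFER = 8      # assumed 64-bit DDR4 data path
DIMMS_PER_MACHINE = 12
DIMM_SIZE_GB = 32


def channel_bandwidth_gb_s(rate_mt_s: int, bytes_per_transfer: int = BYTES_PER_TRANSFER) -> float:
    """Peak bandwidth of a single channel in GB/s (decimal gigabytes)."""
    return rate_mt_s * bytes_per_transfer / 1000


per_channel_gb_s = channel_bandwidth_gb_s(TRANSFER_RATE_MT_S)
machine_peak_gb_s = per_channel_gb_s * DIMMS_PER_MACHINE
total_memory_gb = DIMMS_PER_MACHINE * DIMM_SIZE_GB

print(f"Installed memory:       {total_memory_gb} GB")
print(f"Peak per channel:       {per_channel_gb_s:.3f} GB/s")
print(f"Peak per machine (max): {machine_peak_gb_s:.1f} GB/s")
```

Real workloads will see less than this peak, since it ignores refresh overhead, access patterns, and traffic between the two CPUs.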

 

RTX 6000s

Seven of the machines have 8x Nvidia Quadro RTX 6000 GPUs each. Each of these GPUs has 24GB of GDDR6 (Graphics Double Data Rate 6th Generation) memory.

Each machine has 12x 32GB DDR4 (Double Data Rate 4th Generation) DIMMs with a nominal transfer rate of 2933 MT/s. This gives each machine 384GB of CPU memory.

 

RTX 8000s

Two of the machines have 8x Nvidia Quadro RTX 8000 GPUs each. Each of these GPUs has 48GB of GDDR6 (Graphics Double Data Rate 6th Generation) memory.

Each machine has 12x 64GB DDR4 (Double Data Rate 4th Generation) DIMMs with a nominal transfer rate of 2933 MT/s. This gives each machine 768GB of CPU memory.

 

Shared Specifications

Each of these machines is equipped with two Intel Xeon Gold 6240R processors. Each processor has 24 cores, a 35.75MB cache, a base frequency of 2.4GHz, and a rated power of 165W. For more information on the chip family, see the Intel Xeon Processor Scalable Family Technical Overview. For more information on the specific processors, see the Intel page on the Xeon Gold 6240R.

Each machine has 2x 240GB SATA SSDs that are mirrored (i.e., RAID1), so the machine keeps running if one of the two drives fails. The machine operating system resides on this mirrored virtual drive.

Each machine has a 2TB NVMe SSD. This is dedicated to transient scratch space used only for the lifetime of a SLURM job.

Each machine is a SuperMicro 4029GP-TRT system. See the following webpage on the 4029GP-TRT specifications.
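The per-type and shared figures above can be rolled up into cluster-wide totals. The following is a minimal sketch of that arithmetic. The class and variable names are illustrative only and are not part of any cluster tooling, and the core count covers physical cores only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MachineType:
    """One group of identical compute machines (illustrative helper type)."""
    name: str
    machines: int
    gpus_per_machine: int
    gpu_memory_gb: int
    dimms: int
    dimm_size_gb: int

    @property
    def cpu_memory_gb(self) -> int:
        # CPU memory of a single machine of this type
        return self.dimms * self.dimm_size_gb


# Figures taken from the sections above
FLEET = [
    MachineType("RTX 2080 Ti", 4, 8, 11, 12, 32),
    MachineType("Quadro RTX 6000", 7, 8, 24, 12, 32),
    MachineType("Quadro RTX 8000", 2, 8, 48, 12, 64),
]
CPUS_PER_MACHINE = 2
CORES_PER_CPU = 24

total_machines = sum(t.machines for t in FLEET)
total_gpus = sum(t.machines * t.gpus_per_machine for t in FLEET)
total_gpu_memory_gb = sum(t.machines * t.gpus_per_machine * t.gpu_memory_gb for t in FLEET)
total_cpu_memory_gb = sum(t.machines * t.cpu_memory_gb for t in FLEET)
total_cores = total_machines * CPUS_PER_MACHINE * CORES_PER_CPU

for t in FLEET:
    print(f"{t.name}: {t.machines} machines, {t.cpu_memory_gb} GB CPU memory each")
print(f"Machines: {total_machines}, GPUs: {total_gpus}, physical cores: {total_cores}")
print(f"GPU memory: {total_gpu_memory_gb} GB, CPU memory: {total_cpu_memory_gb} GB")
```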

 

PCIe Architecture

Note that the RTX 2080 Ti machines employ a different PCIe (Peripheral Component Interconnect Express) architecture than the RTX 6000 and RTX 8000 machines. PCIe links connect the CPUs to the GPUs, so this difference may be noticeable in some applications.

The RTX 2080 Ti machines employ a dual-root PCIe architecture (left). This means that 4 GPUs “talk” to one CPU and the remaining 4 GPUs “talk” to the other CPU. In contrast, the single-root PCIe architecture (right) directs all GPU-CPU communication to a single CPU, so any use of the other CPU's resources must go over the Intel UPI (Ultra Path Interconnect). See the two illustrations below for reference. Note that LOM stands for LAN-On-Motherboard, meaning that the network interface card is embedded in the motherboard.

Dual-Root System. CPUs share the load of GPU-CPU communication.

Single-Root PCIe Architecture. All GPU communication is handled by 1 CPU. This allows for better inter-GPU communication than the dual-root architecture.
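To make the difference concrete, the sketch below models which CPU each GPU hangs off under the two layouts, and whether traffic between two GPUs would have to cross between CPUs. The GPU numbering (indices 0-3 on the first CPU, 4-7 on the second) and the choice of CPU 0 as the single root are assumptions for illustration. The actual enumeration on a given machine may differ.

```python
# Toy model of the two PCIe layouts described above.
# Assumed numbering: dual-root puts GPUs 0-3 on CPU 0 and GPUs 4-7 on CPU 1;
# single-root puts all GPUs on CPU 0. Real device enumeration may differ.
GPUS_PER_MACHINE = 8


def cpu_for_gpu(gpu_index: int, architecture: str) -> int:
    """Return the index of the CPU whose PCIe root the GPU is attached to."""
    if not 0 <= gpu_index < GPUS_PER_MACHINE:
        raise ValueError(f"GPU index out of range: {gpu_index}")
    if architecture == "single-root":
        return 0
    if architecture == "dual-root":
        return 0 if gpu_index < GPUS_PER_MACHINE // 2 else 1
    raise ValueError(f"Unknown architecture: {architecture}")


def crosses_cpus(gpu_a: int, gpu_b: int, architecture: str) -> bool:
    """True if the two GPUs sit under different CPUs in this model."""
    return cpu_for_gpu(gpu_a, architecture) != cpu_for_gpu(gpu_b, architecture)


for arch in ("dual-root", "single-root"):
    layout = [cpu_for_gpu(i, arch) for i in range(GPUS_PER_MACHINE)]
    print(f"{arch}: GPU -> CPU mapping {layout}, "
          f"GPU 0 <-> GPU 7 crosses CPUs: {crosses_cpus(0, 7, arch)}")
```

On a real node, running `nvidia-smi topo -m` reports the actual GPU-to-GPU and GPU-to-CPU connectivity, which is the authoritative source for pinning decisions.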

 

Login Hardware

Check back later for more information on this!

Management Hardware

Check back later for more information on this!

 


[1] https://www.supermicro.org.cn/products/system/4U/4029/PCIe-Root-Architecture.cfm