The ada cluster environment is one of four cluster environments within the UMBC HPCF. It comprises the login node, compute hardware, storage volumes, workload manager, accessible software environment, and cluster documentation.
The purpose of this document is to step through each of these ideas at a high level. Please see the “main” pages positioned directly under each heading for more detailed information.
Throughout this document, the terms “node”, “machine”, and “computer” can be used interchangeably.
Main Page: Account
The login node (a.k.a. edge node or user node) provides access to all users and administrators of the cluster environment. Users on the login node monitor compute resource requests, submit new requests, and make modifications to research workflows. It is important that sessions on the login node do not consume large computational resources. The user experience within the ada cluster environment takes place almost exclusively on the login node. For more on the login node and account basics, see the main article, “Account Basics”.
Main Page: Hardware Details
The compute hardware is collected into a heterogeneous cluster with equipment acquired in 2020. There are 13 nodes, each with two 24-core Intel Cascade Lake CPUs. Each core supports two threads, giving a total of 96 possible threads per node. Each node has 384 GB of memory, except for the RTX 8000 nodes, which have double that (i.e., 768 GB). Each node has 2 TB of SSD scratch storage.
|Node Range||#||GPU Information||Memory||Threads|
|g01-g04||4||8x 2080 Ti||384 GB||48|
|g05-g11||7||8x RTX 6000||384 GB||48|
|g12-g13||2||8x RTX 8000||768 GB||48|
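As a quick cross-check of the table above, the per-group figures can be totalled in a few lines of Python. This is only an illustrative sketch: the `NodeGroup` structure, its field names, and the `totals` helper are assumptions made for this example, and the numbers are copied from the table rather than queried from the cluster.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeGroup:
    """One row of the hardware table (hypothetical structure for this sketch)."""
    gpu_model: str
    node_count: int
    gpus_per_node: int
    memory_gb: int  # memory per node, in GB


# Values copied from the hardware table; not read from the live system.
ADA_GROUPS = [
    NodeGroup("2080 Ti", 4, 8, 384),
    NodeGroup("RTX 6000", 7, 8, 384),
    NodeGroup("RTX 8000", 2, 8, 768),
]


def totals(groups):
    """Return (total nodes, total GPUs, total memory in GB) across all groups."""
    nodes = sum(g.node_count for g in groups)
    gpus = sum(g.node_count * g.gpus_per_node for g in groups)
    memory_gb = sum(g.node_count * g.memory_gb for g in groups)
    return nodes, gpus, memory_gb


if __name__ == "__main__":
    nodes, gpus, memory_gb = totals(ADA_GROUPS)
    print(f"{nodes} nodes, {gpus} GPUs, {memory_gb} GB of memory in total")
```

The node total agrees with the 13 nodes stated in the text; the GPU and memory totals follow directly from the per-node figures.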
Main Page: Storage
The ada cluster environment provides each user with a home directory, two types of research storage, and scratch space. All but the last are accessible from any machine within the ada cluster environment.
- Each home directory is accessible only to the user who owns it. These directories help establish a user environment.
- Research Storage
  - HPCF Research Storage is group-shared storage and accessible across the UMBC research computing infrastructure provided by DoIT.
  - ada Research Storage is group-shared storage and accessible only from the ada cluster environment.
- Scratch storage is temporary, job-specific storage space allocated to each user on the compute hardware as they run jobs on those nodes via the workload manager.
Workload Manager (SLURM)
Main Page: SLURM
The ada cluster environment runs a workload manager called SLURM (Simple Linux Utility for Resource Management). SLURM orchestrates all requests for consumable compute resources (e.g., GPUs, memory, time) within the compute hardware. Users may request and monitor these resources via command-line interactions with SLURM on the login node.
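To make the idea of a resource request concrete, the sketch below assembles the text of a SLURM batch script that asks for GPUs, memory, and time. It is a minimal illustration, not the site's recommended template: the function name `build_batch_script`, the job name, and the example command are hypothetical, and any site-specific options that ada may require are deliberately left out (see the SLURM main page for those). The `#SBATCH` options used (`--job-name`, `--gres`, `--mem`, `--time`) are standard `sbatch` options.

```python
def build_batch_script(job_name, gpus, memory_gb, time_limit, command):
    """Return the text of a SLURM batch script requesting the given resources.

    gpus       -- number of GPUs on one node (ada nodes have 8 GPUs each)
    memory_gb  -- memory for the job, in GB
    time_limit -- wall-clock limit in HH:MM:SS form
    command    -- the shell command the job should run (hypothetical here)
    """
    if not 1 <= gpus <= 8:
        raise ValueError("gpus must be between 1 and 8 for a single ada node")
    if memory_gb <= 0:
        raise ValueError("memory_gb must be positive")

    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --gres=gpu:{gpus}",    # consumable resource: GPUs
        f"#SBATCH --mem={memory_gb}G",   # consumable resource: memory
        f"#SBATCH --time={time_limit}",  # consumable resource: time
        "",
        command,
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Example values only; adjust to the actual needs of the job.
    print(build_batch_script("demo", 1, 16, "01:00:00", "python train.py"))
```

Saved to a file on the login node, a script like this would typically be submitted with `sbatch <file>` and monitored with `squeue -u $USER`.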
Main Page: Software
The system administrators in DoIT work with the user base of the ada cluster to ensure that secure and up-to-date software packages are available across the compute hardware. Much of the scientific software is maintained as modules, which are made accessible via Lmod. Please note that all nodes on ada run CentOS 7.
Main Page: Contact and Support
This webpage is written for the average Unix user. It is expected that new users familiarize themselves with general aspects of a Unix Environment (see the FAQs: “How to Unix?” entry). Aspects of the ada cluster environment that the system administrators feel are novel or atypical of a standard, local Unix Environment are communicated here and on neighboring pages (see the drop-down above “ada GPU Cluster”).
All users can opt in to (or out of) a UMBC Google Group that acts as a mailing list for discussions of account issues or support for the ada cluster environment. The address is email@example.com.
Users are encouraged to seek help from the community in this group, but system administrators moderate this list and may suggest that issues be moved to UMBC’s Research Computing Help Request System (see the Forms drop-down above).
All users are auto-enrolled in a UMBC Google Group that acts as a mailing list for important updates to the ada cluster environment. The address is firstname.lastname@example.org.
Users may always report an issue they are experiencing or submit a feature request by submitting an HPCF Help Request via the Forms drop-down in the menu above. Be sure to select the “ada” system!
The authors of these pages are human and therefore make mistakes or incorrectly assume a certain topic is already well understood. Please submit an HPCF Help Request to ask that additional language be published, that a new FAQ be written, or that any typos you notice be corrected.
Roughly twice per calendar year, system administrators will communicate to the user base a need to take system downtime. A system downtime will generally be on the order of a few hours. During this downtime period, system administrators make changes to keep the ada cluster environment secure and usable. Before this downtime period, all users are required to kill jobs and save changes. During these periods, running jobs are likely to stop communicating or otherwise fail. System administrators will communicate impending downtimes via all of the following:
- The MOTD (message of the day), which is shown as the initial log-in message to the ada cluster environment
- A communication to the ada-updates-group mentioned above
- A new posting to the HPCF myUMBC Group (only if it impacts all aspects of UMBC’s HPCF)
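The requirement to kill jobs before a downtime can be sketched as follows. This is a hedged illustration rather than an official procedure: the function name `prepare_for_downtime` is hypothetical, and it defaults to a dry run that only returns the commands it would execute. `squeue -u <user>` lists a user's jobs and `scancel -u <user>` cancels all of that user's jobs, so any work worth keeping should be saved first.

```python
import getpass
import subprocess


def prepare_for_downtime(user=None, dry_run=True):
    """List, then cancel, every SLURM job owned by `user`.

    With dry_run=True (the default) nothing is executed; the commands are
    only returned so they can be reviewed first.
    """
    user = user or getpass.getuser()
    commands = [
        ["squeue", "-u", user],   # show the user's pending and running jobs
        ["scancel", "-u", user],  # cancel ALL of the user's jobs
    ]
    if not dry_run:
        for cmd in commands:
            # Requires the SLURM client tools, i.e. run this on the login node.
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    for cmd in prepare_for_downtime():
        print(" ".join(cmd))
```

Keeping the dry run as the default is a deliberate choice here, since cancelling jobs cannot be undone.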
This is a long page! Thanks for reading this through carefully.
System administrators retain the right to simply refer users to this page 🙂