If you are looking to create an account or otherwise gain access to this cluster, please see this FAQ entry.
Login, User, and Group Information
Your ada cluster login credentials are the same ones you enter into the UMBC single sign-on portals. Your username is referred to as $USER in this document.
Further, your user account was created with a primary group affiliation, and you may also have a supplementary group affiliation. In either case, the group name most likely matches the regex “(pi_)([a-z]{1,}[0-9]{0,})”. The substring following the underscore (“_”) is referred to as $GROUP in this document. For example, GROUP=smith corresponds to the group pi_smith.
The HPC clusters are accessed through a terminal interface.
To log in, open your terminal application and connect to the cluster environment with:
ssh $USER@ada.rs.umbc.edu
Home Directory and Research Storage
Every user has ownership of a directory shared across the system, /home/$USER. This directory has a storage quota of 500MB and is not backed up.
Every user also has access to two shared research storage volumes. The HPCF Research Storage (/nfs/rs/$GROUP, also reachable as /umbc/rs/$GROUP/$GROUP) is generally 250GB in size. The ada Research Storage (/nfs/ada/$GROUP) is 1TB in size. Neither research volume is backed up.
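To see how much space you are using against these limits, one simple, portable check is the standard du utility (there may also be site-specific quota tools; this is just a sketch):

```
# Summarize usage of the 500MB home directory
du -sh /home/$USER

# Summarize usage of the group research volumes
du -sh /nfs/rs/$GROUP
du -sh /nfs/ada/$GROUP
```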
By default, the home directory has the following symbolic links to group-shared research storage.
| Link Name | Linked To |
|---|---|
| ~/${GROUP}_common | /nfs/rs/$GROUP/common |
| ~/${GROUP}_user | /nfs/rs/$GROUP/users/$USER |
| ~/ada_${GROUP} | /nfs/ada/$GROUP |
| ~/.cache | /nfs/ada/$GROUP/users/$USER/.cache |
| ~/.conda | /nfs/ada/$GROUP/users/$USER/.conda |
As with other HPCF Infrastructure, storage in these volumes is not backed up.
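You can see these links, and where they point, by listing your home directory, e.g.:

```
# Symbolic links display as "name -> target"
ls -l ~

# The .cache and .conda links are hidden files, so list them explicitly
ls -ld ~/.cache ~/.conda
```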
Software Modules
As with other HPCF Systems, ada makes use of Lmod, which provides the module command for listing, searching, and loading installed software packages.
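A typical workflow looks like the following sketch; the package searched for and loaded here (Python) is illustrative, so check the output of module avail for what is actually installed on ada:

```
# List every module currently available
module avail

# Search for a package by name (also finds modules hidden in a hierarchy)
module spider Python

# Load a module into the current shell environment
module load Python

# Show which modules are currently loaded
module list
```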
SLURM
As with other HPCF Systems, SLURM is the workload manager. The implementation on ada is comparatively simple.
- There is only one QOS, so you are not required to specify it.
- There is only one partition, so you are not required to specify it.
- There is no group accounting, so you are not required to specify it.
- Every job must request at least one Generic RESource (i.e., a GPU with --gres=gpu:<#GPU>).
- Every job must specify a memory limit (i.e., with --mem=<#MB>).
- Every job must specify a time limit (i.e., with --time=<#MINUTES> or --time=DD-HH:MM:SS).
- Jobs running longer than 3 wall-clock days may be preempted, so checkpointing code is imperative for runs longer than 3 days.
In this way, requesting 2 GPUs, 100GB of memory, and 24 hours of wall-clock time to run an executable named executable could be done with the following command:
srun --gres=gpu:2 --mem=100G --time=24:00:00 executable
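For non-interactive work, the same resource requests can be placed in a batch script and submitted with sbatch. The sketch below makes the same assumptions as the srun example above; the job name, log file name, module name, and the train.py script are illustrative placeholders, not things provided by the cluster:

```
#!/bin/bash
#SBATCH --job-name=example_job       # illustrative job name
#SBATCH --output=example_job.%j.out  # log file; %j expands to the job ID
#SBATCH --gres=gpu:2                 # at least one GPU is required on ada
#SBATCH --mem=100G                   # a memory limit is required on ada
#SBATCH --time=24:00:00              # a time limit is required on ada

# Load whatever software the job needs (module name is illustrative)
module load Python

# Run the actual workload; replace with your own executable
python train.py
```

Submit the script with sbatch <scriptname> and check on it with squeue -u $USER.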
Features
Specific GPU cards may be selected by specifying a “feature” with the --constraint flag.
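For example, extending the srun command above to pin the job to a particular card type might look like the line below; the feature name rtx_8000 is purely illustrative, so take real feature names from the sinfo listing described in the next section:
srun --gres=gpu:2 --mem=100G --time=24:00:00 --constraint=rtx_8000 executable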
Available Resources
For a listing of available resources, run sinfo at the CLI, or use the formatted query below to see each node's name, CPU count, memory, features, and GPUs:
sinfo -o "%10N %4c %10m %10f %10G "
Tutorial Slides
As part of a training session held over the summer of 2021, Dr. Frank Ferraro of CSEE put together this slide deck, which highlights some useful commands and concepts. The slide deck is intentionally incomplete and was meant only to complement a live tutorial session.