
Quick Start

If you are looking to create an account or otherwise gain access to this cluster, please see this FAQ entry.


Login, User, and Group Information

Your ada cluster login credentials are the same ones you use for single sign-on to the UMBC portals. Throughout this document, your username is written as $USER.

Log in to the cluster environment with, e.g., ssh $USER@ada.rs.umbc.edu.

Your user account was created with a primary group affiliation, and you may also have supplementary group affiliations. In either case, the group name most likely matches the regular expression “(pi_)([a-z]{1,}[0-9]{0,})”. Throughout this document, the substring following the underscore (“_”) is written as $GROUP. For example, GROUP=smith corresponds to the group pi_smith.
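
To confirm your own group affiliations, the standard id utility works; a minimal check, assuming it is available on the login node (as it typically is):

id -Gn $USER    # print every group name associated with $USER

Any entry beginning with pi_ is a research group affiliation.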

Home Directory and Research Storage

Every user owns a home directory, /home/$USER, which is shared across the system. This directory has a storage quota of 500MB and is not backed up.

Every user also has access to two shared research storage volumes: the HPCF Research Storage (/nfs/rs/$GROUP), generally 100GB in size, and the ada Research Storage (/nfs/ada/$GROUP), 1TB in size. Neither research volume is backed up.
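
To see how much of each volume you are currently using, the standard du utility suffices (quota-specific reporting tools may also exist on ada, but are not covered here):

du -sh /home/$USER        # home directory usage against the 500MB quota
du -sh /nfs/ada/$GROUP    # usage of the group's ada research volume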

By default, the home directory contains the following symbolic links into group-shared research storage.

Link Name           Linked To
~/${GROUP}_common   /nfs/rs/$GROUP/common
~/${GROUP}_user     /nfs/rs/$GROUP/users/$USER
~/ada_${GROUP}      /nfs/ada/$GROUP
~/.cache            /nfs/ada/$GROUP/users/$USER/.cache
~/.conda            /nfs/ada/$GROUP/users/$USER/.conda

As with other HPCF infrastructure, storage in these volumes is not backed up.
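
The links can be verified with standard tools, e.g.:

ls -l ~ | grep -- '->'    # list the symbolic links in your home directory
readlink -f ~/.conda      # resolve one link to its target path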

Software Modules

As with other HPCF systems, ada uses Lmod, which provides the module command for listing, searching, and loading installed software packages.
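
Some common invocations (the package name python below is only a placeholder; the actual catalog on ada may differ):

module avail              # list packages visible in the current module path
module spider python      # search the full module tree for a package
module load python        # load a package into the current environment
module list               # show currently loaded modules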

SLURM

As with other HPCF systems, SLURM is the workload manager. The configuration on ada is comparatively simple:

  • There is only one QOS, so you are not required to specify one.
  • There is only one partition, so you are not required to specify one.
  • There is no group accounting, so you are not required to specify an account.
  • Every job must request at least one Generic RESource (GRES), i.e., a GPU, with --gres=gpu:<#GPU>.
  • Every job must specify a memory limit, i.e., with --mem=<#MB>.
  • Every job must specify a time limit, i.e., with --time=<#MINUTES> or --time=DD-HH:MM:SS.
  • Jobs running longer than 3 wall-clock days may be preempted, so code intended to run longer than 3 days must checkpoint its progress.

In this way, requesting 2 GPUs, 100GB of memory, and 24 hours of wall-clock time to run an executable named executable can be done with the following command:

srun --gres=gpu:2 --mem=100G --time=24:00:00 executable
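
The same request can also be submitted as a batch job. A minimal sketch of a submission script, here called run.slurm (the job name and output file name are placeholders):

#!/bin/bash
#SBATCH --gres=gpu:2        # at least one GPU is required
#SBATCH --mem=100G          # memory limit (required)
#SBATCH --time=24:00:00     # wall-clock limit (required)
#SBATCH --job-name=myjob    # placeholder job name
#SBATCH --output=myjob.out  # placeholder output file

./executable

Submit it with sbatch run.slurm and monitor it with squeue -u $USER.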

Features

Specific GPU cards may be selected by specifying a “feature” with the --constraint flag.
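
For example, to run on a node with a particular card (the feature name rtx_8000 below is hypothetical; list the feature names actually available with the sinfo command shown in the next section):

srun --gres=gpu:1 --mem=16G --time=60 --constraint=rtx_8000 executable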

Available Resources

For a listing of available resources, run

sinfo

for a summary, or

sinfo -o "%10N  %4c  %10m  %10f  %10G"

for a per-node breakdown of hostnames, CPU counts, memory, features, and generic resources (GRES).

Tutorial Slides

As part of a training session held in the summer of 2021, Dr. Frank Ferraro of CSEE put together this slide deck, which highlights some useful commands and concepts. The deck is intentionally incomplete and was meant only to complement a live tutorial session.