FAQs – High Performance Computing Facility

This page serves as a growing list of frequently asked questions.

If you have a question that is not answered on this page, please see the support section at https://hpcf.umbc.edu/ada/environment/#support.

Quick Question Reference

How do I gain access to the ada cluster environment?
How to Unix?
How much storage do I get?
Help! My jobs are disappearing!
Is there a way to get around SLURM job preemption?
I would like to run a command as a super-user (sudo). How do I do this?
Why “ada”?
Who are the system administrators?

How do I gain access to the ada cluster environment?

Easy! Ask your advisor (if you are not faculty or staff), then fill out the form found under the “Forms” drop-down above. Be sure to select System: “ada” and “Account or Group Request”.

To access the cluster, use your favorite remote connection client to log in to ada.rs.umbc.edu with your UMBC login credentials.
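
As a rough illustration, with an OpenSSH client the login could look like the sketch below. The username jdoe1 is a made-up placeholder, not a real account. The snippet only prints the command, so you can check it before running the printed line yourself.

```shell
# Hypothetical placeholder: replace jdoe1 with your own UMBC username.
UMBC_USER="jdoe1"
ADA_HOST="ada.rs.umbc.edu"   # login host named in this FAQ

# Build the connection command. It is printed, not executed;
# run the printed line yourself to actually connect.
SSH_CMD="ssh ${UMBC_USER}@${ADA_HOST}"
echo "$SSH_CMD"
```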

How to Unix?

The system administrators and the first users of the ada cluster environment worked to establish an understanding of what most users probably already know about Unix environments. This is the summary of that conversation with links to external resources to help any user who might need guidance. In this summary, CLI means “Command Line Interface” and it is expected that this CLI uses BASH.

Note that commands shown in this FAQ are meant to be run on the ada login node.

How to securely access a remote machine (see https://www.howtogeek.com/311287/how-to-connect-to-an-ssh-server-from-windows-macos-or-linux/ (sorry for the ads))
What are and how to use man pages (run man man in the CLI, seriously)
How to navigate file systems and manipulate file contents from the CLI (an Econ TA from UCLA wrote a page on this http://www.econ.ucla.edu/TApages/wan/basic_commands.html)
How to query group membership from the CLI (see man groups or man id)
How to query objects to determine user, group, or other permissions (see Indiana University’s page on this https://kb.iu.edu/d/abdb)
How to securely transfer data between a local and remote machine (see this helpful Stack Exchange thread https://unix.stackexchange.com/questions/188285/how-to-copy-a-file-from-a-remote-server-to-a-local-machine)
What are and how to use environment variables (see this page from GNU https://www.gnu.org/software/bash/manual/html_node/Environment.html)
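
A few of the topics in this list can be tried out directly. The sketch below uses only standard commands (id, ls, printenv). The variable name MY_PROJECT is a hypothetical example and has no special meaning on the cluster.

```shell
# Show your user ID, primary group, and every group you belong to (see man id).
id

# Show the owner, group, and permission bits of the current directory.
# -d lists the directory itself rather than its contents.
ls -ld .

# Environment variables: set one, export it to child processes, read it back.
# MY_PROJECT is a hypothetical name used only for this illustration.
MY_PROJECT="demo"
export MY_PROJECT
printenv MY_PROJECT
```

The first character of the ls output tells you the object type (d for a directory), and the next nine characters are the read, write, and execute bits for user, group, and other.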

We’re all here to learn! If you’re having issues understanding any of these, please see https://hpcf.umbc.edu/ada/environment/#support.

How much storage do I get?

Each user receives 500 MB of home directory storage. This storage is backed up daily and a record is kept for 30 days.

Each group receives two types of research storage.

1. HPCF Research Storage

This is available across both HPCF clusters (ada & taki).
This storage is shared between all members of the group.
By default, this storage has a quota of 250 GB per group.
This storage is NOT backed up.

2. ada Research Storage

This is available only on the ada cluster.
This storage is shared between all members of the group.
By default, this storage has a quota of 1 TB per group.
This storage is NOT backed up.

You can read more about the storage available on the ada cluster on the Storage page.
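
To get a rough idea of how close a directory is to one of these limits, something like the following sketch may help. The function name check_usage is made up for this example. It relies on du, whose totals may differ from how the cluster's quota system counts usage, so treat the result as an estimate.

```shell
# check_usage DIR LIMIT_MB  (hypothetical helper, not a cluster tool)
# Prints how many megabytes DIR occupies and whether that exceeds LIMIT_MB.
check_usage() {
    dir="$1"
    limit_mb="$2"
    # du -sk prints the total size in kilobytes followed by the path.
    used_kb=$(du -sk "$dir" | awk '{print $1}')
    used_mb=$(( (used_kb + 1023) / 1024 ))   # round up to whole megabytes
    if [ "$used_mb" -gt "$limit_mb" ]; then
        echo "$dir: ${used_mb} MB used, OVER the ${limit_mb} MB limit"
    else
        echo "$dir: ${used_mb} MB used, within the ${limit_mb} MB limit"
    fi
}

# Example: compare your home directory with the 500 MB figure quoted above.
# check_usage "$HOME" 500
```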

Help! My jobs are disappearing!

Take a deep breath, and prepare to brush up on your SLURM commands.

If, after working through the steps below, you still can’t explain what’s happening to your jobs, submit a descriptive help request outlining your issues, questions, and concerns. See the following link for some good practices: https://hpcf.umbc.edu/ada/environment/#support

sacct is a SLURM command for displaying job data. Anyone can use sacct to look at anyone else’s job! You could just read through the man page on sacct, but here is at least one system administrator’s favorite command to use when trying to determine what’s happened to a job.

sacct -X -u $USER --format="jobid,account,state,elapsedraw"

Where $USER is an environment variable that normally holds your username. By default, sacct reports on jobs from the last local midnight onward. The information given is (1) the Job ID, (2) the Sponsoring Account, (3) the State of the job (e.g., RUNNING, PENDING, COMPLETED, FAILED, …), and (4) the total walltime of the job in seconds.

Another way to use this command is the following:

sacct -X -j ${JOB_ID} --format="jobid,account,state,elapsedraw"

Where ${JOB_ID} is the SLURM-given job ID printed to stdout at the time of submission. This will output the same information as above.
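
Walltime in raw seconds can be hard to read at a glance. The sketch below is one possible way to turn it into HH:MM:SS. The function names fmt_elapsed and summarize_jobs are invented for this example, and the sample line is made up rather than real cluster output. It assumes sacct is run with -n (no header) and -P (pipe-separated fields), which are standard sacct options.

```shell
# fmt_elapsed SECONDS  ->  HH:MM:SS
fmt_elapsed() {
    s="$1"
    printf '%02d:%02d:%02d\n' $((s / 3600)) $((s % 3600 / 60)) $((s % 60))
}

# summarize_jobs reads lines of the form jobid|account|state|elapsedraw
# on stdin and prints them with a human-readable walltime.
summarize_jobs() {
    while IFS='|' read -r jobid account state elapsed; do
        echo "$jobid $account $state $(fmt_elapsed "$elapsed")"
    done
}

# Made-up sample line for illustration only (not real cluster output).
printf '%s\n' '12345|pi_demo|COMPLETED|3725' | summarize_jobs

# On the login node, the real data could be piped in like this:
# sacct -X -n -P -u "$USER" --format="jobid,account,state,elapsedraw" | summarize_jobs
```

For the sample line, 3725 seconds is shown as 01:02:05.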

If you find any entry at all for your job in this output, it has not disappeared. It’s very likely that your job simply completed or failed very quickly! Use the output written to the SLURM output and SLURM error files to determine the cause of the failure. Always feel free to submit a descriptive help request at the link above, but expect some minor shaming if it’s clear that you haven’t looked at this type of output yourself.
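
Unless a job script sets its own --output or --error paths, sbatch writes both stdout and stderr to a file named slurm-<jobid>.out in the directory the job was submitted from. The helper below is a small sketch built on that default; the name show_job_log is made up for this example.

```shell
# show_job_log JOB_ID [DIR]  (hypothetical helper)
# Prints the last 20 lines of the default SLURM output file for JOB_ID,
# looking in DIR (default: the current directory).
show_job_log() {
    log="${2:-.}/slurm-$1.out"
    if [ -f "$log" ]; then
        echo "== last lines of $log =="
        tail -n 20 "$log"
    else
        # The job may have used --output/--error or been submitted elsewhere.
        echo "No file $log found" >&2
        return 1
    fi
}

# Example, run from the directory where you called sbatch:
# show_job_log "${JOB_ID}"
```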

Is there a way to get around SLURM job preemption?

Short answer: No. Although any job is guaranteed a run time of 72 hours (3 days) before it can be preempted, the `ada` cluster does not follow a strict preemption policy. This means the ‘3-day preemption policy’ applies only when other pending jobs are requesting the resources that the job is currently using. If no other jobs request these resources, the job should essentially be able to run for as long as the user wants.

The system administrators and the first users of the ada cluster environment worked together in their design of the ada cluster environment to ensure an equal level of access.

As with all computing environments, the ada cluster environment has a limited number of resources to offer its users. For this reason, the cluster resources will sometimes be subject to resource contention, when the same resource is requested by different SLURM tasks. Preemption is our answer to cases of resource contention. Preemption is described on the preemption page located under “ada GPU Cluster” and “SLURM” in the above drop-down menu.

System administrators who grant an exemption to SLURM job preemption would be granting unequal access to a shared campus resource. Please refrain from asking them to grant such an exemption. The answer will always be the same.

Please remember to checkpoint your SLURM jobs to ensure that computational efforts are not lost in the case of preemption.
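
How to checkpoint depends heavily on the application, and many frameworks ship their own mechanism. As a minimal sketch of the general idea, assuming the work can be split into numbered steps, a job could record the last finished step in a file and pick up from there when it is run again. The function name run_steps and the file name in the usage note are made up for this example.

```shell
# run_steps CKPT_FILE TOTAL [MAX_THIS_RUN]  (hypothetical helper)
# Runs steps from the last checkpoint up to TOTAL, recording progress after
# each one. MAX_THIS_RUN (optional) stops early to imitate a preemption.
run_steps() {
    ckpt="$1"
    total="$2"
    max="${3:-$2}"
    step=0
    # Resume from the saved step count if a checkpoint exists.
    if [ -f "$ckpt" ]; then
        step=$(cat "$ckpt")
    fi
    done_now=0
    while [ "$step" -lt "$total" ] && [ "$done_now" -lt "$max" ]; do
        step=$((step + 1))
        echo "working on step $step"   # stand-in for the real computation
        # Write to a temporary file first, then rename, so an interruption
        # never leaves a half-written checkpoint behind.
        echo "$step" > "$ckpt.tmp" && mv "$ckpt.tmp" "$ckpt"
        done_now=$((done_now + 1))
    done
}
```

For instance, run_steps progress.txt 5 2 would complete steps 1 and 2 and then stop; calling run_steps progress.txt 5 afterwards would continue at step 3 instead of starting over.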

I would like to run a command as a super-user (sudo). How do I do this?

Users cannot run super-user commands on the ada cluster environment. In many cases, you do not need to. If you believe your case is a rare exception, please submit a descriptive help request through the “Forms” drop-down above or read about ada cluster environment support at the following link: https://hpcf.umbc.edu/ada/environment/#support

Why “ada”?

The ada cluster environment is named in honor of Augusta Ada King, Countess of Lovelace, a.k.a. “The Prophet of the Computer Age” [1] or “Enchantress of Number” [2].

Ada lived from 1815 to 1852 and was born and died in London, England [2]. Ada is commonly regarded as the first computer programmer, given her unique insight into what we might call computer programming today [1,3]. She left an extensive set of notes on a French paper penned by Luigi Menabrea [1,2,3,4]. This paper was a summary of the Analytical Engine designed by Charles Babbage and presented to, among others, Menabrea [3,4]. These notes represent the first discussion of using symbols (mathematical or otherwise) and rules to represent complex patterns and processes beyond rote arithmetic [1]; that is, using symbols and rules to give machine instructions.

[1]: https://www.computerhistory.org/babbage/adalovelace/

[2]: https://writings.stephenwolfram.com/2015/12/untangling-the-tale-of-ada-lovelace/

[3]: http://www.fourmilab.ch/babbage/sketch.html

[4]: https://findingada.com/about/who-was-ada/

Who are the system administrators?

No one really knows. They hide behind the thin veil of the Research Computing RT system and venture out only for snacks and fancy cable management solutions.

System administrators retain the right to simply refer users to this page 🙂
