Table of Contents
- Introduction
- Good User Behaviors
- Access to the Facility
- Obligation of All Users to Help Maintain the Facility
Introduction
The UMBC High Performance Computing Facility (HPCF) is a shared resource for research at UMBC that requires high performance computing, in particular a parallel computing cluster. The following rules are intended to make this facility effective for users and to ensure its maintenance. For the long-term benefit of the community of users, it is vital that all users comply with all aspects of the rules. Usage rules for a large computer shared by many users have several aspects, and a facility that relies on active support from its users for its maintenance adds more. Therefore, the following items are grouped by their purpose. These rules will be reviewed and updated periodically by DoIT and the SIG, both in response to issues that come to our attention and in response to usage patterns. This webpage always shows the usage rules currently in effect. If you have any questions or concerns regarding the rules, contact the HPCF point of contact.
We will make a distinction between users and PIs (principal investigators). A user is anyone with an HPCF account. A PI is a UMBC faculty member who brings their research projects to HPCF and sponsors users to assist the PI on these projects. Users may be sponsored by one or more PIs, and PIs may be users themselves.
The philosophy adopted here is one of granting an account on this facility first and then requiring help in maintaining it, as opposed to requiring up-front payment to use the facility or on-going charge-backs. This approach allows researchers to start using the facility immediately at any point in the year and to obtain initial research results using it. In turn, users are then expected to use these results to actively demonstrate research outcomes and to search for funding to sustain the facility.
Good User Behaviors
On a day-to-day basis, it is imperative that users run their code in a responsible fashion, so as not to hinder or damage other users’ workflows. To this end, we have developed a basic set of rules that all users must follow. The rules are posted below, and we highly encourage all users to take steps to increase their understanding of cluster programming. This can be done through various means, including studying information on this website, taking a class that utilizes the HPCF cluster (for instance, Math 447 or 627), or asking questions using the HPCF support ticketing system under the Forms tab. The HPCF has a small team of students and staff who can support users and educate them on best practices and good working habits. To access this type of support, please submit an RT ticket under Forms above.
Rules for cluster usage:
- All users must use the SLURM submission system to reserve compute nodes for their cluster use. You are not allowed to run any computational or data processing tasks directly on the login nodes.
- Users will be notified by e-mail about issues related to the system, such as scheduled downtime, upgrades, etc. Such mail may also include requests for information and feedback.
- Users are required to monitor their UMBC e-mail address and are required to respond to requests from DoIT. This is part of the active communication necessary for a shared resource such as this to be used effectively by all users.
- If a downtime is scheduled, this will be communicated several days in advance, if at all possible.
IMPORTANT NOTE: Any process that DoIT judges to be in violation of these rules, or that behaves in any way that impacts other users’ ability to use the resource, will either be killed or have its resources restricted. Users who frequently violate the rules could have their accounts suspended and/or terminated. Ordinarily, we will try to make contact with the user first to discuss what is going on and to try to work with the user, but if other users are impacted, the account can be suspended first.
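As an illustration of submitting work through the scheduler rather than running it on a login node, the following is a minimal Python sketch that assembles an sbatch command line. The function name, the script name run_simulation.slurm, and the resource values are hypothetical and chosen only for illustration; the options shown (--job-name, --nodes, --ntasks-per-node, --time) are standard sbatch options, but the limits and partitions that apply on HPCF are not stated on this page and are therefore left out.

```python
import shlex


def build_sbatch_command(script, nodes=1, ntasks_per_node=1,
                         time_limit="01:00:00", job_name="example"):
    """Return the argument list for submitting `script` through sbatch.

    All default values are placeholders, not HPCF policy.
    """
    return [
        "sbatch",
        f"--job-name={job_name}",
        f"--nodes={nodes}",
        f"--ntasks-per-node={ntasks_per_node}",
        f"--time={time_limit}",
        script,
    ]


if __name__ == "__main__":
    # Hypothetical job: 2 nodes, 8 tasks per node, 30 minutes of wall time.
    cmd = build_sbatch_command("run_simulation.slurm", nodes=2,
                               ntasks_per_node=8, time_limit="00:30:00")
    # Print the command instead of executing it, so the sketch is safe to
    # run on any machine, including one without SLURM installed.
    print(shlex.join(cmd))
```

The sketch only prints the command. On the cluster, the same argument list could be passed to subprocess.run from a login node, since submitting a job is not itself a computational task.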
Access to the Facility
This facility is a shared resource for research at UMBC that requires a high performance parallel computer. To get an account on this facility, please submit an account request form found under Forms above. All accounts must be sponsored by a UMBC faculty member. If the faculty member does not already have an account, an account request form is also needed from the faculty member. To maintain their access, users must follow all rules outlined on the HPCF webpage at all times. To ensure the success of this facility in the long run, it is vital that there be demonstrated research results created on this resource; hence, PIs are required to have an on-going program of high performance computing research.
PIs are invited to contribute direct funding to the facility at the cost of one or more nodes. Contributions from faculty in this way will be bundled and used for a periodic expansion of the cluster. Contributing this money gives these PIs priority access over other PIs to a proportion of the cluster, in the sense explained in the following.
Access to compute nodes for users will be managed by job scheduling software, called the scheduler, that reserves compute nodes for users. The scheduler reserves compute nodes based on the availability of resources in combination with a user’s priority. Note that priority only influences newly submitted jobs, not ones already running. The following principles will guide the setup of the scheduler:
- Basic scheduling is done by first-come-first-served, with priorities adjusted by factors like job size and time waited. The scheduler used currently is the software SLURM developed at Lawrence Livermore National Laboratory. SLURM has been configured to prioritize jobs based on a multi-factor plugin, which weighs several factors together to determine jobs’ priority.
- The priority of queued jobs incorporates a “fair-share component”. This component is determined by the SLURM accounting group that a job is charged to (--account). There are at least two accounting groups in HPCF: Contribution and Community. PIs in the Contribution group have provided funding (or some other means of contribution) to the facility. The Community group consists of the remaining PIs. The relative weight between the two groups results from the total financial contribution by members of the Contribution group. Within the Contribution group, PI groups have weights proportional to their financial contribution to the facility. All PI groups in the Community group are weighted equally.
- Principally, acceptable usage levels of the cluster are proportional to a PI’s standing in this hierarchy. This means, for example, that users under the Community group should not monopolize large portions of the cluster for excessively long periods of time, causing unreasonable interference with the jobs of Contribution PIs. The scheduling mechanism on the cluster is designed accordingly to give increased access to users of paying PIs, while still serving the needs of our Community users.
- The scheduler tracks usage and uses a 30-day sliding time window in the fair-share calculation of a user’s priority. That is, each PI group has in effect a monthly allocation of the cluster expressed in terms of CPU hours (wall time multiplied by number of cores used). This allocation will be a nominal amount for users in the Community group. Contribution PIs will receive an allocation proportional to the number of nodes purchased. PIs who have used less than their monthly allocation will have their jobs’ priorities boosted, whereas PIs who have used more than their monthly allocation will have their priorities reduced.
- Much effort has gone into designing scheduling rules that will automatically support the objectives of the cluster. However, it is ultimately the responsibility of each user to ensure he or she is maintaining an appropriate usage level. The meaning of “appropriate” varies with usage patterns of the overall community, so we may request that you adjust your usage based on the current situation.
- Users working under several PIs should take care to charge their computing time to the correct PI. This can be done on a per-job basis; if no PI is specified, one of the user’s PIs is used as the default.
- Contribution users will also get increased access to running long-term jobs. We consider a “long-term” job to be one that runs longer than overnight. Community users will also be given access to running long-term jobs, but on a more limited basis.
- Jobs requiring many resources (i.e. many processors) are allowed to use less wall time, to avoid tying up significant portions of the cluster for too long.
- Additionally, if current usage patterns on the machine allow for it, we are happy to let users run longer or larger jobs by arrangement; please file a help request.
- The above rules do not apply to system administration and testing of the machine, including select users running jobs for the purpose of testing, debugging, or benchmarking the system. For instance, users with existing code may be specifically running large jobs to test a new system or a new system configuration; that is, it is not just the actual system administrator running such jobs. Such efforts will be coordinated by the HPCF point of contact in collaboration with DoIT. We anticipate that such activity is limited to the initial phase of the machine or after significant changes in hardware or software.
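The monthly allocation described above can be made concrete with a small amount of arithmetic. The following Python sketch computes CPU hours as wall time multiplied by the number of cores, sums them over a 30-day sliding window, and compares the total with an allocation. It is an illustrative simplification only, not the formula SLURM uses internally: the record layout, the function names, the rule that a job counts entirely if it ended inside the window, the "unchanged" outcome for usage exactly equal to the allocation, and the allocation of 500 CPU hours are all assumptions made for this example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

WINDOW = timedelta(days=30)  # sliding window length stated in the rules


@dataclass
class JobRecord:
    end_time: datetime   # when the job finished
    wall_hours: float    # wall-clock duration in hours
    cores: int           # number of cores the job used


def cpu_hours(job):
    """CPU hours = wall time multiplied by number of cores used."""
    return job.wall_hours * job.cores


def usage_in_window(jobs, now):
    """Sum CPU hours of jobs that ended within the last 30 days.

    Assumption: a job is counted in full if it ended inside the window.
    """
    cutoff = now - WINDOW
    return sum(cpu_hours(j) for j in jobs if cutoff <= j.end_time <= now)


def priority_adjustment(used, allocation):
    """Direction of the priority change for a PI group (illustrative)."""
    if used < allocation:
        return "boosted"
    if used > allocation:
        return "reduced"
    return "unchanged"  # assumption: the rules do not cover exact equality


if __name__ == "__main__":
    now = datetime(2024, 3, 31)
    jobs = [
        JobRecord(datetime(2024, 3, 20), wall_hours=10.0, cores=16),
        JobRecord(datetime(2024, 3, 5), wall_hours=2.5, cores=64),
        JobRecord(datetime(2024, 1, 15), wall_hours=100.0, cores=32),  # too old
    ]
    used = usage_in_window(jobs, now)
    print(used, priority_adjustment(used, allocation=500.0))
```

In this example the two recent jobs contribute 160 CPU hours each, the January job falls outside the window, and the group is below its hypothetical allocation, so its priority would be boosted.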
Technical details of the scheduling system are discussed on the scheduling rules page. Once the principles on this page are understood, users should refer to that page and the how to run tutorial for further information on proper usage of the cluster.
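For users working under several PIs, charging a job to the correct PI comes down to passing the right value to the --account option at submission time. The Python sketch below shows one way to pick that value with a fallback to a default. The function name and the account names pi_smith and pi_jones are hypothetical, and the assumption that account names map one-to-one to PI groups should be checked against the scheduling rules page.

```python
def account_flag(requested_pi, sponsoring_pis, default_pi):
    """Return the --account option for a job.

    requested_pi: account to charge, or None to fall back to the default.
    sponsoring_pis: accounts the user may charge to (hypothetical names).
    default_pi: account used when none is requested.
    """
    if default_pi not in sponsoring_pis:
        raise ValueError(f"default account {default_pi!r} is not a sponsor")
    pi = requested_pi if requested_pi is not None else default_pi
    if pi not in sponsoring_pis:
        raise ValueError(f"user is not sponsored by account {pi!r}")
    return f"--account={pi}"


if __name__ == "__main__":
    sponsors = ["pi_smith", "pi_jones"]  # hypothetical account names
    # No PI requested: the default account is charged.
    print(account_flag(None, sponsors, default_pi="pi_smith"))
    # Explicit per-job choice of the second PI.
    print(account_flag("pi_jones", sponsors, default_pi="pi_smith"))
```

Rejecting an account the user is not sponsored under mirrors the intent of the rule: computing time should only ever be charged to a PI who actually sponsors the work.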
Obligation of All Users to Help Maintain the Facility
This machine has been created with financial and non-financial support both from faculty and from UMBC. To ensure the long-term existence of this facility, all users have an obligation to actively help sustain it. This obligation has financial and scientific (non-financial) aspects, and support for both aspects is required from all users to maintain their accounts on the systems. The requirements include the following methods of support:
- Each user must provide a title and abstract for all research projects conducted on the facility’s machines. Each project should have its own title and abstract. This information will be posted on the facility’s webpage to demonstrate how the facility is used.
- Each user is required to provide information on outcomes of the research conducted on the facility’s machines. This includes information both on papers submitted and published and on presentations given. We are happy to post PDF files of papers or presentations on the facility’s webpage or to link to another webpage.
- Each user must acknowledge the use of this facility, for instance, in papers and presentations. The short form of the acknowledgment is “HPCF, NSF, UMBC”. For a full paragraph version, see Supporting Materials under the Research tab on this webpage.
- Each user (or the sponsoring PI, if the user is not a faculty member) must be willing to participate as co-PI or co-investigator in future grant proposals. This implies a willingness to supply short descriptions of the research and its results (in particular papers) and to provide the necessary information for grant proposals (bio sketch, current/pending support, and similar), when requested.
- Each user is required to include budget requests for computational resources in individual grant proposals. The support requested should be commensurate with the amount of resources typically used; the cost per node for contributing users above is a guide for the cost. To support such efforts, we are ready to help with your proposal, including drafting text, acting as co-PI/co-investigator, supplying a support letter, or in whatever way is deemed most effective. Contact the HPCF point of contact well before your proposal due date to work out details.
- All users including principal investigators must confirm when requested that they and their research group still require the account on the facility’s machines. Specifically, at the beginning of every Fall semester, all accounts will be reviewed to determine if they should be continued. The purpose is to avoid large numbers of inactive accounts. This facility is not suitable for long-term data storage; users are required to move their data off the machine upon the completion of projects. An account cannot be kept open solely for the purpose of access to data on the machine.
- Users who wish to continue their account on the system are required to supply proof of outcomes of the usage of the machine, including for instance publications, presentations, preprints, and grant proposals that include funding requests for nodes on the machine. Users are required to submit such proof continuously throughout the year, but also specifically at the time of account review. If no information is received upon request or there was no effort to help maintain the facility, the user’s account (and, for a faculty member, all accounts sponsored by that faculty member) will be suspended and/or its priority of usage reduced. To help with the documentation of research results, we provide as part of this website a Publications page where technical reports of results can be posted, as well as webpages for each project where publications and presentations of the research can be posted throughout the year.