Table of Contents
- Good User Behaviors
- Access to the Facility
- Obligation of All Users to Help Maintain the Facility
- Effectiveness of these Rules
The UMBC High Performance Computing Facility (HPCF) is a shared resource for research at UMBC that requires a high performance computer, particularly a parallel computer. The following rules are intended to help make this facility effective for users and to ensure the maintenance of the facility. For the long-term benefit of the community of users, it is vital that all users comply with all aspects of the rules. There are several aspects to usage rules on a large computer that is shared by many users, and additional aspects for a facility that relies on active support from its users for its maintenance. Therefore, the following items are grouped by their purpose. These rules will be reviewed and updated periodically by the HPCF Governance Committee, both in response to issues that come to our attention and in response to usage patterns. This webpage always shows the current usage rules in effect. If you have any questions or concerns regarding the rules, do not hesitate to contact the chair of the user committee, Dr. Matthias Gobbert; for technical questions, preferably write to firstname.lastname@example.org; see the contact information on this webpage.
We will make a distinction between users and PIs (principal investigators). A user is anyone with an HPCF account. A PI is a UMBC faculty member who brings their research projects to HPCF and sponsors users to assist the PI on these projects. Users may be sponsored by one or more PIs, and PIs may be users themselves.
Good User Behaviors
On a day-to-day basis, it is imperative that users run their code in a responsible fashion, so as not to hinder or damage other users’ code. To this end, we have developed a basic set of rules that all users must follow. The rules are posted below, and we strongly encourage all users to take steps to increase their understanding of cluster programming. This can be done through various means, including studying information on this website, taking a class that utilizes the HPCF cluster (for instance, Math 447 or 627), or asking questions using the HPCF support ticketing system. It is always better to ask first and allow us to potentially coordinate usage. The HPCF cluster has several RAs on staff to assist users with support as well as to educate users on best practices and good working habits.
Rules for cluster usage:
- All users must use the batch submission system to reserve compute nodes for their cluster use. You are not allowed to log in to the login nodes (maya-usr1, maya-usr2) for the purpose of running any computational or data processing tasks directly there.
- Users will be notified by e-mail about issues related to the system, such as scheduled downtime, upgrades, etc. Such mail may also include requests for information and feedback.
- Users are required to monitor their UMBC e-mail address and are required to respond to requests from DoIT. This is part of the active communication necessary for a shared resource such as this to be used effectively by all users.
- The time slot for scheduled downtime is every Tuesday evening. If a downtime is scheduled, this usually will be communicated several days in advance.
IMPORTANT NOTE: Any process that DoIT judges to be in violation of these rules, or to be behaving in any way that impacts other users’ ability to use the resource, will either be killed or have its resources restricted. Users who repeatedly violate these rules could have their accounts suspended and/or terminated. Ordinarily, we will try to make contact with the user first to discuss what is going on and to try to work with the user, but if other users are impacted, the account can be suspended first. If you have any questions or need further information, the chair of the user committee acts here on behalf of the entire user community; see the contact page for details.
Access to the Facility
This facility is a shared resource for research at UMBC that requires a high performance parallel computer. To get an account on this facility, please submit a completely filled out account request form. All accounts must be sponsored by a UMBC faculty member. Thus, for a student to get an account, two forms need to be submitted: one by the student with the name of the sponsor and one by the sponsor listing the student’s name. To maintain their access, users must follow all rules outlined on the HPCF webpage at all times. To ensure the success of this facility in the long run, it is vital that there be demonstrated research results created on this machine; hence, PIs are required to have an on-going program of high performance computing research.
PIs are invited to contribute direct funding to the facility at $5,000 per node or a multiple thereof. Contributions from faculty in this way will be bundled and used for a periodic expansion of the cluster. Contributing this money gives these PIs priority access over other PIs to a proportion of the cluster, in the sense explained in the following.
Access to compute nodes will be managed by job scheduling software, called the scheduler, that reserves compute nodes for users. The scheduler reserves compute nodes based on the availability of resources in combination with a user’s priority. Note that priority only influences newly submitted jobs, not ones already running. The following principles will guide the setup of the scheduler:
- Basic scheduling is done by first-come-first-served, with priorities adjusted by factors like job size and time waited. The scheduler used currently is the software SLURM developed at Lawrence Livermore National Laboratory. SLURM prioritizes jobs based on a multi-factor plugin, which weighs several factors together to determine jobs’ priority.
- The priority of queued jobs incorporates a “fair-share component”. This component is determined by the SLURM accounting group that a job is charged to. There are two accounting groups in HPCF: Contribution and Community. PIs in the Contribution group have provided funding (or some other means of contribution) to the facility. Community consists of the remaining PIs. The relative weight between the two groups results from the total financial contribution by members of the Contribution group. Within the Contribution group, PI groups have weights proportional to their financial contribution to the facility. All PI groups in the Community group are weighted equally. The following diagram demonstrates this hierarchy. In principle, acceptable usage levels of the cluster are proportional to a PI’s standing in this hierarchy. This means, for example, that users in the Community group should not monopolize large portions of the cluster for excessively long periods of time, causing unreasonable interference with the jobs of Contribution PIs. The scheduling mechanism on the cluster is designed accordingly to give increased access to users of paying PIs, while still serving the needs of our Community users. The scheduler tracks usage and uses a 30-day sliding time window in the fair-share calculation of a user’s priority. That is, each PI group has in effect a monthly allocation of the cluster expressed in terms of CPU hours (wall time multiplied by number of cores used). This allocation will be a nominal amount for users in the Community group. Contribution PIs will receive an allocation proportional to the number of nodes purchased. PIs who have used less than their monthly allocation will have their jobs’ priorities boosted, whereas PIs who have used more than their monthly allocation will have their priorities reduced.
- Much effort has gone into designing scheduling rules that will automatically support the objectives of the cluster. However, it is ultimately the responsibility of each user to ensure that he or she is maintaining an appropriate usage level. The meaning of “appropriate” varies with usage patterns of the overall community, so we may request that you adjust your usage based on the current situation.
- Users working under several PIs should take care to charge their computing time to the correct PI. This can be done on a per-job basis; if no PI is specified for a job, one of the user’s PIs is treated as the default and charged.
- Contribution users will also get increased access to running long term jobs. We consider a “long term” job to be one that runs longer than overnight. Community users will also be given access to running long term jobs, but on a more limited basis.
- Jobs requiring many resources (i.e. many processors) are allowed to use less wall time, to avoid tying up significant portions of the cluster for too long.
- Additionally, if current usage patterns on the machine allow for it, we are happy to let users run longer or larger jobs by arrangement; contact email@example.com.
- The above rules do not apply to system administration and testing of the machine, including select users running jobs for the purpose of testing, debugging, or benchmarking the system. For instance, users with existing code may be specifically running large jobs to test a new system or a new system configuration; that is, it is not just the actual system administrator running such jobs. Such efforts will be coordinated by the chair of the user committee in collaboration with DoIT and the user committee. We anticipate that such activity is limited to the initial phase of the machine or after significant changes in, e.g., hardware or software.
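As a rough illustration of the fair-share bookkeeping described in the principles above, the following Python sketch computes CPU hours (wall time multiplied by number of cores used) over a 30-day sliding window and compares them with an allocation proportional to nodes purchased. It is only a sketch under stated assumptions: it is not SLURM's multi-factor priority formula, and the record layout, function names, PI group names, and the pool of 600 CPU hours are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

WINDOW = timedelta(days=30)  # sliding window length stated in the rules


@dataclass
class JobRecord:
    """One finished job charged to a PI group (hypothetical record layout)."""
    pi_group: str
    end_time: datetime
    wall_hours: float
    cores: int

    @property
    def cpu_hours(self) -> float:
        # CPU hours = wall time multiplied by number of cores used
        return self.wall_hours * self.cores


def usage_in_window(jobs, pi_group, now):
    """Sum the CPU hours charged to one PI group within the last 30 days."""
    cutoff = now - WINDOW
    return sum(j.cpu_hours for j in jobs
               if j.pi_group == pi_group and cutoff < j.end_time <= now)


def contribution_allocations(nodes_purchased, pool_cpu_hours):
    """Split a pool of CPU hours in proportion to the nodes each PI purchased."""
    total_nodes = sum(nodes_purchased.values())
    return {pi: pool_cpu_hours * n / total_nodes
            for pi, n in nodes_purchased.items()}


def priority_direction(used, allocation):
    """Say whether usage below or above the allocation boosts or reduces priority."""
    if used < allocation:
        return "boosted"
    if used > allocation:
        return "reduced"
    return "unchanged"  # assumption: the rules do not cover exact equality


# Example with made-up numbers: two Contribution PIs with one node each.
now = datetime(2011, 5, 1)
jobs = [
    JobRecord("pi_a", now - timedelta(days=2), wall_hours=10.0, cores=16),
    JobRecord("pi_a", now - timedelta(days=45), wall_hours=100.0, cores=8),  # too old
    JobRecord("pi_b", now - timedelta(days=10), wall_hours=5.0, cores=64),
]
allocations = contribution_allocations({"pi_a": 1, "pi_b": 1}, pool_cpu_hours=600.0)
for pi in ("pi_a", "pi_b"):
    used = usage_in_window(jobs, pi, now)
    print(pi, used, allocations[pi], priority_direction(used, allocations[pi]))
```

In this example the 45-day-old job falls outside the window and no longer counts against its PI group, which is why a period of light usage gradually restores priority. The actual weights and limits are set by the scheduler configuration, not by this sketch.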
Technical details of the scheduling system are discussed on the scheduling rules page. Once the principles on this page are understood, users should refer to that page and the how to run tutorial for further information on proper usage of the cluster.
Obligation of All Users to Help Maintain the Facility
This machine has been created with financial and non-financial support from both faculty and UMBC. To ensure the long-term existence of this facility, all users have an obligation to actively help sustain it. This obligation has financial and scientific (non-financial) aspects, and support for both aspects is required from all users to maintain their accounts on the systems. The requirements include the following methods of support:
- Each user must provide a title and abstract for all research projects conducted on the facility’s machines. Each project should have its own title and abstract. This information will be posted on the facility’s webpage to demonstrate how the facility is used.
- Each user is required to provide information on outcomes of the research conducted on the facility’s machines. This includes both information on papers submitted and published and on presentations given. We are happy to post PDF files of papers or presentations on the facility’s webpage or point a link to another webpage.
- Each user must acknowledge the use of this facility, for instance, in papers and presentations. The short form of the acknowledgment is “HPCF, NSF, UMBC”. For a full paragraph version, see Supporting Materials under the Research tab on this webpage.
- Each user (or the sponsoring PI, if the user is not a faculty member) must be willing to participate as co-PI or co-investigator in future grant proposals. This implies a willingness to supply short descriptions of the research and its results and to provide the necessary information for grant proposals (bio sketch, current/pending support, and similar), when requested.
- Each user is required to include budget requests for computational resources in individual grant proposals. The support requested should be commensurate with the amount of resource typically used; the cost per node for contributing users above is a guide for the cost. To support such efforts, we are ready to help with your proposal, including drafting text, acting as co-PI/co-investigator, supplying a support letter, or whatever way is suitable. Contact the chair of the user committee early enough before your proposal due date to work out details. A standard phraseology for budget justifications is available under Supporting Materials in the Research tab on this webpage.
- All users including principal investigators must confirm when requested that they and their research group still require the account on the facility’s machines. Specifically, at the beginning of every Fall semester, all accounts will be reviewed to determine if they should be continued. The purpose is to avoid large numbers of inactive accounts. This facility is not suitable for long-term data storage; users are required to move their data off the machine at the completion of projects. An account cannot be kept open solely for the purpose of access to data on the machine.
- Users who wish to continue their account on the system are required to supply proof of outcomes of the usage of the machine, including for instance publications, presentations, preprints, and grant proposals that include funding requests for nodes on the machine. Users are required to submit such proof continuously throughout the year, but also specifically at the time of account review at the beginning of the Fall semester. If no information is received upon request or there was no effort to help maintain the facility, the user’s account, including all accounts sponsored by the faculty member, will be suspended and/or their priority of usage reduced. To help with the documentation of research results, we provide as part of this webpage a Publications page where technical reports of results can be posted, as well as webpages for each project, where publications and presentations of the research can be posted throughout the year.
The philosophy adopted here is one of granting an account on this facility first and then requiring help in maintaining it, as opposed to requiring up-front payment to use the facility or on-going charge-backs. This approach allows researchers to start using the facility immediately at any point in the year and to obtain initial research results using it. In turn and using these results, it is then necessary for users to actively demonstrate results as well as to search for funding to sustain the facility.
Effectiveness of these Rules
The HPCF Governance Committee approved these usage rules and the associated implementation in the scheduling rules in Spring 2011. They are subject to periodic review and future changes.