Table of Contents
- Serial Hello World
- Parallel Hello World
- Logging which nodes are used
- Choosing a Compiler and MPI Implementation
In this tutorial we will illustrate how to compile C source code and run the resulting executable on the CPU cluster in taki. Working on a distributed cluster like taki is fundamentally different from working on a standard server (like gl.umbc.edu) or a personal computer, so please make sure to read and understand this material. We will first start with a classical serial example, and work our way to compiling parallel code. We will assume that you know some basic programming concepts, so the code will not be explained in explicit detail. More details can be found in manual pages on the system that are available for Linux commands (e.g., try “man mkdir”, “man cd”, “man pwd”, “man ls”) as well as for C functions (e.g., try “man fprintf”).
We also want to demonstrate here that it is a good idea to collect files for a project in a directory. This project is on a serial version of the “Hello, world!” program. Therefore, use the mkdir (= “make directory”) command to create a directory “Hello_Serial” and cd (= “change directory”) into it.
[gobbert@taki-usr1 ~]$ mkdir Hello_Serial
[gobbert@taki-usr1 ~]$ cd Hello_Serial
[gobbert@taki-usr1 Hello_Serial]$ pwd
/home/gobbert/Hello_Serial
Notice that the command prompt indicates that I am in directory Hello_Serial now. Use the pwd (= “print working directory”) command any time to confirm where you are in your directory structure and ll (short for “ls -l”) to list the files that are there.
A convenient way to save the example code on this page directly into the current directory of your project uses the wget command as follows. There is a “download” link under each code example. You can copy the link address from your browser and paste it after the wget command in your taki terminal session to download the file to the local directory, as shown here.
[gobbert@taki-usr1 Hello_Serial]$ wget http://hpcf-files.umbc.edu/code-2018/taki/Hello_Serial/hello_serial.c
--2019-01-28 09:43:08--  http://hpcf-files.umbc.edu/code-2018/taki/Hello_Serial/hello_serial.c
Resolving hpcf-files.umbc.edu (hpcf-files.umbc.edu)... 220.127.116.11
Connecting to hpcf-files.umbc.edu (hpcf-files.umbc.edu)|18.104.22.168|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 184 [text/plain]
Saving to: 'hello_serial.c'

100%[======================================>] 184         --.-K/s   in 0s

2019-01-28 09:43:08 (46.0 MB/s) - 'hello_serial.c' saved [184/184]
You can list all files to see that the file is present now:
[gobbert@taki-usr1 Hello_Serial]$ ll
total 5
-rw-rw---- 1 gobbert pi_gobbert 184 Feb  1  2014 hello_serial.c
We have shown the prompt in the examples above to emphasize that a command is being issued. When following the examples, your prompt may look a bit different (e.g., your own username will be there!), but be careful to only issue the command part, not the prompt or the example output.
We will now consider a simple “Hello, world!” program that prints the name of the host machine. Here is the code:
Creating a directory for this project and downloading this code with wget was the example given above on this page.
Once you have saved this code to your workspace, we have to compile it before we can execute it, since C is a compiled language. There are several C compilers on taki. We will demonstrate the Intel C compiler, which is the default on taki.
[gobbert@taki-usr1 Hello_Serial]$ icc hello_serial.c -o hello_serial
If successful, no errors or warnings will appear and an executable hello_serial will have been created, in addition to the source code in hello_serial.c.
[gobbert@taki-usr1 Hello_Serial]$ ll
total 21
-rwxrwx--- 1 gobbert pi_gobbert 22488 Jan 28 09:45 hello_serial*
-rw-rw---- 1 gobbert pi_gobbert   184 Feb  1  2014 hello_serial.c
Notice that the “x” in the permissions “-rwxrwx---” indicates that hello_serial is an executable; this is also indicated by the asterisk “*” following its name (the “*” is not part of the filename, it is just an indication from the ls command). When a file is not an executable (or there is no permission to execute it), a dash “-” appears in place of the “x”; the dashes in “-rw-rw----” for hello_serial.c confirm that this C source code is not executable in its source code form.
To see how to run your serial executable on the cluster, jump to how to run serial programs.
Now we will compile a “Hello, world!” program which can be run in parallel on multiple processors. You may want to create a new directory for this project using “mkdir hello_parallel”. Use wget again to save the following code to your directory.
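The listing itself is not reproduced here; a sketch consistent with the description in the text is shown below. The variable names and exact output format are assumptions, and the actual hello_parallel.c from the download link may differ. Note that this program requires an MPI environment to compile and run.

```c
/* Sketch of a parallel "Hello, world!" using MPI, as described in the
   text; the actual hello_parallel.c may differ in its details. */
#include <stdio.h>
#include <mpi.h>   /* needed for access to the MPI commands */

int main(int argc, char *argv[])
{
    int id, np, name_len;
    char processor_name[MPI_MAX_PROCESSOR_NAME];

    MPI_Init(&argc, &argv);                 /* must precede all other MPI calls */
    MPI_Comm_size(MPI_COMM_WORLD, &np);     /* number of processes in the job */
    MPI_Comm_rank(MPI_COMM_WORLD, &id);     /* this process's ID (rank) */
    MPI_Get_processor_name(processor_name, &name_len);

    printf("hello_parallel.c: Number of tasks=%d My rank=%d My name=%s\n",
           np, id, processor_name);

    MPI_Finalize();                         /* clean up at the end */
    return 0;
}
```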
This version of the “Hello, world!” program collects several pieces of information at each MPI process: the MPI processor name (i.e., the hostname), the process ID, and the number of processes in our job. Notice that we needed a new header file mpi.h to get access to the MPI commands. We also need to call MPI_Init before using any other MPI commands, and MPI_Finalize is needed at the end to clean up. Compile the code with the following command.
[gobbert@taki-usr1 hello_parallel]$ mpiicc hello_parallel.c -o hello_parallel
After a successful compilation with no errors or warnings, an executable “hello_parallel” should have been created, which we confirm by “ll”.
[gobbert@taki-usr1 hello_parallel]$ ll
total 5
-rwxrwx--- 1 gobbert pi_gobbert 22664 Oct 15 09:15 hello_parallel*
-rw-rw---- 1 gobbert pi_gobbert   490 Feb  1  2014 hello_parallel.c
To see how to run your parallel executable on the cluster, jump to how to run parallel programs.
Logging which nodes are used
For a parallel program, it is always a good idea to log which compute nodes you have used. We can extend our parallel “Hello, world!” program to accomplish this: in addition to printing the information to stdout, we will save it to file. First, the functionality is contained in a self-contained function nodesused() that you can also copy into other programs and then call from the main program, as shown in the code below. Second, we noticed that the processes reported back in a random order to stdout. This is difficult to read for large numbers of processes, so for the output to file, we have the process with ID 0 receive the greeting message from each other process, in order by process ID, and only Process 0 will write the messages to file. Third, the code below actually creates and writes to two files: (i) The file “nodes_used.log” contains only the process ID and hostname, which is the same information as already printed to stdout, but ordered. (ii) The file “nodesused_cpuid.log” additionally outputs the CPU ID, that is, the number of the computational core in the two CPUs of the node that the MPI process executed on.
Message sending is accomplished using the MPI_Send() function, and receiving with the MPI_Recv() function. Each process prepares its own message, then execution varies depending on the current process. Process 0 writes its own message first, then receives and writes the others in order by process ID. All other processes simply send their message to process 0. The fprintf function is used to write one line for each MPI process to each of the two output files.
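The described gather-and-write pattern might be sketched as follows. This is not the original nodesused code: the message format is an assumption, and only the first of the two log files is shown (the CPU-ID variant would add a sched_getcpu() lookup and a second file).

```c
/* Sketch of the nodesused() pattern described above: process 0 collects
   one greeting per process, in rank order, and writes them to file.
   Message format and buffer sizes are illustrative assumptions. */
#include <stdio.h>
#include <string.h>
#include <mpi.h>

void nodesused(void)
{
    int id, np, name_len, src;
    char name[MPI_MAX_PROCESSOR_NAME], message[256];

    MPI_Comm_rank(MPI_COMM_WORLD, &id);
    MPI_Comm_size(MPI_COMM_WORLD, &np);
    MPI_Get_processor_name(name, &name_len);

    /* every process prepares its own greeting */
    sprintf(message, "Process %04d of %04d on node %s", id, np, name);

    if (id == 0) {
        FILE *fp = fopen("nodes_used.log", "w");
        fprintf(fp, "%s\n", message);        /* process 0's own line first */
        for (src = 1; src < np; src++) {     /* then each rank, in order */
            MPI_Recv(message, sizeof(message), MPI_CHAR, src, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            fprintf(fp, "%s\n", message);
        }
        fclose(fp);
    } else {
        /* all other processes simply send their message to process 0 */
        MPI_Send(message, strlen(message) + 1, MPI_CHAR, 0, 0,
                 MPI_COMM_WORLD);
    }
}
```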
To see how to run your parallel executable of the nodesused program on the cluster, jump to how to run parallel programs on the compute partition.
Choosing a Compiler and MPI Implementation
In the parallel code example, we used a special parallel compiler “mpiicc”, which generates a parallel executable. On taki, “mpiicc” is the default and refers to the Intel C compiler and the Intel MPI implementation.
When you compile MPI programs, the compiler needs information about where to find the MPI libraries and which libraries to link to. Fortunately, MPI implementations provide wrapper scripts that call the compiler for you, such as the “mpiicc” we just used above and thought of as a parallel compiler. These scripts are mpicc (for C), mpicxx (for C++), and mpif90 (for Fortran). In order to successfully compile or run any MPI program, you must have your PATH, LD_LIBRARY_PATH, and other pieces of your environment set correctly so that your shell can find the wrapper script and the MPI libraries. This configuration is set by loading appropriate modules.
Our system runs CentOS 6.8. We support only the bash shell. The following explains how to access the available compiler suites and MPI implementations on maya:
We supply three compiler suites:
- Intel compiler suite (Default) with Composer XE – C, C++, Fortran 77, 90, 95, and 2003. This includes the Intel Math Kernel Library (LAPACK/BLAS)
- GNU compiler suite – C, C++, Fortran 77, 90, and 95
- Portland Group compiler suite – C, C++, Fortran 77, 90, and 95 plus limited Fortran 2003 support. This includes the commercial, optimized ACML (LAPACK/BLAS/FFT) math library.
Maya allows the user to choose any combination of compiler suite and MPI implementation. The MPI implementations available on maya are listed below in the parallel compiling section.
The command used to compile code depends on the language and compiler used.
Since the Intel compiler suite is the default on maya, we can directly use the commands in the second column of the above table without loading any extra modules. However, let’s say we are trying to compile a serial C program and we want to use the PGI compiler suite for the task. Since the PGI compiler suite is not loaded by default, we need to load the required module using the “module load” command as shown below.
[av02016@maya-usr1 ~]$ pgcc
-bash: pgcc: command not found
[av02016@maya-usr1 ~]$ module avail pgi

------------------------------------------------ /cm/shared/modulefiles --------------------------------------
pgi/64/16.5

[av02016@maya-usr1 ~]$ module load pgi/64/16.5
[av02016@maya-usr1 ~]$
Note that there are several versions of the GNU compiler suite (gcc) available on maya:
[av02016@maya-usr1 ~]$ module avail gcc

----------------------------------------------------- /cm/shared/modulefiles -------------------
gcc/4.8.4

---------------------------------------------------------- /cm/local/modulefiles --------------------
gcc/5.1.0

----------------------------------------------------- /usr/cluster/contrib/modulefiles --------------
gcc/5.5.0
For parallel computing, the module utility needs to be used to switch between the different MPI implementations available on maya. We also provide three implementations of InfiniBand-enabled MPI.
By default, your account is set up to use the Intel compiler with the Intel MPI implementation. To verify this, issue the following command.
[av02016@maya-usr1 ~]$ module list
Currently Loaded Modulefiles:
  1) dot             4) gcc/4.8.4        7) intel/compiler/64/15.0/full  10) quoter   13) tmux/2.1
  2) matlab/r2016b   5) hwloc/1.9.1      8) intel-mpi/64/5.0.3/048       11) monitor  14) default-environment
  3) comsol/5.1      6) slurm/14.11.11   9) texlive/2014                 12) git/2.0.4
[av02016@maya-usr1 ~]$
In order to load an MPI implementation, we first have to load the required compiler suite. The MPI implementations available under each compiler suite are given below.
Intel compiler suite
For the Intel compiler, Intel MPI, OpenMPI version 1.8.5, and MVAPICH2 version 2.1 are available on maya.
From the “module list” output above, you can see that the Intel compiler suite and the Intel MPI implementation are loaded by default, so you can directly use this compiler and MPI implementation. A table with the commands necessary to compile MPI code in C, C++, and Fortran for each implementation of MPI is available at the end of this section.
MVAPICH2 version 2.1 can be loaded under the Intel compiler suite as follows:
[av02016@maya-usr1 ~]$ module load mvapich2/intel/64/2.1
[av02016@maya-usr1 ~]$
Notice that the format of the module path is “MPI implementation/compiler/architecture/version”.
The OpenMPI implementation can be loaded under the Intel compiler suite as follows:
[av02016@maya-usr1 ~]$ module load openmpi/intel/64/1.8.5
[av02016@maya-usr1 ~]$
IMPORTANT: It is important to be aware of how each MPI implementation interacts with SLURM, as some will require a particular command and command syntax to work! Please check out Lawrence Livermore National Laboratory’s official documentation on how to get certain MPI implementations to work with SLURM.
IMPORTANT: If you have multiple MPI modules loaded, the last one loaded will be first on your PATH. As an example, let’s say we have the intel-mpi module loaded and after that we load mvapich2. Here, the version of “mpirun” we get is from mvapich2, not intel-mpi. When we remove the mvapich2 module by unloading it, we then get mpirun from intel-mpi.
The module files set a number of environment variables, and in cases where they conflict, the last loaded module wins.
GNU compiler suite
For the gcc compiler, OpenMPI version 1.8.5 and MVAPICH2 version 2.1 are available on maya.
To load the combination (mvapich2 + gcc), first load the MPI implementation and then the compiler suite.
[av02016@maya-usr1 ~]$ module load mvapich2/gcc/64/2.1
[av02016@maya-usr1 ~]$
[av02016@maya-usr1 ~]$ module load gcc
[av02016@maya-usr1 ~]$
To load OpenMPI with the gcc compiler, use the command below:
[av02016@maya-usr1 ~]$ module load openmpi/gcc/64/1.8.5
[av02016@maya-usr1 ~]$
[av02016@maya-usr1 ~]$ module load gcc
[av02016@maya-usr1 ~]$