Table of Contents
- Serial Hello World
- Parallel Hello World
- Choosing a Compiler and MPI Implementation
- Logging which nodes are used
In this tutorial we will illustrate how to compile C source code and run the resulting executable on Stampede2. Working on a distributed cluster like Stampede2 is fundamentally different from working on a standard server (like gl.umbc.edu) or a personal computer, so please make sure to read and understand this material. We will start with a familiar serial example and work our way up to compiling parallel code. We assume that you know some basic programming concepts, so the code will not be explained in detail. More details can be found in the manual pages on the system (e.g. try the command “man mpicc”).
A convenient way to save the example code to your workspace is as follows. There is a “download” link under each code example. You can copy this link from your browser and issue the following command in your Stampede2 terminal session.
login1(1012)$ wget <paste_the_link_here>
login1(1012)$ wget http://hpcf-files.umbc.edu/code-2018/stampede2/Hello_Serial/hello_serial.c
--2018-07-17 20:53:48-- http://hpcf-files.umbc.edu/code-2018/stampede2/Hello_Serial/hello_serial.c
Resolving hpcf-files.umbc.edu (hpcf-files.umbc.edu)... 220.127.116.11, 2620:0:5301:100::10:bc
Connecting to hpcf-files.umbc.edu (hpcf-files.umbc.edu)|18.104.22.168|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 184 [text/plain]
Saving to: ‘hello_serial.c’
100%[=========================================================================>] 184 --.-K/s in 0s
2018-07-17 20:53:48 (38.3 MB/s) - ‘hello_serial.c’ saved [184/184]
login1(1002)$ ls -l
total 4
-rw-rw---- 1 jdella-g G-819243 184 Feb 1 2014 hello_serial.c
We have shown the prompt in the examples above to emphasize that a command is being issued. When following the examples, your prompt may look a bit different (e.g. your own username will be there!), but be careful to only issue the command part, not the prompt or the example output.
We will write a simple “hello world” program that prints the name of the host machine. Here is the code
Once you have saved this code to your workspace, try to compile it. There are several C compilers on Stampede2. We will demonstrate the Intel C compiler, which is the default on Stampede2.
login1(1003)$ icc hello_serial.c -o hello_serial
If successful, no warnings will appear and an executable “hello_serial” will have been created.
login1(1005)$ ls -l
total 228
-rwxrwx--- 1 jdella-g G-819243 225496 Jul 18 09:14 hello_serial
-rw-rw---- 1 jdella-g G-819243 184 Feb 1 2014 hello_serial.c
login1(1006)$
We can also use the GNU compiler to produce another executable.
login1(1006)$ gcc hello_serial.c -o hello_serial2
login1(1007)$ ls -l
total 229
-rwxrwx--- 1 jdella-g G-819243 225496 Jul 18 09:14 hello_serial
-rwxrwx--- 1 jdella-g G-819243 216080 Jul 18 09:34 hello_serial2
-rw-rw---- 1 jdella-g G-819243 184 Feb 1 2014 hello_serial.c
To see how to run your serial executable on the cluster, jump to how to run serial programs.
Now we will compile a “hello world” program which can be run in parallel on multiple processors. Save the following code to your workspace.
This version of the “hello world” program collects several pieces of information at each MPI process: the MPI processor name (i.e., the hostname), the process ID, and the number of processes in our job. Notice that we needed a new header file mpi.h to get access to the MPI commands. We also needed to call MPI_Init before using any of them, and MPI_Finalize at the end to clean up. Try to compile the code with the following command.
login1(1017)$ mpiicc hello_parallel.c -o hello_parallel
After a successful compilation with no warnings, an executable “hello_parallel” should have been created.
login1(1018)$ ls -l
total 228
-rwxrwx--- 1 jdella-g G-819243 227424 Jul 18 09:41 hello_parallel
-rw-rw---- 1 jdella-g G-819243 490 Feb 1 2014 hello_parallel.c
To see how to run your parallel executable on the cluster, jump to how to run parallel programs.
Choosing a Compiler and MPI Implementation
In the parallel code example, we used a special compilation command, “mpiicc”, which generates a parallel executable. “mpiicc” is specific to Intel MPI. The generic wrapper “mpicc” is available on Stampede2 no matter which MPI module is loaded, and by default “mpicc” is the MPI compiler for Intel MPI.
When you compile MPI programs, the compiler needs information about where to find the MPI libraries and which libraries to link to. Fortunately, you don’t have to worry about this, since the MPI implementations provide wrapper scripts which call the compiler for you. These scripts are mpicc (for C), mpicxx (for C++), and mpif90 (for Fortran). In order to successfully compile or run any MPI program, you must have your PATH, LD_LIBRARY_PATH, and other pieces of your environment set correctly so that your shell can find the wrapper script and the MPI libraries. This configuration is set by loading the appropriate modules.
By default, your account is set up to use the Intel compiler with the Intel MPI implementation. To verify this, issue the following command.
login1(1023)$ module list
Currently Loaded Modules:
1) intel/17.0.4 2) impi/17.0.3 3) git/2.9.0 4) autotools/1.1 5) python/2.7.13 6) xalt/2.1.2 7) TACC
Generally, the Intel and Portland Group compilers produce more highly optimized programs which take advantage of the specific architecture, while GCC produces programs that are more likely to be portable across architectures (e.g. so a program copied from another machine is likely to still run). Therefore, the Intel and Portland Group compilers are recommended for programs requiring the best possible performance.
Other compilers and MPI implementations can be loaded via the module command; see using modules for more information. As an example of moving away from the system default, in which the MPI implementation is matched to the compiler, consider loading the MVAPICH2 MPI module:
login1(1026)$ module load mvapich2
Lmod is automatically replacing "impi/17.0.3" with "mvapich2/2.3rc2".
Due to MODULEPATH changes, the following have been reloaded:
1) python/2.7.13
Notice here that Stampede2 uses TACC’s Lmod, a Lua-based module system, which dynamically loads and unloads dependent packages based on their compatibility with the software package the user has chosen.
IMPORTANT: It is also important to be aware of how each MPI implementation interacts with SLURM, as some of them require particular commands and command syntax to work. Please check out this page, which is Lawrence Livermore National Laboratory’s official document on how to get certain MPI implementations to work with SLURM.
Above we saw that a specific compiler/MPI combination was loaded. The compiler and MPI implementation are paired because the MPI libraries your code uses at runtime should have been compiled with the same compiler you are now using to compile your code. That means you have to pick one of the combinations before both compiling and running your program. It also means that if you change to a different combination, you will need to recompile your code before running it.
Another useful feature is the “-show” flag of the MPI compiler commands. It displays the underlying compiler command and options that the wrapper would use, without compiling anything. For example:
login1(1035)$ mpicc -show
icc -I/opt/intel/compilers_and_libraries_2017.4.196/linux/mpi/intel64/include -L/opt/intel/compilers_and_libraries_2017.4.196/linux/mpi/intel64/lib/release_mt - -lmpifort -lmpi -lmpigi -ldl -lrt -lpthread
We can see that a call to “mpicc” invokes the Intel compiler “icc” and links against the default Intel MPI libraries.
Logging which nodes are used
For a parallel program, it’s always a good idea to log which compute nodes you’ve used. We can modify our parallel hello world a little bit to accomplish this. Running this example on the cluster works the same way as for the Parallel Hello World program; see how to run parallel jobs.
Logging which nodes are used – Version 1
First we’ll start by logging some information to stdout. If you ran the Parallel Hello World example above, you probably noticed that the processes reported back in a random order. This is a bit difficult to read, so let’s try sorting the responses before we print them out. To do this, we’ll have the process with ID 0 receive greeting messages from every other process, in order by process ID. Process 0 will handle writing the messages to stdout.
Message sending is accomplished using the MPI_Send function, and receiving with the MPI_Recv function. Each process prepares its own message, then execution varies depending on the current process. Process 0 writes its own message first, then receives and writes the others in order by process ID. All other processes simply send their message to process 0.
Next we make a few improvements to this code:
- Move the logging code into a function, and move this function into a separate .h and .c file. That way we can call this piece of code from other programs we write later.
- Instead of using printf to write to stdout, we’ll use fprintf to write to a FILE* object. Using this mechanism we can write to a file, or stdout / stderr if we wanted to.
- Add a Makefile, to help simplify the build process. This is useful especially now that we have more than one source file.
Now it should be a simple matter to compile the program.
login1(1059)$ make
mpicc -g -O3 -c nodes_used.c -o nodes_used.o
mpicc -g -O3 -c -o hello_send_recv.o hello_send_recv.c
mpicc -g -O3 nodes_used.o hello_send_recv.o -o hello_send_recv -lm
login1(1060)$ ls -l
total 260
-rwxrwx--- 1 jdella-g G-819243 231856 Jul 18 11:28 hello_send_recv
-rw-rw---- 1 jdella-g G-819243 736 Feb 1 2014 hello_send_recv.c
-rw-rw---- 1 jdella-g G-819243 6656 Jul 18 11:28 hello_send_recv.o
-rw-rw---- 1 jdella-g G-819243 307 Jul 18 11:28 Makefile
-rw-rw---- 1 jdella-g G-819243 657 Aug 26 2014 nodes_used.c
-rw-rw---- 1 jdella-g G-819243 176 Apr 28 2011 nodes_used.h
-rw-rw---- 1 jdella-g G-819243 7392 Jul 18 11:28 nodes_used.o
It’s also simple to clean up the project (object files and executables).
login1(1061)$ make clean
rm -f *.o hello_send_recv
login1(1062)$ ls -l
total 16
-rw-rw---- 1 jdella-g G-819243 736 Feb 1 2014 hello_send_recv.c
-rw-rw---- 1 jdella-g G-819243 307 Jul 18 11:28 Makefile
-rw-rw---- 1 jdella-g G-819243 657 Aug 26 2014 nodes_used.c
-rw-rw---- 1 jdella-g G-819243 176 Apr 28 2011 nodes_used.h
Now the nodes_used.h and nodes_used.c files can be copied to other projects, and used as a utility.
Now we are ready to run our code; see how to run parallel programs.