Table of Contents
- Serial Hello World
- Parallel Hello World
- Choosing a Compiler and MPI Implementation
- Logging which nodes are used
- Compiling C programs on other Hardware
In this tutorial we will demonstrate compilation of C source code on maya. First we start with a simple serial example, then work our way to compiling parallel code. Once the code is compiled we will see how to run it. We will assume that you know some basic C, so the code will not be explained in much detail. Working on a distributed cluster like maya is fundamentally different from working on a standard server (like gl.umbc.edu) or a personal computer, so please make sure to read and understand this material. More details can be found in manual pages on the system (e.g. try the command “man mpicc”).
A convenient way to save the example code to your account is as follows. There is a “download” link under each code example. You can copy this link from your browser and issue the following command in your maya terminal session.
[araim1@maya-usr1 ~]$ wget <paste_the_link_here>
[araim1@maya-usr1 ~]$ wget http://hpcf-files.umbc.edu/code/hello_serial/hello_serial.c
--16:08:24-- http://hpcf-files.umbc.edu/code/hello_serial/hello_serial.c
Resolving www.umbc.edu... 188.8.131.52
Connecting to www.umbc.edu|184.108.40.206|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 183 [text/plain]
Saving to: `hello_serial.c'
100%[======================================================================================>] 183 --.-K/s in 0s
16:08:24 (29.1 MB/s) - `hello_serial.c' saved [183/183]
[araim1@maya-usr1 ~]$ ls
hello_serial.c
[araim1@maya-usr1 ~]$
We have shown the prompt in the examples above to emphasize that a command is being issued. When following the examples, your prompt may look a bit different (e.g. your own username will be there!). When following along, be careful to only issue the command part, and not the prompt or the example output.
Serial Hello World
We will write a simple “hello world” program that prints the name of the host machine. The source file is hello_serial.c, the file downloaded with wget in the example above.
Once you have saved this code to your account, try to compile it. There are several C compilers on the system. We will demonstrate the Intel C compiler, which is the default on maya.
[hu6@maya-usr1 hello_serial]$ icc hello_serial.c -o hello_serial
[hu6@maya-usr1 hello_serial]$
If successful, no warnings will appear and an executable “hello_serial” will have been created.
[hu6@maya-usr1 hello_serial]$ ls
hello_serial  hello_serial.c
To see how to run your serial executable on the cluster, jump to how to run serial programs.
Parallel Hello World
Now we will compile a “hello world” program which can be run in parallel on multiple processors. Save the following code to your account.
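A minimal sketch of such a program follows; it may differ in detail from the file at the download link. The variable names and the wording of the output line are assumptions, while the MPI calls themselves are standard.

```c
#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int id, np, name_len;
    char processor_name[MPI_MAX_PROCESSOR_NAME];

    MPI_Init(&argc, &argv);                  /* must come before other MPI calls */

    MPI_Comm_size(MPI_COMM_WORLD, &np);      /* number of processes in the job */
    MPI_Comm_rank(MPI_COMM_WORLD, &id);      /* ID (rank) of this process */
    MPI_Get_processor_name(processor_name, &name_len);

    /* Output format is an assumption made for this sketch. */
    printf("Hello world from process %03d out of %03d, processor name %s\n",
           id, np, processor_name);

    MPI_Finalize();                          /* clean up MPI state */
    return 0;
}
```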
This version of the “hello world” program collects several pieces of information at each MPI process: the MPI processor name (i.e., the hostname), the process ID, and the number of processes in our job. Notice that we needed a new header file mpi.h to get access to the MPI commands. We also needed to call MPI_Init before using any of them, and MPI_Finalize at the end to clean up. Try to compile the code with the following command.
[hu6@maya-usr1 hello_parallel]$ mpiicc hello_parallel.c -o hello_parallel
After a successful compilation with no warnings, an executable “hello_parallel” should have been created.
[hu6@maya-usr1 hello_parallel]$ ls
hello_parallel  hello_parallel.c
[hu6@maya-usr1 hello_parallel]$
To see how to run your parallel executable on the cluster, jump to how to run parallel programs.
Choosing a Compiler and MPI Implementation
In the parallel code example, we’ve used a special compilation command, “mpiicc”, that knows how to generate a parallel executable. “mpiicc” is the MPI compiler wrapper for Intel MPI. If you have loaded a different MPI module, such as OpenMPI or MVAPICH2, you should use “mpicc” instead.
When you compile MPI programs, the compiler needs information about where to find the MPI libraries and which libraries to link to. Fortunately, you don’t have to worry about this, since the MPI implementations provide wrapper scripts which call the compiler for you. For Intel MPI, these scripts are mpiicc (for C), mpiicpc (for C++), and mpiifort (for Fortran). In order to successfully compile or run any MPI program, you must have your PATH, LD_LIBRARY_PATH, and other pieces of your environment set correctly so that your shell can find the wrapper script and the MPI libraries. This configuration is set by loading the appropriate modules.
By default, your account is set up to use the Intel compiler with the Intel MPI implementation. To verify this, issue the following command.
[jongraf1@maya-usr1 ~]$ module list
Currently Loaded Modulefiles:
  1) dot                                 7) intel-mpi/64/4.1.3/049
  2) matlab/r2014a                       8) texlive/2014
  3) comsol/4.4                          9) quoter
  4) gcc/4.8.2                          10) git/2.0.4
  5) slurm/14.03.6                      11) default-environment
  6) intel/compiler/64/15.0/2015.1.133
Generally, the Intel and Portland Group compilers produce more highly optimized programs which take advantage of the specific architecture, while GCC produces programs that are more likely to be portable across architectures (e.g. so a program copied from another machine is likely to still run). Therefore, the Intel and Portland Group compilers are recommended for programs requiring the best possible performance.
Other compilers and MPI implementations can be loaded via the module command; see Using modules on maya for more information. As an example of matching an MPI implementation with the right compiler outside of the system default, consider the following MVAPICH2 module:
$ module load mvapich2/gcc
We can tell from the name of the module that it also requires the “gcc” module to be loaded in order to work properly:
$ module load gcc
IMPORTANT: Be aware of how each MPI implementation interacts with SLURM, as some of them require particular commands and command syntax to work. Please check out this page, which is Lawrence Livermore National Laboratory’s official document on how to get certain MPI implementations to work with SLURM.
Above we saw that we had a specific compiler/MPI combination loaded. The compiler and MPI implementation are combined because the MPI libraries your code uses at runtime should have been compiled with the same compiler you’re now using to compile your code. That means you have to pick one of the combinations first, before both compiling and running your program. It also means that if you change to a different combination, you’ll need to recompile your code before running it.
Another useful thing to mention is the “-show” flag for the MPI compiler commands. It displays the underlying compiler command, with all of its options, that the wrapper would run. For example:
[hu6@maya-usr1 ~]$ mpiicc -show
icc -I/cm/shared/apps/intel/mpi/4.1.3.049/intel64/include -L/cm/shared/apps/intel/mpi/4.1.3.049/intel64/lib -Xlinker --enable-new-dtags -Xlinker -rpath -Xlinker /cm/shared/apps/intel/mpi/4.1.3.049/intel64/lib -Xlinker -rpath -Xlinker /opt/intel/mpi-rt/4.1 -lmpigf -lmpi -lmpigi -ldl -lrt -lpthread
We can see that a call to “mpiicc” will invoke the icc compiler with Intel MPI options set, as we would expect from our module setup.
NOTE: When compiling, you might get an error stating “could not set locale”:
[user@maya-usr1 ~]$ mpiicc test.c
Catastrophic error: could not set locale "" to allow processing of multibyte characters
compilation aborted for test.c (code 4)
If this error occurs, the solution is to set your LANG environment variable to a valid locale, such as C, as follows:
[user@maya-usr1 ~]$ export LANG=C
The command should now run without this error.
Logging which nodes are used
For a parallel program, it’s always a good idea to log which compute nodes you’ve used. We can modify our parallel hello world a little bit to accomplish this. Running this example on the cluster works the same way as for the Parallel Hello World program; see how to run parallel jobs.
Logging which nodes are used – Version 1
First we’ll start by logging some information to stdout. If you ran the Parallel Hello World example above, you probably noticed that the processes reported back in a random order. This is a bit difficult to read, so let’s try sorting the responses before we print them out. To do this, we’ll have the process with ID 0 receive greeting messages from every other process, in order by process ID. Process 0 will handle writing the messages to stdout.
Message sending is accomplished using the MPI_Send function, and receiving with the MPI_Recv function. Each process prepares its own message, then execution varies depending on the current process. Process 0 writes its own message first, then receives and writes the others in order by process ID. All other processes simply send their message to process 0.
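The exchange described above can be sketched as follows. This is an illustrative version and not necessarily the tutorial's original listing: the buffer size MSG_LEN, the message wording, and the tag value 0 are assumptions made for the sketch.

```c
#include <stdio.h>
#include <string.h>
#include <mpi.h>

/* Assumed buffer size: room for the processor name plus the fixed text. */
#define MSG_LEN (MPI_MAX_PROCESSOR_NAME + 64)

int main(int argc, char *argv[])
{
    int id, np, name_len, i;
    char processor_name[MPI_MAX_PROCESSOR_NAME];
    char message[MSG_LEN];
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &np);
    MPI_Comm_rank(MPI_COMM_WORLD, &id);
    MPI_Get_processor_name(processor_name, &name_len);

    /* Every process prepares its own greeting. */
    snprintf(message, MSG_LEN, "Process %04d of %04d is on node %s",
             id, np, processor_name);

    if (id == 0) {
        /* Process 0 prints its own message, then the others in ID order. */
        printf("%s\n", message);
        for (i = 1; i < np; i++) {
            MPI_Recv(message, MSG_LEN, MPI_CHAR, i, 0, MPI_COMM_WORLD, &status);
            printf("%s\n", message);
        }
    } else {
        /* Send the string including its terminating NUL to process 0. */
        MPI_Send(message, (int)strlen(message) + 1, MPI_CHAR, 0, 0,
                 MPI_COMM_WORLD);
    }

    MPI_Finalize();
    return 0;
}
```

Receiving from source i inside the loop, rather than from any source, is what forces the output into process ID order.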
Logging which nodes are used – Version 2
Now we’ll make a few improvements to the first version:
- Move the logging code into a function, and move this function into a separate .h and .c file. That way we can call this piece of code from other programs we write later.
- Instead of using printf to write to stdout, we’ll use fprintf to write to a FILE* object. Using this mechanism we can write to a file, or stdout / stderr if we wanted to.
- Add a Makefile, to help simplify the build process. This is useful especially now that we have more than one source file.
Now it should be a simple matter to compile the program.
[hu6@maya-usr1 hello_send_recv-2]$ make
mpiicc -g -O3 -c nodes_used.c -o nodes_used.o
mpiicc -g -O3 -c -o hello_send_recv.o hello_send_recv.c
mpiicc -g -O3 nodes_used.o hello_send_recv.o -o hello_send_recv -lm
[hu6@maya-usr1 hello_send_recv-2]$ ls
hello_send_recv    hello_send_recv.o  nodes_used.c  nodes_used.o
hello_send_recv.c  Makefile           nodes_used.h
[hu6@maya-usr1 hello_send_recv-2]$
It’s also simple to clean up the project (object files and executables):
[hu6@maya-usr1 hello_send_recv-2]$ make clean
rm -f *.o hello_send_recv
[hu6@maya-usr1 hello_send_recv-2]$ ls
hello_send_recv.c  Makefile  nodes_used.c  nodes_used.h
[hu6@maya-usr1 hello_send_recv-2]$
Now the nodes_used.h and nodes_used.c files can be copied to other projects, and used as a utility.
Compiling C programs on other Hardware
Compiling on GPUs
For instructions on how to compile code for the GPU, see CUDA for GPU.
Compiling on Intel Phi coprocessors
For instructions on how to compile code for the Intel Phi coprocessor, see Intel Phi.