
How to Compile C Programs on Stampede2



In this tutorial we will illustrate how to compile C source code and run the resulting executable on Stampede2. Working on a distributed cluster like Stampede2 is fundamentally different from working on a standard server or a personal computer, so please make sure to read and understand this material. We will start with a familiar serial example and work our way up to compiling parallel code. We assume that you know some basic programming concepts, so the code will not be explained in detail. More details can be found in the manual pages on the system (e.g. try the command “man mpicc”).

A convenient way to save the example code to your workspace is as follows. There is a “download” link under each code example. You can copy this link from your browser and issue the following command in your Stampede2 terminal session.

login1(1012)$ wget <paste_the_link_here>

For example

login1(1012)$ wget

--2018-07-17 20:53:48--
Resolving (, 2620:0:5301:100::10:bc
Connecting to (||:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 184 [text/plain]
Saving to: ‘hello_serial.c’

100%[=========================================================================>] 184         --.-K/s   in 0s

2018-07-17 20:53:48 (38.3 MB/s) - ‘hello_serial.c’ saved [184/184]

login1(1002)$ ls -l
total 4
-rw-rw---- 1 jdella-g G-819243 184 Feb  1  2014 hello_serial.c

We have shown the prompt in the examples above to emphasize that a command is being issued. When following the examples, your prompt may look a bit different (e.g. your own username will be there!), but be careful to only issue the command part, not the prompt or the example output.

Serial Hello World

We will write a simple “hello world” program that prints the name of the host machine. Here is the code

Download: ../code-2018/stampede2/Hello_Serial/hello_serial.c

Once you have saved this code to your workspace, try to compile it. There are several C compilers on Stampede2. We will demonstrate the Intel C compiler (icc), which is the default on Stampede2.

login1(1003)$ icc hello_serial.c -o hello_serial

If successful, no warnings will appear and an executable “hello_serial” will have been created.

login1(1005)$ ls -l
total 228
-rwxrwx--- 1 jdella-g G-819243 225496 Jul 18 09:14 hello_serial
-rw-rw---- 1 jdella-g G-819243    184 Feb  1  2014 hello_serial.c

We can also use the GNU C compiler (gcc) to produce a second executable from the same source file.

login1(1006)$ gcc hello_serial.c -o hello_serial2
login1(1007)$ ls -l
total 229
-rwxrwx--- 1 jdella-g G-819243 225496 Jul 18 09:14 hello_serial
-rwxrwx--- 1 jdella-g G-819243 216080 Jul 18 09:34 hello_serial2
-rw-rw---- 1 jdella-g G-819243    184 Feb  1  2014 hello_serial.c

To see how to run your serial executable on the cluster, jump to how to run serial programs.

Parallel Hello World

Now we will compile a “hello world” program which can be run in parallel on multiple processors. Save the following code to your workspace.

Download: ../code-2018/stampede2/hello_parallel/hello_parallel.c

This version of the “hello world” program collects several pieces of information at each MPI process: the MPI processor name (i.e., the hostname), the process ID (its MPI rank), and the number of processes in our job. Notice that we needed a new header file, mpi.h, to get access to the MPI functions. We also needed to call MPI_Init before using any of them, and MPI_Finalize at the end to clean up. Try to compile the code with the following command.

login1(1017)$ mpicc hello_parallel.c -o hello_parallel

After a successful compilation with no warnings, an executable “hello_parallel” should have been created

login1(1018)$ ls -l
total 228
-rwxrwx--- 1 jdella-g G-819243 227424 Jul 18 09:41 hello_parallel
-rw-rw---- 1 jdella-g G-819243    490 Feb  1  2014 hello_parallel.c

To see how to run your parallel executable on the cluster, jump to how to run parallel programs.

In this example, we’ve written output from our MPI program to stdout. As a general guideline, stdout and stderr should be used for reporting status information, and not for returning large datasets. If your program does need to write out a lot of data, it would be more appropriate to use file I/O instead.

Choosing a Compiler and MPI Implementation

In the parallel code example, we’ve used a special compilation command, “mpicc”, which generates a parallel executable. On Stampede2 this command is called “mpicc” no matter which MPI module is loaded; by default, “mpicc” is the MPI compiler wrapper for Intel MPI.

When you compile MPI programs, the compiler needs information about where to find the MPI libraries and which libraries to link to. Fortunately you don’t have to worry about this, since the MPI implementations provide wrapper scripts which call the compiler for you. These scripts are mpicc (for C), mpicxx (for C++), and mpif90 (for Fortran). In order to successfully compile or run any MPI program, you must have your PATH, LD_LIBRARY_PATH and other pieces of your environment set correctly so that your shell can find the wrapper script and the MPI libraries. This configuration is set by loading the appropriate modules.

By default, your account is set up to use the Intel compiler with the Intel MPI implementation. To verify this, issue the following command.

login1(1023)$ module list

Currently Loaded Modules:
  1) intel/17.0.4   2) impi/17.0.3   3) git/2.9.0   4) autotools/1.1   5) python/2.7.13   
  6) xalt/2.1.2   7) TACC

Generally, the Intel and Portland Group compilers produce more highly optimized programs which take advantage of the specific architecture, while GCC produces programs that are more likely to be portable across architectures (e.g. so a program copied from another machine is likely to still run). Therefore, the Intel and Portland Group compilers are recommended for programs requiring the best possible performance.

Other compilers and MPI implementations can be loaded via the module command; see using modules for more information. As an example of switching away from the system default, where the MPI implementation is already matched with the right compiler, consider loading the MVAPICH2 MPI module:

login1(1026)$ module load mvapich2

Lmod is automatically replacing "impi/17.0.3" with "mvapich2/2.3rc2".

Due to MODULEPATH changes, the following have been reloaded:
  1) python/2.7.13

Notice here that Stampede2 uses TACC’s Lmod, a Lua-based module system, which dynamically loads and unloads dependent packages based on their compatibility with the software package the user has chosen.

IMPORTANT: Be aware of how each MPI implementation interacts with SLURM, since some of them require particular commands and command syntax to work. Please check out this page, which is Lawrence Livermore National Laboratory’s official document on how to get certain MPI implementations to work with SLURM.

Users of tara should note that the module system replaces the switcher system. Configurations are accessed by loading and unloading modules.

Above we saw that we had a specific compiler/MPI combination loaded. The compiler and MPI implementation are paired because the MPI libraries your code uses at runtime should have been compiled with the same compiler you are now using to compile your code. That means you have to pick one of the combinations before both compiling and running your program. It also means that if you change to a different combination, you will need to recompile your code before running it.

Another useful feature is the “-show” flag of the MPI compiler commands. It prints the underlying compiler command line, including the include and library options, that the wrapper would run, without actually compiling anything. For example:

login1(1035)$ mpicc -show
icc -I/opt/intel/compilers_and_libraries_2017.4.196/linux/mpi/intel64/include
-L/opt/intel/compilers_and_libraries_2017.4.196/linux/mpi/intel64/lib/release_mt -
-lmpifort -lmpi -lmpigi -ldl -lrt -lpthread

The output shows that “mpicc” currently calls icc with the include and library paths of the default Intel MPI installation.

Logging which nodes are used

For a parallel program, it’s always a good idea to log which compute nodes you’ve used. We can modify our parallel hello world a little bit to accomplish this. Running this example on the cluster works the same way as for the Parallel Hello World program; see how to run parallel jobs.

Logging which nodes are used – Version 1

First we’ll start by logging some information to stdout. If you ran the Parallel Hello World example above, you probably noticed that the processes reported back in a random order. This is a bit difficult to read, so let’s try sorting the responses before we print them out. To do this, we’ll have the process with ID 0 receive greeting messages from every other process, in order by process ID. Process 0 will handle writing the messages to stdout.

Message sending is accomplished using the MPI_Send function, and receiving with the MPI_Recv function. Each process prepares its own message, then execution varies depending on the current process. Process 0 writes its own message first, then receives and writes the others in order by process ID. All other processes simply send their message to process 0.

Logging which nodes are used – Version 2

Now we’ll make a few improvements to the first version, (1) to make it look more like a real project and (2) to create a useful utility function.

  • Move the logging code into a function, and move this function into a separate .h and .c file. That way we can call this piece of code from other programs we write later.
  • Instead of using printf to write to stdout, we’ll use fprintf to write to a FILE* object. Using this mechanism we can write to a file, or stdout / stderr if we wanted to.
  • Add a Makefile, to help simplify the build process. This is useful especially now that we have more than one source file.

Download the following files

Warning: if you copy and paste the Makefile text from your browser, you will lose some of the formatting. Namely, some of the lines need to begin with tabs. The easiest way to avoid this problem is to download the file using the link.

Now it should be a simple matter to compile the program.

login1(1059)$ make
mpicc -g -O3   -c nodes_used.c -o nodes_used.o
mpicc -g -O3    -c -o hello_send_recv.o hello_send_recv.c
mpicc -g -O3   nodes_used.o hello_send_recv.o -o hello_send_recv -lm
login1(1060)$ ls -l
total 260
-rwxrwx--- 1 jdella-g G-819243 231856 Jul 18 11:28 hello_send_recv
-rw-rw---- 1 jdella-g G-819243    736 Feb  1  2014 hello_send_recv.c
-rw-rw---- 1 jdella-g G-819243   6656 Jul 18 11:28 hello_send_recv.o
-rw-rw---- 1 jdella-g G-819243    307 Jul 18 11:28 Makefile
-rw-rw---- 1 jdella-g G-819243    657 Aug 26  2014 nodes_used.c
-rw-rw---- 1 jdella-g G-819243    176 Apr 28  2011 nodes_used.h
-rw-rw---- 1 jdella-g G-819243   7392 Jul 18 11:28 nodes_used.o

It’s also simple to clean up the project (object files and executables)

login1(1061)$ make clean
rm -f *.o hello_send_recv
login1(1062)$ ls -l
total 16
-rw-rw---- 1 jdella-g G-819243 736 Feb  1  2014 hello_send_recv.c
-rw-rw---- 1 jdella-g G-819243 307 Jul 18 11:28 Makefile
-rw-rw---- 1 jdella-g G-819243 657 Aug 26  2014 nodes_used.c
-rw-rw---- 1 jdella-g G-819243 176 Apr 28  2011 nodes_used.h

Now the nodes_used.h and nodes_used.c files can be copied to other projects, and used as a utility.

Now we are ready to run our code; see how to run parallel programs.