How to run OpenMP programs on maya

Introduction

This page shows how to run OpenMP programs on the cluster. Before proceeding, make sure you’ve read the How To Run tutorial.

OpenMP is a parallel programming model for shared memory systems. In this model, a master thread creates and coordinates a team of worker threads, and the user marks sections of code as parallel using special compiler directives. The nodes on maya do not share memory, so OpenMP by itself cannot coordinate a job across multiple nodes, but it can use multiple cores on a single node. For this reason, we recommend MPI as the more general programming model. For multi-node jobs, hybrid programs using both MPI and OpenMP are also possible.
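As a minimal sketch of this model (not taken from the original examples on this page), the function below marks a loop as parallel with a single directive. The name sum_of_squares is a hypothetical choice for illustration. The reduction clause gives each thread a private partial sum and combines the partial sums when the loop ends. If the code is compiled without OpenMP support, the directive is ignored and the loop simply runs serially with the same result.

```c
/* Hypothetical helper for illustration: returns 1*1 + 2*2 + ... + n*n. */
long sum_of_squares(int n)
{
    long sum = 0;

    /* The loop iterations are divided among the threads. Each thread
       accumulates into its own private copy of sum, and the copies are
       added together at the end of the loop. */
    #pragma omp parallel for reduction(+:sum)
    for (int i = 1; i <= n; i++)
    {
        sum += (long) i * i;
    }

    return sum;
}
```

Calling sum_of_squares(10) returns 385 regardless of how many threads are used, since the reduction makes the result independent of how the iterations are divided.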

OpenMP is available in several programming languages, including C and FORTRAN.

Hello World example C

Let’s start with a simple Hello World program written in C (taken from an example at Purdue):

#include <omp.h>
#include <stdio.h>

int main (int argc, char *argv[])
{
    int nthreads, thread_id;

    #pragma omp parallel private(nthreads, thread_id)
    {
        thread_id = omp_get_thread_num();
        printf("Thread %d says: Hello World\n", thread_id);

        if (thread_id == 0)
        {
            nthreads = omp_get_num_threads();
            printf("Thread %d reports: the number of threads are %d\n", 
                thread_id, nthreads);
        }
    }

    return 0;
}


Download: ../code/hello_openmp_c/hello_openmp.c

Here is the batch script we will use to launch it:

#!/bin/bash
#SBATCH --job-name=OMP_hello
#SBATCH --output=slurm.out
#SBATCH --error=slurm.err
#SBATCH --partition=develop
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=8

export OMP_NUM_THREADS=8
./hello_openmp

Download: ../code/hello_openmp_c/run.slurm

Notice that the environment variable OMP_NUM_THREADS is set to 8; this controls how many OpenMP threads will be used for the job. Setting it to a higher number will generally not improve performance, since each maya2009 and maya2010 node has 8 cores. If you don’t require 8 threads, you can also decrease “--ntasks-per-node” accordingly. OMP_NUM_THREADS should match the total number of allocated CPUs, which is the product of --ntasks-per-node and --cpus-per-task.
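The arithmetic in the previous paragraph can be sketched in C. This is an illustration under assumptions, not part of the original example: the helper names are hypothetical, and the sketch assumes the scheduler exports SLURM_NTASKS_PER_NODE and SLURM_CPUS_PER_TASK to the job when the corresponding options are given. A missing or unparsable value is treated as 1.

```c
#include <stdlib.h>

/* Hypothetical helper: parse a positive integer from text.
   Returns 1 if text is NULL or does not hold a positive number. */
int parse_count(const char *text)
{
    if (text == NULL)
    {
        return 1;
    }
    long value = strtol(text, NULL, 10);
    return (value > 0) ? (int) value : 1;
}

/* Total CPUs allocated on a node: tasks per node times CPUs per task. */
int allocated_cpus(const char *ntasks_per_node, const char *cpus_per_task)
{
    return parse_count(ntasks_per_node) * parse_count(cpus_per_task);
}

/* Same calculation, reading the values from the job environment
   (assumes these variables are set by the scheduler). */
int allocated_cpus_from_env(void)
{
    return allocated_cpus(getenv("SLURM_NTASKS_PER_NODE"),
                          getenv("SLURM_CPUS_PER_TASK"));
}
```

With the batch script above (--ntasks-per-node=8 and no --cpus-per-task), the product is 8, which matches OMP_NUM_THREADS=8. A program could pass this value to omp_set_num_threads, although setting OMP_NUM_THREADS in the batch script, as done here, is simpler.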

Another important thing to note: if we change “--nodes” to 2, the job will be duplicated on two nodes, not parallelized across them as we would probably want. So it’s recommended to leave --nodes=1.

Now we will compile and launch the job

[araim1@maya-usr1 hello_openmp_c]$ gcc -fopenmp hello_openmp.c -o hello_openmp -lm # For GNU compiler
[araim1@maya-usr1 hello_openmp_c]$ icc -openmp  hello_openmp.c -o hello_openmp -lm # For Intel compiler
[araim1@maya-usr1 hello_openmp_c]$ ls
hello_openmp.c   run.slurm
[araim1@maya-usr1 hello_openmp_c]$ sbatch run.slurm 
Submitted batch job 37532
[araim1@maya-usr1 hello_openmp_c]$ cat slurm.out 
Thread 1 says: Hello World
Thread 5 says: Hello World
Thread 6 says: Hello World
Thread 2 says: Hello World
Thread 7 says: Hello World
Thread 0 says: Hello World
Thread 3 says: Hello World
Thread 0 reports: the number of threads are 8
Thread 4 says: Hello World
[araim1@maya-usr1 hello_openmp_c]$ 

Hello World example FORTRAN

Now let’s see a similar program in FORTRAN. Begin by downloading the hello world FORTRAN example from here. Then grab the following batch script (which is the same as for the C code above, except for the name of the executable):

#!/bin/bash
#SBATCH --job-name=OMP_hello
#SBATCH --output=slurm.out
#SBATCH --error=slurm.err
#SBATCH --partition=develop
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=8

export OMP_NUM_THREADS=8
./hello_open_mp

Download: ../code/hello_openmp_f90/run.slurm

Now we can compile and run the code the same way as in the C example:

[araim1@maya-usr1 hello_openmp_f90]$ gfortran -fopenmp hello_open_mp.f90 -o hello_open_mp # For GNU compiler
[araim1@maya-usr1 hello_openmp_f90]$ ifort    -openmp  hello_open_mp.f90 -o hello_open_mp # For Intel Compiler
[araim1@maya-usr1 hello_openmp_f90]$ sbatch run.slurm 
Submitted batch job 37537
[araim1@maya-usr1 hello_openmp_f90]$ cat slurm.out 
 
HELLO_OPEN_MP
  FORTRAN90/OpenMP version
  The number of processors available =        8
  The number of threads available    =        8
 
  OUTSIDE the parallel region.
 
  HELLO from process        0
 
  Going INSIDE the parallel region:
 
  HELLO from process        0
  HELLO from process        4
  HELLO from process        5
  HELLO from process        3
  HELLO from process        6
  HELLO from process        2
  HELLO from process        7
  HELLO from process        1
 
  Back OUTSIDE the parallel region.
 
HELLO_OPEN_MP
  Normal end of execution.
 
  Elapsed wall clock time =   0.131280E-01
[araim1@maya-usr1 hello_openmp_f90]$ 

MPI/OpenMP Hybrid in C

It may be useful to consider hybrid programming using both MPI and OpenMP. For example, MPI can be used for communication between nodes, and OpenMP can be used for shared memory programming within a node.
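One common way to organize such a program, sketched here under assumptions rather than taken from the original page, is to split a global index range into one contiguous block per MPI process and then let the OpenMP threads share the iterations inside each block. The helper names below are hypothetical. Only the index arithmetic is shown, so the sketch compiles without MPI; in a real program, rank and np would come from MPI_Comm_rank and MPI_Comm_size.

```c
/* Hypothetical helper: first index of the block owned by part idx when
   n items are split into parts nearly equal contiguous blocks.
   Calling it with idx == parts returns n (one past the last index). */
long block_start(long n, int parts, int idx)
{
    long base = n / parts;     /* minimum block size */
    long extra = n % parts;    /* the first extra blocks get one more item */
    return idx * base + (idx < extra ? idx : extra);
}

/* Sum the values 0..n-1 that fall in the block owned by process
   rank out of np processes. */
long local_sum(long n, int np, int rank)
{
    long lo = block_start(n, np, rank);
    long hi = block_start(n, np, rank + 1);   /* one past the end */
    long sum = 0;

    /* Within the process, threads share the iterations of its block. */
    #pragma omp parallel for reduction(+:sum)
    for (long i = lo; i < hi; i++)
    {
        sum += i;
    }

    return sum;
}
```

Each process would call local_sum with its own rank, and the partial sums could then be combined across processes with a reduction such as MPI_Reduce.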

The following Hello World program launches a predefined number of OpenMP threads in each MPI process (we will use 8, the number of processor cores on a node) and prints a message from each thread. The thread with thread ID 0 also reports the number of threads in its group.

#include <stdio.h>
#include <omp.h>
#include <mpi.h>

int main (int argc, char *argv[])
{
    int nthreads, thread_id;
    int id, np;
    char processor_name[MPI_MAX_PROCESSOR_NAME];
    int processor_name_len;

    MPI_Init(&argc, &argv);

    MPI_Comm_size(MPI_COMM_WORLD, &np);
    MPI_Comm_rank(MPI_COMM_WORLD, &id);
    MPI_Get_processor_name(processor_name, &processor_name_len);

    #pragma omp parallel private(nthreads, thread_id)
    {
        thread_id = omp_get_thread_num();
        printf("Hello World from thread %d, process %d of %d, hostname %s\n",
            thread_id, id, np, processor_name);

        if (thread_id == 0)
        {
            nthreads = omp_get_num_threads();
            printf("Thread %d on process %d of %d reports: nthreads = %d\n", 
                thread_id, id, np, nthreads);
        }
    }

    MPI_Finalize();
    return 0;
}


Download: ../code/hello-omp-mpi/hello_omp_mpi.c

The following batch script launches two MPI processes, each of which runs on its own node. We set the environment variable OMP_NUM_THREADS=8 to tell the OpenMP framework that there should be eight threads per process. The “--exclusive” flag lets the scheduler know that we will be using entire nodes, and that no other jobs should run alongside ours. As usual, launching the executable with “srun” ensures that the MPI framework is used.

#!/bin/bash
#SBATCH --job-name=OMP_hello
#SBATCH --output=slurm.out
#SBATCH --error=slurm.err
#SBATCH --partition=develop
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=1
#SBATCH --exclusive

export OMP_NUM_THREADS=8
srun ./hello_omp_mpi

Download: ../code/hello-omp-mpi/run.slurm

The first line below shows how to compile the program with the GNU compiler, and the third line shows how to compile it with the Intel compiler. Notice that mpicc is used for both, but a different module must be loaded for each compiler; the second line swaps the modules.

[araim1@maya-usr1 hello-omp-mpi]$ mpicc -fopenmp hello_omp_mpi.c -o hello_omp_mpi # For GNU Compiler
[araim1@maya-usr1 hello-omp-mpi]$ module swap mvapich2/gcc/4.8.1/1.9 mvapich2/intel/composer_xe_2013_sp1.1.106/1.9 
[araim1@maya-usr1 hello-omp-mpi]$ mpicc -openmp  hello_omp_mpi.c -o hello_omp_mpi # For Intel compiler
[araim1@maya-usr1 hello-omp-mpi]$ sbatch run.slurm 
Submitted batch job 1381632
[araim1@maya-usr1 hello-omp-mpi]$ cat slurm.err 
[araim1@maya-usr1 hello-omp-mpi]$ cat slurm.out 
Hello World from thread 7, process 0 of 2, hostname n3
Hello World from thread 4, process 0 of 2, hostname n3
Hello World from thread 2, process 0 of 2, hostname n3
Hello World from thread 3, process 0 of 2, hostname n3
Hello World from thread 6, process 0 of 2, hostname n3
Hello World from thread 5, process 0 of 2, hostname n3
Hello World from thread 1, process 0 of 2, hostname n3
Hello World from thread 0, process 1 of 2, hostname n4
Thread 0 on process 1 of 2 reports: nthreads = 8
Hello World from thread 5, process 1 of 2, hostname n4
Hello World from thread 6, process 1 of 2, hostname n4
Hello World from thread 7, process 1 of 2, hostname n4
Hello World from thread 3, process 1 of 2, hostname n4
Hello World from thread 4, process 1 of 2, hostname n4
Hello World from thread 1, process 1 of 2, hostname n4
Hello World from thread 2, process 1 of 2, hostname n4
Hello World from thread 0, process 0 of 2, hostname n3
Thread 0 on process 0 of 2 reports: nthreads = 8
[araim1@maya-usr1 hello-omp-mpi]$ 

More OpenMP programming