How to run MapReduce-MPI on taki

Note: This page has not been updated to reflect the move to taki from maya.

Introduction

MapReduce-MPI (MR-MPI) is a library written at Sandia National Lab which implements the MapReduce parallel programming paradigm on an MPI system. MapReduce was popularized by Google. Its most widely used implementation is probably the open source Java library called Hadoop. On this page we will show how to get started with MR-MPI on maya.

Currently MR-MPI is built only for the “GCC + OpenMPI 1.3.3” switcher combination, so we must first change to that setting. We also define an environment variable MRMPI_HOME for convenience.

[araim1@maya-usr1 ~]$ switcher mpi = gcc-openmpi-1.3.3-p1
[araim1@maya-usr1 ~]$ switcher_reload
[araim1@maya-usr1 ~]$ export MRMPI_HOME=/usr/cluster/contrib/gcc-openmpi-1.3.3/mrmpi-20Jun11/

We can now compile and run the examples that come with MR-MPI. Let’s focus on the “wordfreq” example.

[araim1@maya-usr1 ~]$ cp /examples ~/
[araim1@maya-usr1 ~]$ cd ~/examples
[araim1@maya-usr1 examples]$ make
[araim1@maya-usr1 examples]$ ls -l wordfreq
-rwxrwx--- 1 araim1 contrib 375462 Oct  5 19:36 wordfreq

We can use a standard batch script. However, note that “mpirun” must be used to launch programs which are compiled to use OpenMPI.

#!/bin/bash
#SBATCH --job-name=mrmpi
#SBATCH --output=slurm.out
#SBATCH --error=slurm.err
#SBATCH --partition=develop
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1

mpirun ./wordfreq $MRMPI_HOME/doc/*.txt

Download: ../code/mrmpi-wordfreq/run.slurm

In the example above, we count word frequencies in the MRMPI documentation. Running this script yields the following.

[araim1@maya-usr1 examples]$ sbatch run.slurm 
Submitted batch job 447272
[araim1@maya-usr1 examples]$ cat slurm.err
[araim1@maya-usr1 examples]$ cat slurm.out
Map time (secs) = 0.0131121
Map KV =   KV pairs:   24277 ave 24277 max 24277 min
  Histogram:  1 0 0 0 0 0 0 0 0 0
  Kdata (Mb): 0.149896 ave 0.149896 max 0.149896 min
  Histogram:  1 0 0 0 0 0 0 0 0 0
  Vdata (Mb): 0 ave 0 max 0 min
  Histogram:  1 0 0 0 0 0 0 0 0 0
Collate time (secs) = 0.0173779
Collate KMV =   KMV pairs:  3568 ave 3568 max 3568 min
  Histogram:  1 0 0 0 0 0 0 0 0 0
  Kdata (Mb): 0.0345516 ave 0.0345516 max 0.0345516 min
  Histogram:  1 0 0 0 0 0 0 0 0 0
  Vdata (Mb): 0 ave 0 max 0 min
  Histogram:  1 0 0 0 0 0 0 0 0 0
...
[araim1@maya-usr1 examples]$

New MR-MPI programs can be written by using the header files in $MRMPI_HOME/include and linking to the library “libmrmpi_mpicc.a” in $MRMPI_HOME/lib. See the documentation page for MR-MPI for more details.