Introduction
MapReduce-MPI (MR-MPI) is a library written at Sandia National Laboratories that implements the MapReduce parallel programming paradigm on top of MPI. MapReduce was popularized by Google; its most widely used implementation is probably Hadoop, an open-source Java library. This page shows how to get started with MR-MPI on maya.
Currently MR-MPI is built only for the “GCC + OpenMPI 1.3.3” switcher combination, so we must first change to that setting. We also define an environment variable MRMPI_HOME for convenience.
[araim1@maya-usr1 ~]$ switcher mpi = gcc-openmpi-1.3.3-p1
[araim1@maya-usr1 ~]$ switcher_reload
[araim1@maya-usr1 ~]$ export MRMPI_HOME=/usr/cluster/contrib/gcc-openmpi-1.3.3/mrmpi-20Jun11/
We can now compile and run the examples that come with MR-MPI. Let’s focus on the “wordfreq” example.
[araim1@maya-usr1 ~]$ cp /examples ~/
[araim1@maya-usr1 ~]$ cd ~/examples
[araim1@maya-usr1 examples]$ make
[araim1@maya-usr1 examples]$ ls -l wordfreq
-rwxrwx--- 1 araim1 contrib 375462 Oct 5 19:36 wordfreq
We can use a standard batch script. Note, however, that programs compiled against OpenMPI must be launched with “mpirun”.
#!/bin/bash
#SBATCH --job-name=mrmpi
#SBATCH --output=slurm.out
#SBATCH --error=slurm.err
#SBATCH --partition=develop
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
mpirun ./wordfreq $MRMPI_HOME/doc/*.txt
Download: ../code/mrmpi-wordfreq/run.slurm
In the example above, we count word frequencies in the MR-MPI documentation. Running this script yields the following.
[araim1@maya-usr1 examples]$ sbatch run.slurm
Submitted batch job 447272
[araim1@maya-usr1 examples]$ cat slurm.err
[araim1@maya-usr1 examples]$ cat slurm.out
Map time (secs) = 0.0131121
Map KV =
  KV pairs: 24277 ave 24277 max 24277 min
  Histogram: 1 0 0 0 0 0 0 0 0 0
  Kdata (Mb): 0.149896 ave 0.149896 max 0.149896 min
  Histogram: 1 0 0 0 0 0 0 0 0 0
  Vdata (Mb): 0 ave 0 max 0 min
  Histogram: 1 0 0 0 0 0 0 0 0 0
Collate time (secs) = 0.0173779
Collate KMV =
  KMV pairs: 3568 ave 3568 max 3568 min
  Histogram: 1 0 0 0 0 0 0 0 0 0
  Kdata (Mb): 0.0345516 ave 0.0345516 max 0.0345516 min
  Histogram: 1 0 0 0 0 0 0 0 0 0
  Vdata (Mb): 0 ave 0 max 0 min
  Histogram: 1 0 0 0 0 0 0 0 0 0
...
[araim1@maya-usr1 examples]$
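The output above reports a map stage that emits key-value (KV) pairs and a collate stage that groups them into key-multivalue (KMV) pairs. As a rough serial analogy only, the same idea can be sketched with standard Unix tools. This is not how MR-MPI works internally, it uses no MPI, and the function name wordfreq_serial is a hypothetical helper introduced here for illustration.

```shell
#!/bin/bash
# Serial illustration only: the map / collate / reduce idea expressed
# with standard Unix tools. It does not use MR-MPI or MPI at all.
# wordfreq_serial is a hypothetical helper name, not part of MR-MPI.
wordfreq_serial() {
    # map:     split the input files into one word per line
    # collate: sort, so identical words become adjacent
    # reduce:  uniq -c collapses each run of identical words into a count
    # finally: order the result by descending count
    cat "$@" \
        | tr -s '[:space:]' '\n' \
        | grep -v '^$' \
        | sort \
        | uniq -c \
        | sort -k1,1nr
}
```

For instance, wordfreq_serial $MRMPI_HOME/doc/*.txt | head would list the most frequent words in the same documentation files. Tokenization details may differ from the “wordfreq” example, so the counts need not match exactly.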
New MR-MPI programs can be written by using the header files in $MRMPI_HOME/include and linking to the library “libmrmpi_mpicc.a” in $MRMPI_HOME/lib. See the documentation page for MR-MPI for more details.
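The compile-and-link step described above might look like the following sketch. It assumes that the OpenMPI C++ compiler wrapper mpicxx is on the PATH after the switcher change and that MRMPI_HOME is set as shown earlier. The helper name build_mrmpi_prog and the file names in the usage note are hypothetical.

```shell
#!/bin/bash
# Sketch: compile one source file against the MR-MPI headers and link it
# with the static library named on this page.
# Assumptions: MRMPI_HOME is set; mpicxx is the MPI C++ wrapper.
# Set MPICXX to use a different wrapper. build_mrmpi_prog is a
# hypothetical helper name, not part of MR-MPI.
build_mrmpi_prog() {
    # $1 = source file, $2 = name of the executable to produce
    "${MPICXX:-mpicxx}" -I"$MRMPI_HOME/include" "$1" \
        "$MRMPI_HOME/lib/libmrmpi_mpicc.a" -o "$2"
}
```

A call such as build_mrmpi_prog myprog.cpp myprog would then produce an executable that can be launched with “mpirun” from a batch script like the one shown earlier.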