- Introduction
- Downloading the case study
- report_time program
- Generating the study
- Running the study
- Viewing a table of timings
- Creating LaTeX tables
- Timing tables by processes per node
Introduction
Suppose you are conducting a parallel performance study using your code. Typically, you will want to observe its performance in solving several different problems, as you vary the numbers of nodes and processes per node in use. For each run of your program, you’ll need a slightly different submission script. You’ll also need to make sure your runs are neatly organized; perhaps each run has its own directory. Then it’s necessary to run the study and collect the results into a table.
Managing a performance study can become tedious and also prone to error. On this page, we will show how to automate some repetitive tasks through shell scripting. These will include
- Creating the directory structure for the study
- Setting up each run directory so that it can be submitted to SLURM
- Displaying the timings as a table
- Displaying timing tables that can be quickly copied and pasted into LaTeX
The scripts on this page are written in Bash, but little knowledge of scripting is required to get the example working. Depending on your objectives and how your data is organized, the scripts shown here may not meet your needs exactly. Hopefully they can be customized to your project, or at least suggest what is possible.
Make sure you’ve read the tutorial for C programs first, to understand the basics of serial and parallel programming on maya.
Downloading the case study
For the remainder of this page, we will be using a case study to demonstrate the scripts. You can download it to maya using wget
[araim1@maya-usr1 ~]$ wget http://www.umbc.edu/hpcf/code/scripting-case-study/scripting-case-study.tar.gz
--2011-06-25 11:36:12--  http://www.umbc.edu/hpcf/code/scripting-case-study/scripting-case-study.tar.gz
Resolving www.umbc.edu... 130.85.12.11
Connecting to www.umbc.edu|130.85.12.11|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 3040 (3.0K) [application/x-tar]
Saving to: `scripting-case-study.tar.gz'

100%[======================================>] 3,040       --.-K/s   in 0s

2011-06-25 11:36:12 (108 MB/s) - `scripting-case-study.tar.gz' saved [3040/3040]

[araim1@maya-usr1 ~]$
Then untar/unzip the file
[araim1@maya-usr1 ~]$ tar xvzf scripting-case-study.tar.gz
scripting-case-study/
scripting-case-study/studies/
scripting-case-study/studies/ppn.tex
scripting-case-study/studies/create-study.bash
scripting-case-study/studies/get-summary-table.bash
scripting-case-study/studies/get-ppn-table-latex.bash
scripting-case-study/studies/get-summary-table-latex.bash
scripting-case-study/studies/summary.tex
scripting-case-study/studies/summary.pdf
scripting-case-study/studies/get-ppn-table.bash
scripting-case-study/studies/ppn.pdf
scripting-case-study/src/
scripting-case-study/src/Makefile
scripting-case-study/src/utilities.c
scripting-case-study/src/utilities.h
scripting-case-study/src/report_time.c
[araim1@maya-usr1 ~]$ cd scripting-case-study/
[araim1@maya-usr1 scripting-case-study]$ ls
src  studies
[araim1@maya-usr1 scripting-case-study]$
Under the “src” directory we have an example C program, and inside the “studies” directory are some scripts we will demonstrate.
report_time program
Our performance study will be based on the report_time program, which we will now describe. The program takes a single command line argument, the “problem size” N. In a real application, this might represent the number of grid points in a mesh, for example. Our program pretends that it took N / p seconds to run (where p is the number of MPI processes), and writes a file called “diag_time.dat” with this elapsed time in the following format
[araim1@maya-usr1 ~]$ cat diag_time.dat
00:02:03 0.03 2.06 123.45 % HH:MM:SS=hours=minutes=seconds
[araim1@maya-usr1 ~]$
Notice that there are four columns
- The time in HH:MM:SS format
- The time as a number of hours
- The time as a number of minutes
- The time as a number of seconds
We’ve found this file format with these four reported values to be useful. Of course, many other variations are possible.
In the interest of making a quick demonstration, the program does not actually take this long to run. It simply reports the time and exits. In a real performance study, of course, your program would report an actual measured time.
We should also note that this program is a slight extension of hello_send_recv version 2 on the How to Compile page. This means that report_time is a full-fledged MPI program, although the requested processor cores are only used for sending a “hello” message.
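For reference, the conversion from a number of elapsed seconds into the four columns of diag_time.dat can be sketched in the shell. This is an illustration only, with made-up sample values for N and p; the real computation lives in report_time.c:

```shell
#!/bin/bash
# Illustration: build a diag_time.dat line from a pretend elapsed time
# of N / p seconds, as report_time does. N and p are sample values.
N=1024
p=8
T=$((N / p))

HH=$(printf '%02d' $((T / 3600)))
MM=$(printf '%02d' $(((T % 3600) / 60)))
SS=$(printf '%02d' $((T % 60)))

# awk handles the floating-point divisions for the hours/minutes columns
awk -v t=$T -v hms="$HH:$MM:$SS" \
    'BEGIN { printf "%s %.2f %.2f %.2f %% HH:MM:SS=hours=minutes=seconds\n",
             hms, t/3600, t/60, t }' > diag_time.dat
```

With N = 1024 and p = 8 this writes the line `00:02:08 0.04 2.13 128.00 % HH:MM:SS=hours=minutes=seconds`.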
Let’s compile the program to get our executable
[araim1@maya-usr1 scripting-case-study]$ cd src/
[araim1@maya-usr1 src]$ ls
Makefile  report_time.c  utilities.c  utilities.h
[araim1@maya-usr1 src]$ make
mpicc -g -O3 -c utilities.c -o utilities.o
mpicc -g -O3 -c -o report_time.o report_time.c
mpicc -g -O3 utilities.o report_time.o -o report_time -lm
[araim1@maya-usr1 src]$ ls
Makefile  report_time  report_time.c  report_time.o  utilities.c  utilities.h  utilities.o
[araim1@maya-usr1 src]$
Generating the study
Let’s change to the “studies” directory
[araim1@maya-usr1 src]$ cd ../studies/
[araim1@maya-usr1 studies]$
Our goal will be to create the following directory structure
[araim1@maya-usr1 studies]$ ls study_*
study_n01024:
n001ppn1  n001ppn4  n002ppn1  n002ppn4  n004ppn1  n004ppn4  n008ppn1  n008ppn4  n016ppn1  n016ppn4  n032ppn1  n032ppn4
n001ppn2  n001ppn8  n002ppn2  n002ppn8  n004ppn2  n004ppn8  n008ppn2  n008ppn8  n016ppn2  n016ppn8  n032ppn2  n032ppn8

study_n02048:
n001ppn1  n001ppn4  n002ppn1  n002ppn4  n004ppn1  n004ppn4  n008ppn1  n008ppn4  n016ppn1  n016ppn4  n032ppn1  n032ppn4
n001ppn2  n001ppn8  n002ppn2  n002ppn8  n004ppn2  n004ppn8  n008ppn2  n008ppn8  n016ppn2  n016ppn8  n032ppn2  n032ppn8

study_n04096:
n001ppn1  n001ppn4  n002ppn1  n002ppn4  n004ppn1  n004ppn4  n008ppn1  n008ppn4  n016ppn1  n016ppn4  n032ppn1  n032ppn4
n001ppn2  n001ppn8  n002ppn2  n002ppn8  n004ppn2  n004ppn8  n008ppn2  n008ppn8  n016ppn2  n016ppn8  n032ppn2  n032ppn8
[araim1@maya-usr1 studies]$
The directory “study_n01024” represents problem size N = 1024, and so forth for study_n02048 and study_n04096. Within each study_nXXXXX directory we’ll have many subdirectories of the form “nNNNppnY”, which represent a run using NNN nodes, Y processes per node. Within each nNNNppnY, we’ll place a batch script and a symlink to our executable (to avoid copying it many times)
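The zero-padded names come from printf format strings: %05d pads the problem size to five digits and %03d pads the node count to three, so directory listings sort correctly. A quick check at the prompt, using the same format strings that appear in create-study.bash:

```shell
# Reproduce the directory naming scheme used by the study scripts
N=1024; NODES=2; NPERNODE=4
DIR_NAME=$(printf 'study_n%05d/n%03dppn%d' $N $NODES $NPERNODE)
echo $DIR_NAME    # prints study_n01024/n002ppn4
```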
[araim1@maya-usr1 studies]$ ls -l study_n01024/n002ppn4/
total 20
lrwxrwxrwx 1 araim1 pi_nagaraj  44 Jun 25 11:01 report_time -> /home/araim1/scripting-case-study/src/report_time
-rwxrwx--- 1 araim1 pi_nagaraj 282 Jun 25 11:01 run.slurm
[araim1@maya-usr1 studies]$
The contents of the batch script should look something like this
[araim1@maya-usr1 studies]$ cat study_n01024/n002ppn4/run.slurm
#!/bin/bash
#SBATCH --job-name=test_study
#SBATCH --output=slurm.out
#SBATCH --error=slurm.err
#SBATCH --partition=develop
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=4

srun ./report_time 1024
[araim1@maya-usr1 studies]$
To generate this structure, we can use the following script
#!/bin/bash

EXECUTABLE='/home/araim1/scripting-case-study/src/report_time'

# This function writes a SLURM script. We can call it with different parameter
# settings to create different experiments
function write_script
{
    STUDY_NAME=$(printf 'study_n%05d' ${N})
    DIR_NAME=$(printf '%s/n%03dppn%d' ${STUDY_NAME} ${NODES} ${NPERNODE})

    if [ -d $DIR_NAME ] ; then
        echo "$DIR_NAME already exists, skipping..."
        return 0
    else
        echo "Creating job $DIR_NAME"
    fi

    mkdir -p $DIR_NAME

    cat << _EOF_ > ${DIR_NAME}/run.slurm
#!/bin/bash
#SBATCH --job-name=test_study
#SBATCH --output=slurm.out
#SBATCH --error=slurm.err
#SBATCH --partition=batch
#SBATCH --nodes=${NODES}
#SBATCH --ntasks-per-node=${NPERNODE}
srun ./report_time ${N}
_EOF_

    chmod 775 ${DIR_NAME}/run.slurm
    ln -s ${EXECUTABLE} ${DIR_NAME}/
}

# For each problem size, we'll run the experiment with 1, 2, 4, and 8
# processes per node, on 1, 2, 4, ..., 32 nodes
for N in 1024 2048 4096
do
    for NPERNODE in 1 2 4 8
    do
        for NODES in 1 2 4 8 16 32
        do
            write_script
        done
    done
done
Download: ../code/scripting-case-study/studies/create-study.bash
The function write_script is responsible for creating each single job directory, setting up the symlink to report_time, and creating the batch script. The loop at the bottom determines which combinations of N, number of nodes, and number of processes per node will be used in the study. Make special note of the EXECUTABLE variable at the top: you will need to change this path to point to your own report_time executable. Now we can simply run create-study.bash to get our directory structure
[araim1@maya-usr1 studies]$ ./create-study.bash
Creating job study_n01024/n001ppn1
Creating job study_n01024/n002ppn1
Creating job study_n01024/n004ppn1
Creating job study_n01024/n008ppn1
Creating job study_n01024/n016ppn1
Creating job study_n01024/n032ppn1
Creating job study_n01024/n001ppn2
Creating job study_n01024/n002ppn2
Creating job study_n01024/n004ppn2
...
Creating job study_n04096/n008ppn8
Creating job study_n04096/n016ppn8
Creating job study_n04096/n032ppn8
[araim1@maya-usr1 studies]$
Running the study
We can submit many batch scripts at once with scripting. Here we demonstrate a Bash “for” loop entered directly at the command prompt
[araim1@maya-usr1 studies]$ for i in study_n*/n*ppn*;
> do
>     cd $i; sbatch run.slurm; cd ../../;
> done
Submitted batch job 64989
Submitted batch job 64990
Submitted batch job 64991
...
[araim1@maya-usr1 studies]$
The same idea works for submitting only a subset of the runs, for example to rerun a few specific cases

[araim1@maya-usr1 studies]$ for i in study_n01024/n032ppn8 study_n02048/n032ppn8 study_n04096/n032ppn8;
> do
>     cd $i; sbatch run.slurm; cd ../../;
> done
Submitted batch job 64992
Submitted batch job 64993
Submitted batch job 64994
[araim1@maya-usr1 studies]$
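One caveat about loops like the one above: if a cd fails (say, because of a mistyped pattern), the following cd ../../ can leave the shell in the wrong directory for the rest of the loop. A slightly safer variant, offered here as a suggestion rather than part of the downloaded case study, runs each iteration in a subshell:

```shell
# Each (...) runs in a subshell, so the cd never affects the outer shell,
# and sbatch only runs if the cd succeeded.
for i in study_n*/n*ppn*
do
    # skip if the glob matched nothing
    [ -d "$i" ] || continue
    (cd "$i" && sbatch run.slurm)
done
```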
Viewing a table of timings
Suppose we’ve run the test study above. We should now have a “diag_time.dat” file in each job directory.
[araim1@maya-usr1 studies]$ cat study_n01024/n001ppn1/diag_time.dat
00:17:04 0.28 17.07 1024.00 % HH:MM:SS=hours=minutes=seconds
[araim1@maya-usr1 studies]$
Our goal is to create the following table
N = 1024
                           p=1        p=2        p=4        p=8       p=16       p=32
1 process per node     00:17:04   00:08:32   00:04:16   00:02:08   00:01:04   00:00:32
2 processes per node   00:08:32   00:04:16   00:02:08   00:01:04   00:00:32   00:00:16
4 processes per node   00:04:16   00:02:08   00:01:04   00:00:32   00:00:16   00:00:08
8 processes per node   00:02:08   00:01:04   00:00:32   00:00:16   00:00:08   00:00:04

N = 2048
                           p=1        p=2        p=4        p=8       p=16       p=32
1 process per node     00:34:08   00:17:04   00:08:32   00:04:16   00:02:08   00:01:04
2 processes per node   00:17:04   00:08:32   00:04:16   00:02:08   00:01:04   00:00:32
4 processes per node   00:08:32   00:04:16   00:02:08   00:01:04   00:00:32   00:00:16
8 processes per node   00:04:16   00:02:08   00:01:04   00:00:32   00:00:16   00:00:08

N = 4096
                           p=1        p=2        p=4        p=8       p=16       p=32
1 process per node     01:08:16   00:34:08   00:17:04   00:08:32   00:04:16   00:02:08
2 processes per node   00:34:08   00:17:04   00:08:32   00:04:16   00:02:08   00:01:04
4 processes per node   00:17:04   00:08:32   00:04:16   00:02:08   00:01:04   00:00:32
8 processes per node   00:08:32   00:04:16   00:02:08   00:01:04   00:00:32   00:00:16
To do this, we must first extract the appropriate column from each relevant diag_time.dat file. This can be accomplished using the gawk command
[araim1@maya-usr1 studies]$ cat study_n04096/n032ppn8/diag_time.dat
00:00:16 0.00 0.27 16.00 % HH:MM:SS=hours=minutes=seconds
[araim1@maya-usr1 studies]$ gawk -F' ' '{ print $1 }' study_n04096/n032ppn8/diag_time.dat
00:00:16
[araim1@maya-usr1 studies]$ gawk -F' ' '{ print $2 }' study_n04096/n032ppn8/diag_time.dat
0.00
[araim1@maya-usr1 studies]$ gawk -F' ' '{ print $3 }' study_n04096/n032ppn8/diag_time.dat
0.27
[araim1@maya-usr1 studies]$ gawk -F' ' '{ print $4 }' study_n04096/n032ppn8/diag_time.dat
16.00
[araim1@maya-usr1 studies]$
We’ll choose the first column for this demonstration, to display times in HH:MM:SS format. We can now create the table by iterating through each study, extracting the times in the right order, and printing them with careful formatting. The following script accomplishes this
#!/bin/bash

write_result()
{
    N=$1
    NODES=$2
    NPN=$3

    FILENAME=$(printf 'study_n%05d/n%03dppn%d/diag_time.dat' $N $NODES $NPN)

    if [ -f $FILENAME ] ; then
        RESULT=$(gawk -F' ' '{ print $1 }' $FILENAME 2>/dev/null)
        printf ' %8s ' $RESULT
    else
        # If the file does not exist, write out a '---'
        printf ' %8s ' '---'
    fi
}

write_header()
{
    printf '%20s' ''
    for i in $@
    {
        printf '%10s ' $i
    }
    printf '\n'
}

for N in 1024 2048 4096
do
    echo "N = $N"
    write_header 'p=1' 'p=2' 'p=4' 'p=8' 'p=16' 'p=32'

    for NPERNODE in 1 2 4 8
    do
        if [ $NPERNODE -eq 1 ] ; then
            printf '%d process per node' $NPERNODE
        else
            printf '%d processes per node' $NPERNODE
        fi

        for NODES in 1 2 4 8 16 32
        do
            write_result $N $NODES $NPERNODE
        done
        printf '\n'
    done
    printf '\n'
done
Download: ../code/scripting-case-study/studies/get-summary-table.bash
The function write_result executes our gawk command, but prints a “---” if the diag_time.dat file does not exist. The loop at the bottom ensures that the timings are printed in order, and with the correct formatting, so that we get our table. Running this script from within the “studies” directory yields the table.
[araim1@maya-usr1 studies]$ ./get-summary-table.bash
N = 1024
                           p=1        p=2        p=4        p=8       p=16       p=32
1 process per node     00:17:04   00:08:32   00:04:16   00:02:08   00:01:04   00:00:32
2 processes per node   00:08:32   00:04:16   00:02:08   00:01:04   00:00:32   00:00:16
4 processes per node   00:04:16   00:02:08   00:01:04   00:00:32   00:00:16   00:00:08
8 processes per node   00:02:08   00:01:04   00:00:32   00:00:16   00:00:08   00:00:04

N = 2048
                           p=1        p=2        p=4        p=8       p=16       p=32
1 process per node     00:34:08   00:17:04   00:08:32   00:04:16   00:02:08   00:01:04
2 processes per node   00:17:04   00:08:32   00:04:16   00:02:08   00:01:04   00:00:32
4 processes per node   00:08:32   00:04:16   00:02:08   00:01:04   00:00:32   00:00:16
8 processes per node   00:04:16   00:02:08   00:01:04   00:00:32   00:00:16   00:00:08

N = 4096
                           p=1        p=2        p=4        p=8       p=16       p=32
1 process per node     01:08:16   00:34:08   00:17:04   00:08:32   00:04:16   00:02:08
2 processes per node   00:34:08   00:17:04   00:08:32   00:04:16   00:02:08   00:01:04
4 processes per node   00:17:04   00:08:32   00:04:16   00:02:08   00:01:04   00:00:32
8 processes per node   00:08:32   00:04:16   00:02:08   00:01:04      ---        ---
[araim1@maya-usr1 studies]$
The last two cases of N = 4096 were not run in this example, to demonstrate the “---” feature.
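Before building tables, it can be handy to list any run directories that have not produced a diag_time.dat yet, for example jobs that failed or were never submitted. A small sketch along the same lines as the scripts above (this helper is our addition, not part of the downloaded case study):

```shell
# Report any job directories that are missing their diag_time.dat
for dir in study_n*/n*ppn*
do
    # skip if the glob matched nothing
    [ -d "$dir" ] || continue
    if [ ! -f "$dir/diag_time.dat" ] ; then
        echo "missing: $dir"
    fi
done
```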
Creating LaTeX tables
The script above can easily be modified to generate a LaTeX-ready table, that is, one using “&” as the column separator and “\\” at the end of each row. These are tedious to enter manually, but easy to add in the script. See the file get-summary-table-latex.bash in the tar.gz. Here is an example of its output
[araim1@maya-usr1 studies]$ ./get-summary-table-latex.bash
N = 1024
                           p=1        p=2        p=4        p=8       p=16       p=32
1 process per node   & 00:17:04 & 00:08:32 & 00:04:16 & 00:02:08 & 00:01:04 & 00:00:32 \\
2 processes per node & 00:08:32 & 00:04:16 & 00:02:08 & 00:01:04 & 00:00:32 & 00:00:16 \\
4 processes per node & 00:04:16 & 00:02:08 & 00:01:04 & 00:00:32 & 00:00:16 & 00:00:08 \\
8 processes per node & 00:02:08 & 00:01:04 & 00:00:32 & 00:00:16 & 00:00:08 & 00:00:04 \\

N = 2048
                           p=1        p=2        p=4        p=8       p=16       p=32
1 process per node   & 00:34:08 & 00:17:04 & 00:08:32 & 00:04:16 & 00:02:08 & 00:01:04 \\
2 processes per node & 00:17:04 & 00:08:32 & 00:04:16 & 00:02:08 & 00:01:04 & 00:00:32 \\
4 processes per node & 00:08:32 & 00:04:16 & 00:02:08 & 00:01:04 & 00:00:32 & 00:00:16 \\
8 processes per node & 00:04:16 & 00:02:08 & 00:01:04 & 00:00:32 & 00:00:16 & 00:00:08 \\

N = 4096
                           p=1        p=2        p=4        p=8       p=16       p=32
1 process per node   & 01:08:16 & 00:34:08 & 00:17:04 & 00:08:32 & 00:04:16 & 00:02:08 \\
2 processes per node & 00:34:08 & 00:17:04 & 00:08:32 & 00:04:16 & 00:02:08 & 00:01:04 \\
4 processes per node & 00:17:04 & 00:08:32 & 00:04:16 & 00:02:08 & 00:01:04 & 00:00:32 \\
8 processes per node & 00:08:32 & 00:04:16 & 00:02:08 & 00:01:04 &    ---   &    ---   \\
[araim1@maya-usr1 studies]$
We can easily convert this into a LaTeX table. See the file summary.tex included in the tar.gz.
[araim1@maya-usr1 studies]$ pdflatex summary.tex
This is pdfeTeX, Version 3.141592-1.21a-2.2 (Web2C 7.5.4)
...
Output written on summary.pdf (1 page, 26486 bytes).
Transcript written on summary.log.
[araim1@maya-usr1 studies]$
This produces the output summary.pdf
It is often necessary to recreate these tables more than once while preparing a report. For example, after an initial draft we might discover a bug, or an opportunity for much better performance. With these scripts, regenerating the tables takes minimal work: simply rerun “get-summary-table-latex.bash” on the current results and copy/paste the relevant lines into the .tex file.
Timing tables by processes per node
Another commonly used type of table reports timings by total process count p, with one row per problem size, for each choice of 1, 2, 4, or 8 processes per node. For example:
Results for npn = 1
      p=1        p=2        p=4        p=8       p=16       p=32       p=64      p=128      p=256
 00:17:04   00:08:32   00:04:16   00:02:08   00:01:04   00:00:32      ---        ---        ---
 00:34:08   00:17:04   00:08:32   00:04:16   00:02:08   00:01:04      ---        ---        ---
 01:08:16   00:34:08   00:17:04   00:08:32   00:04:16   00:02:08      ---        ---        ---

Results for npn = 2
      p=1        p=2        p=4        p=8       p=16       p=32       p=64      p=128      p=256
 00:17:04   00:08:32   00:04:16   00:02:08   00:01:04   00:00:32   00:00:16      ---        ---
 00:34:08   00:17:04   00:08:32   00:04:16   00:02:08   00:01:04   00:00:32      ---        ---
 01:08:16   00:34:08   00:17:04   00:08:32   00:04:16   00:02:08   00:01:04      ---        ---

Results for npn = 4
      p=1        p=2        p=4        p=8       p=16       p=32       p=64      p=128      p=256
 00:17:04   00:08:32   00:04:16   00:02:08   00:01:04   00:00:32   00:00:16   00:00:08      ---
 00:34:08   00:17:04   00:08:32   00:04:16   00:02:08   00:01:04   00:00:32   00:00:16      ---
 01:08:16   00:34:08   00:17:04   00:08:32   00:04:16   00:02:08   00:01:04   00:00:32      ---

Results for npn = 8
      p=1        p=2        p=4        p=8       p=16       p=32       p=64      p=128      p=256
 00:17:04   00:08:32   00:04:16   00:02:08   00:01:04   00:00:32   00:00:16   00:00:08   00:00:04
 00:34:08   00:17:04   00:08:32   00:04:16   00:02:08   00:01:04   00:00:32   00:00:16   00:00:08
 01:08:16   00:34:08   00:17:04   00:08:32   00:04:16   00:02:08   00:01:04      ---        ---
The scripts to generate these tables are very similar in nature to the ones from the previous sections. In the tar.gz file, see
- get-ppn-table.bash
- get-ppn-table-latex.bash (includes delimiters for LaTeX)
- ppn.tex (an example LaTeX file)
- ppn.pdf (PDF generated by “pdflatex ppn.tex”)