How to run MATLAB programs on maya

Introduction

Running MATLAB on the cluster's compute nodes is similar to running any other serial job. Make sure you have read the tutorial for C programs first to understand the basics. We will not demonstrate any parallel code here, so reading just the serial section is sufficient for now. A basic introduction to running MATLAB in the computer labs at UMBC is available on the CIRC webpage.

For more information about the software, see the MATLAB website.

The current version of MATLAB should be loaded as part of the default environment. If it is not, you can load it manually with the command

[araim1@maya-usr1 ~]$ module load default-environment

Running interactive jobs on compute nodes

It is possible to use a compute node interactively. This may be accomplished by issuing the following command on the front end node. Take note of the '--mem' option, which specifies the total memory allocated to the job. This can be increased if MATLAB requires more memory, but requesting more may increase how long your job waits in the SLURM queue:

[jongraf1@maya-usr1 ~]$ srun --partition=batch -N1 -n1 --pty --preserve-env --mem=6G $SHELL
salloc: Granted job allocation 102560

We can now launch MATLAB or other interactive tools, and the computations will run on the allocated node (n84 in the example below) within our job. When finished, simply type 'exit' to relinquish the node.

[araim1@n84 ~]$ matlab
...
...
[araim1@n84 ~]$ exit
salloc: Relinquishing job allocation 102560

Interactively Performing Calculations on the Cluster Nodes

Let's try running the sample MATLAB program given below.


Download: ../code/matrixmultiply-matlab/matrixmultiply.m
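The file itself is not reproduced here. Based on the variables referenced later in this tutorial (AB and sumAB saved to out.mat), a minimal sketch of such a program might look like the following; the matrix dimension n is an assumption.

```matlab
% matrixmultiply.m -- hypothetical sketch; the actual downloadable file may differ
n = 1000;                       % assumed matrix dimension
A = rand(n);                    % random n-by-n matrices
B = rand(n);
AB = A * B;                     % matrix product
sumAB = sum(sum(AB));           % sum of all entries of AB
save('out.mat', 'AB', 'sumAB'); % variables recovered later with "load out.mat"
```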

To run this job interactively on a compute node, run the command

[jongraf1@maya-usr1 ~]$ srun --partition=batch --pty --preserve-env --mem=5000 $SHELL
...
[jongraf1@n119 ~]$ 

Notice that the prompt has changed to indicate that we are using a specific node, for example here, n119, in the batch queue. A call to the squeue command shows that we are indeed inside of the SLURM job. We can now launch MATLAB interactively and the computations will run on node n119 within our job.

To make use of the Parallel Computing Toolbox, you should allocate the entire node:

[jongraf1@maya-usr1 ~]$ srun --partition=batch --pty --preserve-env --mem=5000 --exclusive $SHELL
...
[jongraf1@n109 ~]$

Note: To run for less than 15 minutes, you should use the develop partition via the flag "--partition=develop". It is also possible to add other flags such as "--qos=long --time=4:00:00" before $SHELL to request a session with particular constraints on the system. Update the --time flag to your best estimate of how long you will need the interactive session. You should not begin an interactive session and then leave it unused. To close an interactive session, simply type 'exit'.
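As a concrete (hypothetical) example combining the flags above, a four-hour interactive session could be requested as follows:

```
[jongraf1@maya-usr1 ~]$ srun --partition=batch --qos=long --time=4:00:00 --pty --preserve-env --mem=5000 $SHELL
```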

To start MATLAB interactively, simply type matlab on the command line. The matrixmultiply file can then be run by entering matrixmultiply in the command window. This will generate an out.mat file that contains the AB and sumAB variables. Logging out of the shell will end the job. Note that the usual limitations on memory, time usage, etc. are in effect for interactive jobs. For more details on running jobs interactively on compute nodes, see Interacting with Compute Nodes.

[khsa1@maya-usr2 matrixmultiply-matlab]$ srun --pty --preserve-env --mem=5000 $SHELL                                          
salloc: Granted job allocation 42023
srun: Job step created
[khsa1@n70 matrixmultiply-matlab]$ matlab

                                                        < M A T L A B (R) >
                                              Copyright 1984-2014 The MathWorks, Inc.
                                                R2014a (8.3.0.532) 64-bit (glnxa64)
                                                         February 11, 2014

    ----------------------------------------------------
        Your MATLAB license will expire in 60 days.
        Please contact your system administrator or
        MathWorks to renew this license.
    ----------------------------------------------------

To get started, type one of these: helpwin, helpdesk, or demo.
For product information, visit www.mathworks.com.

>> matrixmultiply
>> ls
matrixmultiply.m  out.mat

>> load out.mat
>> exit
[khsa1@n70 matrixmultiply-matlab]$ exit
exit
salloc: Relinquishing job allocation 42023

Performing Calculations on the Cluster Nodes using Slurm

We will again use the code matrixmultiply in this example. As always with slurm, we will need a batch script.



Download: ../code/matrixmultiply-matlab/run.slurm
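The batch script itself is not reproduced here. Based on the single-core request described in this section and the "matlab -nodisplay -r" invocation used elsewhere in this tutorial, a plausible sketch (the job name is an assumption) is:

```
#!/bin/bash
#SBATCH --job-name=matrixmultiply
#SBATCH --output=slurm.out
#SBATCH --error=slurm.err
#SBATCH --partition=batch
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1

matlab -nodisplay -r "matrixmultiply, exit"
```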

Note that by default, we’ve requested a single core of one node for our job. This will help to yield the best throughput of MATLAB jobs on the cluster. See the technical report HPCF-2009-1 (Sharma & Gobbert) on the publications page for more details. We can run our batch script in the usual way

[araim1@maya-usr1 matrixmultiply-matlab]$ sbatch run.slurm
sbatch: Submitted batch job 2621
[araim1@maya-usr1 matrixmultiply-matlab]$

After your job completes, you should see an out.mat MATLAB save file in your directory. Later on, if you want to get the data out of that file, you can use the load command in MATLAB:

>> load out.mat

which will load the AB and sumAB variables that you saved using your save command. In your directory there should also be slurm.out and slurm.err files. The slurm.err file should be empty, and the slurm.out file should contain something like this

[araim1@maya-usr1 matrixmultiply-matlab]$ cat slurm.out
                            < M A T L A B (R) >
                  Copyright 1984-2008 The MathWorks, Inc.
                         Version 7.6.0.324 (R2008a)
                             February 10, 2008


  To get started, type one of these: helpwin, helpdesk, or demo.
  For product information, visit www.mathworks.com.

[araim1@maya-usr1 matrixmultiply-matlab]$

You should be able to use any of the usual non-graphical MATLAB functionality if you follow the directions in this section. If you want to generate graphics in your MATLAB jobs, continue to the next section.

Generating Plots on the Cluster Nodes

As with all cluster jobs, you will need a batch script in order to run MATLAB


Download: ../code/plotsine-matlab/run.slurm

Now you’ll need the plotsine.m file that the script tries to run


Download: ../code/plotsine-matlab/plotsine.m
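The script itself is not reproduced here. Given the three output files it produces (sine.eps, sine.jpeg, sine.png) and the -deps flag mentioned below, a minimal sketch might be:

```matlab
% plotsine.m -- hypothetical sketch; the actual downloadable file may differ
x = linspace(0, 2*pi, 1000);  % x from 0 to 2*pi
plot(x, sin(x));
xlabel('x'); ylabel('sin(x)');
print -deps  sine.eps         % Encapsulated PostScript (grayscale; -depsc gives color)
print -djpeg sine.jpeg        % JPEG
print -dpng  sine.png         % PNG
```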

Now submit the batch script and wait for it to finish. After it finishes, you should see the following files

[araim1@maya-usr1 plotsine-matlab]$ ls
run.slurm      plotsine.m  sine.eps        sine.jpeg     
sine.png       slurm.err   slurm.out
[araim1@maya-usr1 plotsine-matlab]$ 

The sine.eps, sine.jpeg, and sine.png files contain a plot of sin(x) for x = 0 to 2*pi, in Encapsulated PostScript (.eps), JPEG (.jpeg), and Portable Network Graphics (.png) formats, respectively. The slurm.err file should be empty and the slurm.out file should contain the same text as in the previous section. The three images you made should look something like this

[Image: PNG plot of sin(x)]

The Encapsulated PostScript file (sine.eps) will be in grayscale since -deps was used instead of -depsc, which produces color output.

Checking memory in Matlab programs

On the How to check memory usage page, we discuss various ways of monitoring memory usage, including logging it directly from your C code. In MATLAB there does not appear to be a built-in way to do this (at least not in the Linux version), but with a small amount of work we can add the capability ourselves.

First grab the following C files (memory.c and memory.h), which are also used in How to check memory usage.

The following C code is written in the specific form that MATLAB can interface with. It calls the get_memory_usage_kb function defined in the C files above, retrieves the VmRSS and VmSize quantities for the current process (see How to check memory usage for more information), and returns them as a pair to MATLAB.

Download: ../code/check_memory-matlab/getmemusage.c
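The source is not reproduced here, but a MEX gateway of the kind described above might look like the following sketch. The exact signature of get_memory_usage_kb is an assumption based on its description.

```c
/* getmemusage.c -- hypothetical sketch of the MEX gateway; the actual file may differ */
#include "mex.h"
#include "memory.h"  /* assumed to declare get_memory_usage_kb() from the C files above */

/* Called from MATLAB as: [vmrss, vmsize] = getmemusage */
void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])
{
    long vmrss_kb = 0, vmsize_kb = 0;

    /* Read VmRSS and VmSize (in kB) for the current process */
    get_memory_usage_kb(&vmrss_kb, &vmsize_kb);

    /* Return the pair to MATLAB as double scalars */
    plhs[0] = mxCreateDoubleScalar((double) vmrss_kb);
    plhs[1] = mxCreateDoubleScalar((double) vmsize_kb);
}
```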

To compile this code, we need to use the Matlab MEX compiler. This is already installed on the cluster. We can use the following slurm script to compile our code.

Download: ../code/check_memory-matlab/run.slurm
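The compile script is not reproduced here. One plausible sketch, assuming the develop partition and that memory.c must be compiled together with the gateway, is:

```
#!/bin/bash
#SBATCH --job-name=mex-compile
#SBATCH --output=slurm.out
#SBATCH --error=slurm.err
#SBATCH --partition=develop
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1

# Invoke the MEX compiler from within MATLAB to build getmemusage.mexa64
matlab -nodisplay -r "mex getmemusage.c memory.c, exit"
```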

If the compilation succeeds, the file getmemusage.mexa64 is created.

[araim1@maya-usr1 check_memory-matlab]$ sbatch run.slurm
Submitted batch job 22421
[araim1@maya-usr1 check_memory-matlab]$ ls
getmemusage.c  getmemusage.mexa64  memory.c  memory.h run.slurm  slurm.err  slurm.out
[araim1@maya-usr1 check_memory-matlab]$

Now we can start up MATLAB and call our new getmemusage function just like any other function

[araim1@maya-usr1 check_memory-matlab]$ matlab -nodisplay

                                                   < M A T L A B (R) >
                                         Copyright 1984-2009 The MathWorks, Inc.
                                       Version 7.9.0.529 (R2009b) 64-bit (glnxa64)
                                                     August 12, 2009

 
  To get started, type one of these: helpwin, helpdesk, or demo.
  For product information, visit www.mathworks.com.
 
>> [vmrss, vmsize] = getmemusage

vmrss =

      101140


vmsize =

      933132
>> A = rand(5000, 5000);
>> [vmrss, vmsize] = getmemusage 

vmrss =

      298572


vmsize =

     1128448

>> 

Note that this approach has a few limitations. It can only keep track of memory used in the current process. Matlab may invoke external processes for some tasks, whose memory usage will not be counted by this method.

Parallel Programming

Access to the Parallel Computing Toolbox is now available on maya. This allows simple multicore programming; however, it is limited to single-node jobs. There are several programming constructs available in the Parallel Computing Toolbox:

  • parfor (“parallel for”) provides a simple way to parallelize “for” loops.
  • spmd provides a “single program, multiple data” programming model, which can be thought of as a simplified MPI.
  • Support for distributed data structures.

We will provide a simple example below. Detailed documentation on the use of the Parallel Computing Toolbox is available from MathWorks. Consider the following program, which is a multicore Hello World.

Download: ../code/matlab-pct-hello/driver.m

The code starts up a parallel pool with 8 workers on the local machine, creating a parallel.Pool object which we call poolobj. Within the “spmd” block, the string “msg” is built on each worker. Notice the special variables “labindex” (the ID of the parallel worker) and “numlabs” (the number of parallel workers).

After the “spmd” block, the “msg” data is available as a data structure that can be manipulated in serial. Here it consists of eight strings; we loop through and print each one. Notice here that the number of workers can be accessed by “poolobj.NumWorkers”.
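The driver itself is not reproduced here, but the description above maps onto a sketch like the following (reconstructed, so the actual file may differ):

```matlab
% driver.m -- hypothetical sketch reconstructed from the description above
poolobj = parpool('local', 8);  % start a pool of 8 workers on the local machine

spmd
    % Each worker builds its own message; labindex is the worker's ID and
    % numlabs is the total number of workers
    msg = sprintf('Hello world from process %d of %d', labindex, numlabs);
end

% After the spmd block, msg is a Composite with one value per worker
for i = 1:poolobj.NumWorkers
    disp(msg{i});
end

delete(poolobj);                % shut the pool down
```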

Download: ../code/matlab-pct-hello/run.slurm

A run of the code is shown below.

[araim1@maya-usr1 matlab-pct-hello]$ sbatch run.slurm
[araim1@maya-usr1 matlab-pct-hello]$ cat slurm.err
[araim1@maya-usr1 matlab-pct-hello]$ cat slurm.out

                            < M A T L A B (R) >
                  Copyright 1984-2014 The MathWorks, Inc.
                    R2014a (8.3.0.532) 64-bit (glnxa64)
                             February 11, 2014

 
To get started, type one of these: helpwin, helpdesk, or demo.
For product information, visit www.mathworks.com.
 
Starting parallel pool (parpool) using the 'local' profile ... connected to 8 workers.
Hello world from process 1 of 8
Hello world from process 2 of 8
Hello world from process 3 of 8
Hello world from process 4 of 8
Hello world from process 5 of 8
Hello world from process 6 of 8
Hello world from process 7 of 8
Hello world from process 8 of 8
Parallel pool using the 'local' profile is shutting down.


[araim1@maya-usr1 matlab-pct-hello]$

Next is a simple example of “parallel for”. We can replace simple loops with parallel loops with minimal programming effort.

poolobj = parpool(8);
x = zeros(1, 40);
parfor i = 1:40
  x(i) = i;
end
delete(poolobj);

x

Download: ../code/matlab-pct-parfor/driver.m

#!/bin/bash
#SBATCH --job-name=matlab-parallel-toolkit
#SBATCH --output=slurm.out
#SBATCH --error=slurm.err
#SBATCH --partition=develop
#SBATCH --ntasks-per-node=8
#SBATCH --mem=5000

matlab -nodisplay -r "driver, exit"

Download: ../code/matlab-pct-parfor/run.slurm

[araim1@maya-usr1 matlab-pct-parfor]$ sbatch run.slurm
[araim1@maya-usr1 matlab-pct-parfor]$ cat slurm.err
[araim1@maya-usr1 matlab-pct-parfor]$ cat slurm.out

                            < M A T L A B (R) >
                  Copyright 1984-2014 The MathWorks, Inc.
                    R2014a (8.3.0.532) 64-bit (glnxa64)
                             February 11, 2014

 
To get started, type one of these: helpwin, helpdesk, or demo.
For product information, visit www.mathworks.com.
 
Starting parallel pool (parpool) using the 'local' profile ... connected to 8 workers.
Parallel pool using the 'local' profile is shutting down.

x =

  Columns 1 through 13

     1     2     3     4     5     6     7     8     9    10    11    12    13

  Columns 14 through 26

    14    15    16    17    18    19    20    21    22    23    24    25    26

  Columns 27 through 39

    27    28    29    30    31    32    33    34    35    36    37    38    39

  Column 40

    40

[araim1@maya-usr1 matlab-pct-parfor]$

GPU Computing with MATLAB

Access to GPU through MATLAB programming is now available on maya through the CUDA modules. To follow along, ensure you are logged into maya-usr1 and have the CUDA modules loaded.

[hu6@maya-usr1 ~]$ module list
Currently Loaded Modulefiles:
  1) cuda60/toolkit/6.0.37   3) gcc/4.8.2
  2) matlab/r2014a           4) slurm/14.03.6

While providing greatly increased throughput, GPU programming has additional costs: data must be sent from the CPU to the GPU before a calculation and retrieved from it afterwards. Because of this memory-transfer overhead, only a subset of applications and algorithms is suitable for speedup via GPU computing. Generally, your program should satisfy the following criteria:

  • Computationally intensive: heavy computation can be done on the GPU with little data transfer.
  • Massively parallel: a similar task is performed repeatedly on different data.

We will provide a simple example below. This example sets up two matrices in GPU memory, multiplies them, then copies the result back to CPU memory and displays it.

gpuDeviceCount

gpuDevice

A = ones(10, 'single', 'gpuArray');
B = 5 .* eye(10, 'single', 'gpuArray');
C = A * B;
C_host = gather(C);

C_host

Download: ../code/matlab-gpu/driver_gpu.m

The slurm file to submit this example is below:

#!/bin/bash
#SBATCH --job-name=matlab-gpu
#SBATCH --output=slurm.out
#SBATCH --error=slurm.err
#SBATCH --partition=batch
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --gres=gpu
#SBATCH --mem=5000

matlab -nodisplay -r "driver_gpu, exit"

Download: ../code/matlab-gpu/run_gpu.slurm

A run of the code is shown below.

[hu6@maya-usr1 sec6_gpu]$ cat slurm.out 

                            < M A T L A B (R) >
                  Copyright 1984-2014 The MathWorks, Inc.
                    R2014a (8.3.0.532) 64-bit (glnxa64)
                             February 11, 2014

 
To get started, type one of these: helpwin, helpdesk, or demo.
For product information, visit www.mathworks.com.
 
ans =

     1


ans = 

  CUDADevice with properties:

                      Name: 'Tesla K20m'
                     Index: 1
         ComputeCapability: '3.5'
            SupportsDouble: 1
             DriverVersion: 6
            ToolkitVersion: 5.5000
        MaxThreadsPerBlock: 1024
          MaxShmemPerBlock: 49152
        MaxThreadBlockSize: [1024 1024 64]
               MaxGridSize: [2.1475e+09 65535 65535]
                 SIMDWidth: 32
               TotalMemory: 5.0327e+09
                FreeMemory: 4.9211e+09
       MultiprocessorCount: 13
              ClockRateKHz: 705500
               ComputeMode: 'Default'
      GPUOverlapsTransfers: 1
    KernelExecutionTimeout: 0
          CanMapHostMemory: 1
           DeviceSupported: 1
            DeviceSelected: 1


C_host =

     5     5     5     5     5     5     5     5     5     5
     5     5     5     5     5     5     5     5     5     5
     5     5     5     5     5     5     5     5     5     5
     5     5     5     5     5     5     5     5     5     5
     5     5     5     5     5     5     5     5     5     5
     5     5     5     5     5     5     5     5     5     5
     5     5     5     5     5     5     5     5     5     5
     5     5     5     5     5     5     5     5     5     5
     5     5     5     5     5     5     5     5     5     5
     5     5     5     5     5     5     5     5     5     5

The command gpuDeviceCount returns the number of GPUs you currently have access to. Our slurm file requested one node and one GPU, but you can request two GPUs with --gres=gpu:2.

The command gpuDevice returns the properties of the GPU, including its type, the maximum block and thread sizes supported, the GPU memory, etc.

In the computation part, we set up matrix A with all entries equal to 1 and matrix B as a diagonal matrix with diagonal entries equal to 5. The two matrices are created directly in GPU memory since 'gpuArray' is specified at construction, but you can also copy existing data from the CPU to the GPU. One thing to notice is that matrix C is calculated on the GPU and therefore stays in GPU memory; it has to be copied to CPU memory by calling gather() before it can be displayed or used in other code that executes on the CPU.
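The copy in the other direction is done with gpuArray; a minimal sketch:

```matlab
% Copying existing CPU data to the GPU and back (sketch)
X = rand(4);         % ordinary matrix in CPU memory
G = gpuArray(X);     % copy X into GPU memory
Y = G * G;           % computed on the GPU; Y stays in GPU memory
Y_host = gather(Y);  % copy the result back to CPU memory
```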

Detailed documentation on GPU programming with MATLAB is available from GPU Computing.