Checking memory usage

In just about any computing activity, it’s important to ensure that your programs use memory efficiently. This is especially crucial in high performance computing, where your problem may be so large that it won’t fit on a single machine, or even on a few machines. On this page, we’ll have a look at how to monitor memory usage on the cluster.

Checking the top command

The easiest way to check the memory usage of a running process is to use the interactive “top” command. At the command line, try running

[araim1@maya-usr1 ~]$ top

You’ll probably get a long list of processes as below, most of which you aren’t interested in. You’ll also see some interesting numbers like free memory, swap space used, and percent CPU currently utilized. Each process has several memory statistics shown. The largest, and most conservative from a planning standpoint, is VIRT: the total virtual memory of the process, including code, data, shared libraries, and any pages that have been swapped out. The one that best reflects actual usage is RES, the resident set size, which counts only the physical memory the process currently occupies. Together, these two values give us a good idea of our usage. The top display automatically updates itself every few seconds. For more information, see the top manual page (“man top”).

top - 01:19:53 up 79 days, 12:52,  4 users,  load average: 0.00, 0.00, 0.00
Tasks: 232 total,   1 running, 230 sleeping,   0 stopped,   1 zombie
Cpu(s):  0.0%us,  0.0%sy,  0.0%ni, 99.5%id,  0.4%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:  49433548k total, 35805616k used, 13627932k free,   747232k buffers
Swap:  8385888k total,        0k used,  8385888k free, 32890968k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
28336 araim1    15   0 10976 1148  776 R  0.3  0.0   0:00.02 top
    1 root      15   0 10348  696  584 S  0.0  0.0   0:01.45 init
    2 root      RT  -5     0    0    0 S  0.0  0.0   0:01.91 migration/0
    3 root      34  19     0    0    0 S  0.0  0.0   0:00.00 ksoftirqd/0
    4 root      RT  -5     0    0    0 S  0.0  0.0   0:00.00 watchdog/0
    5 root      RT  -5     0    0    0 S  0.0  0.0   0:01.39 migration/1
    6 root      34  19     0    0    0 S  0.0  0.0   0:00.05 ksoftirqd/1
    7 root      RT  -5     0    0    0 S  0.0  0.0   0:00.00 watchdog/1
    8 root      RT  -5     0    0    0 S  0.0  0.0   0:01.04 migration/2
    9 root      34  19     0    0    0 S  0.0  0.0   0:00.00 ksoftirqd/2
   10 root      RT  -5     0    0    0 S  0.0  0.0   0:00.00 watchdog/2
   11 root      RT  -5     0    0    0 S  0.0  0.0   0:01.66 migration/3
   12 root      34  19     0    0    0 S  0.0  0.0   0:00.01 ksoftirqd/3
   13 root      RT  -5     0    0    0 S  0.0  0.0   0:00.00 watchdog/3
   14 root      RT  -5     0    0    0 S  0.0  0.0   0:07.36 migration/4
   15 root      34  19     0    0    0 S  0.0  0.0   0:00.01 ksoftirqd/4
   16 root      RT  -5     0    0    0 S  0.0  0.0   0:00.00 watchdog/4
   17 root      RT  -5     0    0    0 S  0.0  0.0   0:00.50 migration/5
   18 root      34  19     0    0    0 S  0.0  0.0   0:00.15 ksoftirqd/5
   19 root      RT  -5     0    0    0 S  0.0  0.0   0:00.00 watchdog/5
   20 root      RT  -5     0    0    0 S  0.0  0.0   0:00.12 migration/6
   21 root      34  19     0    0    0 S  0.0  0.0   0:00.01 ksoftirqd/6

We can narrow the list down to just our own processes. Type “u”, then your username, then press Enter. You’ll get a shorter list that looks something like this:

top - 01:30:57 up 79 days, 13:03,  4 users,  load average: 0.00, 0.00, 0.00
Tasks: 232 total,   1 running, 230 sleeping,   0 stopped,   1 zombie
Cpu(s):  0.0%us,  0.0%sy,  0.0%ni, 99.9%id,  0.1%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:  49433548k total, 35805656k used, 13627892k free,   747232k buffers
Swap:  8385888k total,        0k used,  8385888k free, 32891308k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                                              
28336 araim1    15   0 10976 1152  780 R  0.3  0.0   0:01.47 top                                                  
25694 araim1    15   0 99196 1768  976 S  0.0  0.0   0:02.60 sshd                                                 
25695 araim1    15   0 66260 3672 1184 S  0.0  0.0   0:01.67 bash

One more useful thing we’ll mention here: it’s possible to toggle the CPU display between a single summary line and per-core statistics by typing “1”

top - 01:32:09 up 79 days, 13:05,  4 users,  load average: 0.00, 0.00, 0.00
Tasks: 232 total,   1 running, 230 sleeping,   0 stopped,   1 zombie
Cpu0  :  0.0%us,  0.1%sy,  0.0%ni, 99.9%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu1  :  0.1%us,  0.1%sy,  0.0%ni, 99.5%id,  0.2%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu2  :  0.1%us,  0.1%sy,  0.0%ni, 99.5%id,  0.4%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu3  :  0.2%us,  0.1%sy,  0.0%ni, 99.5%id,  0.2%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu4  :  0.1%us,  0.0%sy,  0.0%ni, 99.9%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu5  :  0.1%us,  0.3%sy,  0.0%ni, 99.6%id,  0.1%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu6  :  0.0%us,  0.0%sy,  0.0%ni, 99.9%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu7  :  0.1%us,  0.0%sy,  0.0%ni, 99.9%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:  49433548k total, 35805832k used, 13627716k free,   747232k buffers
Swap:  8385888k total,        0k used,  8385888k free, 32891340k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                                              
25694 araim1    15   0 99196 1768  976 S  0.0  0.0   0:02.60 sshd                                                 
25695 araim1    15   0 66260 3672 1184 S  0.0  0.0   0:01.67 bash                                                 
28336 araim1    15   0 10976 1152  780 R  0.0  0.0   0:01.61 top 

The issue with top is that it’s interactive. When we’re running our high performance parallel code, we may want to log the memory usage at very specific times, for example, right after allocating a large data structure. We’d prefer not to have to watch the top display and track things manually.
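
As an aside, top also has a non-interactive batch mode, which can be handy for quick logging from the shell. The exact flags can vary with your version of top, but typically “-b” requests batch output, “-n 1” takes a single snapshot, and “-u” filters by user:

[araim1@maya-usr1 ~]$ top -b -n 1 -u araim1 > top_snapshot.txt

This works well in scripts, but to record usage at precise points inside our own program, we’ll want a more direct method.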

Checking the proc filesystem

Let’s take one step toward automating our memory checking. To do this, we’ll use the proc filesystem: a special filesystem on Unix-like systems that exposes information about the system and its processes. We’ll try a few commands to get a feel for it. First, here is information about the CPU cores on the front end node. This is fairly static information, which we don’t expect to change much.

[araim1@maya-usr1 ~]$ cat /proc/cpuinfo 
processor	: 0
vendor_id	: GenuineIntel
cpu family	: 6
model		: 26
model name	: Intel(R) Xeon(R) CPU           X5550  @ 2.67GHz
stepping	: 5
cpu MHz		: 1596.000
cache size	: 8192 KB
physical id	: 0
siblings	: 4
core id		: 0
cpu cores	: 4
apicid		: 0
fpu		: yes
fpu_exception	: yes
cpuid level	: 11
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 
	clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall nx rdtscp lm constant_tsc ida 
	nonstop_tsc pni monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr sse4_1 sse4_2 popcnt lahf_lm
bogomips	: 5333.69
clflush size	: 64
cache_alignment	: 64
address sizes	: 40 bits physical, 48 bits virtual
power management: [8]

... (7 other cores are displayed as well) ...
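
Since each core gets its own “processor” stanza, a quick way to count the cores without paging through all of that output is to let grep do it (the “-c” flag counts matching lines):

[araim1@maya-usr1 ~]$ grep -c '^processor' /proc/cpuinfo
8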

We can also see top-level memory information, such as how much memory is free and how much swap space is in use. This information is more dynamic and changes constantly.

[araim1@maya-usr1 ~]$ cat /proc/meminfo 
MemTotal:     49433548 kB
MemFree:      13626952 kB
Buffers:        747232 kB
Cached:       32891628 kB
SwapCached:          0 kB
Active:        4659696 kB
Inactive:     29089212 kB
HighTotal:           0 kB
HighFree:            0 kB
LowTotal:     49433548 kB
LowFree:      13626952 kB
SwapTotal:     8385888 kB
SwapFree:      8385888 kB
Dirty:             136 kB
Writeback:           0 kB
AnonPages:      109404 kB
Mapped:          18788 kB
Slab:          1922440 kB
PageTables:      11608 kB
NFS_Unstable:        0 kB
Bounce:              0 kB
CommitLimit:  33102660 kB
Committed_AS:   305584 kB
VmallocTotal: 34359738367 kB
VmallocUsed:    270516 kB
VmallocChunk: 34359467755 kB
HugePages_Total:     0
HugePages_Free:      0
HugePages_Rsvd:      0
Hugepagesize:     2048 kB
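
Note that much of the “used” memory is actually kernel buffers and page cache (the Buffers and Cached fields above), which the kernel can reclaim when programs need the space. To pull out just the fields we care about, we can filter the file, much as we’ll do with /proc/self/status shortly:

[araim1@maya-usr1 ~]$ egrep 'MemTotal|MemFree|Buffers|Cached' /proc/meminfo
MemTotal:     49433548 kB
MemFree:      13626952 kB
Buffers:        747232 kB
Cached:       32891628 kB
SwapCached:          0 kB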

We can also check the memory usage of a specific process. Try “cat /proc/<PID>/status” to get information about a process with a given PID (process ID), or “cat /proc/self/status” to get information about the current process.

[araim1@maya-usr1 check_memory_parallel]$ cat /proc/self/status
Name:	cat
State:	R (running)
SleepAVG:	88%
Tgid:	28665
Pid:	28665
PPid:	25695
TracerPid:	0
Uid:	28398	28398	28398	28398
Gid:	1057	1057	1057	1057
FDSize:	256
Groups:	700 701 1057 32296 1104637136 
VmPeak:	   58904 kB
VmSize:	   58904 kB
VmLck:	       0 kB
VmHWM:	     476 kB
VmRSS:	     476 kB
VmData:	     164 kB
VmStk:	      84 kB
VmExe:	      20 kB
VmLib:	    1444 kB
VmPTE:	      40 kB
StaBrk:	04eed000 kB
Brk:	04f0e000 kB
StaStk:	7fff9cdf3b70 kB
Threads:	1
SigQ:	0/409600
SigPnd:	0000000000000000
ShdPnd:	0000000000000000
SigBlk:	0000000000000000
SigIgn:	0000000000000000
SigCgt:	0000000000000000
CapInh:	0000000000000000
CapPrm:	0000000000000000
CapEff:	0000000000000000
Cpus_allowed:	00000000,00000000,00000000,00000000,00000000,00000000,00000000,0000ffff
Mems_allowed:	00000000,00000003

Notice that we’re getting information about the “cat” command, which is the “self” when we run “cat /proc/self/status” directly from the command line. Earlier when we ran the top command, we looked at the VIRT and RES columns. From the display above, we can get the same information from the VmSize and VmRSS fields, respectively.

[araim1@maya-usr1 ~]$ cat /proc/self/status | egrep 'VmSize|VmRSS'
VmSize:	   58908 kB
VmRSS:	     468 kB

Next we will show how to gather this information from a C function.

Checking memory from a serial C program

The following function reads the file “/proc/self/status”, and parses out the numbers in the VmSize and VmRSS fields. Now “self” will refer to the C program that’s invoking this function.

#include "memory.h"

/*
 * Look for lines in the procfile contents like:
 * VmRSS:          5560 kB
 * VmSize:         5560 kB
 *
 * Grab the number between the whitespace and the "kB".
 * If 1 is returned, there was a serious problem
 * (we could not find one of the memory usages).
 */
int get_memory_usage_kb(long* vmrss_kb, long* vmsize_kb)
{
    /* Get the current process' status file from the proc filesystem */
    FILE* procfile = fopen("/proc/self/status", "r");
    if (procfile == NULL)
    {
        return 1;   /* Unlikely on Linux, but fail cleanly without /proc */
    }

    long to_read = 8192;
    char buffer[to_read + 1];
    size_t bytes_read = fread(buffer, sizeof(char), to_read, procfile);
    fclose(procfile);

    /* fread does not null-terminate the buffer, so do it ourselves
     * before handing the text to strtok and strstr below */
    buffer[bytes_read] = '\0';

    short found_vmrss = 0;
    short found_vmsize = 0;
    char* search_result;

    /* Look through proc status contents line by line */
    char delims[] = "\n";
    char* line = strtok(buffer, delims);

    while (line != NULL && (found_vmrss == 0 || found_vmsize == 0) )
    {
        search_result = strstr(line, "VmRSS:");
        if (search_result != NULL)
        {
            sscanf(line, "%*s %ld", vmrss_kb);
            found_vmrss = 1;
        }

        search_result = strstr(line, "VmSize:");
        if (search_result != NULL)
        {
            sscanf(line, "%*s %ld", vmsize_kb);
            found_vmsize = 1;
        }

        line = strtok(NULL, delims);
    }

    return (found_vmrss == 1 && found_vmsize == 1) ? 0 : 1;
}


Download: ../code/check_memory_serial/memory.c

Here’s the corresponding header file

#include <stdlib.h>
#include <stdio.h>
#include <string.h>

int get_memory_usage_kb(long* vmrss_kb, long* vmsize_kb);


Download: ../code/check_memory_serial/memory.h

Here’s a small program to test our function. It allocates 20 large buffers, zero-filling each one so that its pages are actually touched, and reports the memory usage after each allocation.

#include "memory.h"

int main()
{
    int n = 20;
    int entrySize = 10000000;
    int* buffer[n];
    long vmrss, vmsize;

    for (int i = 0; i < n; i++)
    {
        buffer[i] = malloc( entrySize * sizeof(int) );

        if (!buffer[i])
        {
            printf("Couldn't allocate memory!\n");
            exit(1);
        }

        /* Touch every page we just allocated; otherwise the new memory
         * may not be counted in VmRSS until it is actually used */
        for (int j = 0; j < entrySize; j++)
        {
            buffer[i][j] = 0;
        }

        get_memory_usage_kb(&vmrss, &vmsize);
        printf("%2d: Current memory usage: VmRSS = %6ld KB, VmSize = %6ld KB\n", 
            i, vmrss, vmsize);
    }

    return 0;
}


Download: ../code/check_memory_serial/check_memory.c

Finally, here is a simple Makefile to compile the test program. (We invoke mpicc since it’s the standard compiler wrapper on the cluster; this serial example has no MPI dependency, so a plain C compiler would work just as well.)

PROGNAME := check_memory

main: $(PROGNAME).c memory.c memory.h
    mpicc $(PROGNAME).c memory.c -o $(PROGNAME)

clean:
    rm -f $(PROGNAME) *.o


Download: ../code/check_memory_serial/Makefile

Building and running the code produces output like this

[araim1@maya-usr1 check_memory_serial]$ make
mpicc check_memory.c memory.c -o check_memory
check_memory.c:
memory.c:
[araim1@maya-usr1 check_memory_serial]$ ./check_memory
 0: Current memory usage: VmRSS =  40260 KB, VmSize =  68052 KB
 1: Current memory usage: VmRSS =  79432 KB, VmSize = 107120 KB
 2: Current memory usage: VmRSS = 118492 KB, VmSize = 146180 KB
 3: Current memory usage: VmRSS = 157556 KB, VmSize = 185244 KB
 4: Current memory usage: VmRSS = 196620 KB, VmSize = 224304 KB
 5: Current memory usage: VmRSS = 235680 KB, VmSize = 263368 KB
 6: Current memory usage: VmRSS = 274744 KB, VmSize = 302432 KB
 7: Current memory usage: VmRSS = 313804 KB, VmSize = 341492 KB
 8: Current memory usage: VmRSS = 352868 KB, VmSize = 380556 KB
 9: Current memory usage: VmRSS = 391932 KB, VmSize = 419620 KB
10: Current memory usage: VmRSS = 430992 KB, VmSize = 458680 KB
11: Current memory usage: VmRSS = 470056 KB, VmSize = 497744 KB
12: Current memory usage: VmRSS = 509120 KB, VmSize = 536804 KB
13: Current memory usage: VmRSS = 548180 KB, VmSize = 575868 KB
14: Current memory usage: VmRSS = 587244 KB, VmSize = 614932 KB
15: Current memory usage: VmRSS = 626304 KB, VmSize = 653992 KB
16: Current memory usage: VmRSS = 665368 KB, VmSize = 693056 KB
17: Current memory usage: VmRSS = 704432 KB, VmSize = 732120 KB
18: Current memory usage: VmRSS = 743492 KB, VmSize = 771180 KB
19: Current memory usage: VmRSS = 782556 KB, VmSize = 810244 KB
[araim1@maya-usr1 check_memory_serial]$
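
Notice that VmSize grows by roughly 39,000 KB per iteration, which checks out: each buffer holds 10,000,000 ints at 4 bytes apiece, or 40,000,000 bytes ≈ 39,062 KB.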

Checking memory from a parallel C program

We’ve seen how to check memory usage for a single process, but what about an MPI job with multiple processes? Let’s suppose we want to see the usage for each process, as well as the total (sum) across all processes. We’ll reuse the serial function from the previous section and gather the results into an array on a single process (with rank “root”). We’ll also write a simple helper function that sums over this array; keep in mind that the gathered array, and therefore the sum, is only meaningful on the root process (rank 0 in our helper).

#include "memory_parallel.h"

int get_cluster_memory_usage_kb(long* vmrss_per_process, long* vmsize_per_process, int root, int np)
{
    long vmrss_kb;
    long vmsize_kb;
    int ret_code = get_memory_usage_kb(&vmrss_kb, &vmsize_kb);

    if (ret_code != 0)
    {
        printf("Could not gather memory usage!\n");
        return ret_code;
    }

    /* Collect each rank's numbers into the arrays on the root process.
     * The arrays are only filled in on the root itself. The values are
     * longs, so MPI_LONG is the matching MPI datatype. */
    MPI_Gather(&vmrss_kb, 1, MPI_LONG, 
        vmrss_per_process, 1, MPI_LONG, 
        root, MPI_COMM_WORLD);

    MPI_Gather(&vmsize_kb, 1, MPI_LONG, 
        vmsize_per_process, 1, MPI_LONG, 
        root, MPI_COMM_WORLD);

    return 0;
}

int get_global_memory_usage_kb(long* global_vmrss, long* global_vmsize, int np)
{
    long vmrss_per_process[np];
    long vmsize_per_process[np];
    int ret_code = get_cluster_memory_usage_kb(vmrss_per_process, vmsize_per_process, 0, np);

    if (ret_code != 0)
    {
        return ret_code;
    }

    /* Sum the per-process values. The result is only meaningful on
     * rank 0, since that is where the gathers deposited the data. */
    *global_vmrss = 0;
    *global_vmsize = 0;
    for (int i = 0; i < np; i++)
    {
        *global_vmrss += vmrss_per_process[i];
        *global_vmsize += vmsize_per_process[i];
    }

    return 0;
}


Download: ../code/check_memory_parallel/memory_parallel.c
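
As an aside, if we only ever wanted the totals, MPI could do the summing for us in a single collective call. Here is a sketch of a hypothetical alternative (not part of the downloadable files) that uses MPI_Reduce in place of the gather-then-sum approach:

/* Hypothetical alternative: sum each rank's values directly on rank 0
 * with MPI_Reduce, skipping the intermediate per-process arrays */
int get_global_memory_usage_kb_reduce(long* global_vmrss, long* global_vmsize)
{
    long vmrss_kb;
    long vmsize_kb;
    int ret_code = get_memory_usage_kb(&vmrss_kb, &vmsize_kb);

    if (ret_code != 0)
    {
        return ret_code;
    }

    /* MPI_SUM adds the contributions from every rank; the totals land
     * in global_vmrss and global_vmsize on rank 0 only */
    MPI_Reduce(&vmrss_kb, global_vmrss, 1, MPI_LONG, MPI_SUM, 0, MPI_COMM_WORLD);
    MPI_Reduce(&vmsize_kb, global_vmsize, 1, MPI_LONG, MPI_SUM, 0, MPI_COMM_WORLD);

    return 0;
}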

Here’s the corresponding header file

#include <mpi.h>
#include "memory.h"

int get_cluster_memory_usage_kb(long* vmrss_per_process, long* vmsize_per_process, int root, int np);
int get_global_memory_usage_kb(long* global_vmrss, long* global_vmsize, int np);


Download: ../code/check_memory_parallel/memory_parallel.h

Here is a program to test our functions. This time, for simplicity, we allocate only one buffer on each process, sized according to the process’s rank, and then print out the memory information.

#include <mpi.h>
#include <stdio.h>
#include "memory_parallel.h"

int main (int argc, char *argv[])
{
    int id, np;
    char processor_name[MPI_MAX_PROCESSOR_NAME];
    int processor_name_len;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &np);
    MPI_Comm_rank(MPI_COMM_WORLD, &id);
    MPI_Get_processor_name(processor_name, &processor_name_len);

    printf("Number_of_processes=%03d, My_rank=%03d, processor_name=%5s\n", 
        np, id, processor_name);

    /* Allocate a buffer whose size depends on our rank, so that each
     * process reports a somewhat different memory usage. We allocate
     * on the heap; an array this large could overflow the stack. */
    int entrySize = 1000000 + id * 100000;
    long* l_buffer = malloc( entrySize * sizeof(long) );

    if (!l_buffer)
    {
        printf("Couldn't allocate memory!\n");
        MPI_Abort(MPI_COMM_WORLD, 1);
    }

    /* Touch every entry so the pages count toward VmRSS */
    for (int j = 0; j < entrySize; j++)
    {
        l_buffer[j] = 0;
    }

    long vmrss_per_process[np];
    long vmsize_per_process[np];
    get_cluster_memory_usage_kb(vmrss_per_process, vmsize_per_process, 0, np);

    if (id == 0)
    {
        for (int k = 0; k < np; k++)
        {
            printf("Process %03d: VmRSS = %6ld KB, VmSize = %6ld KB\n", 
                k, vmrss_per_process[k], vmsize_per_process[k]);
        }
    }

    long global_vmrss, global_vmsize;
    get_global_memory_usage_kb(&global_vmrss, &global_vmsize, np);
    if (id == 0)
    {
        printf("Global memory usage: VmRSS = %6ld KB, VmSize = %6ld KB\n", 
            global_vmrss, global_vmsize);
    }

    free(l_buffer);
    MPI_Finalize();
    return 0;
}

Download: ../code/check_memory_parallel/check_memory.c

Here is the Makefile

PROGNAME := check_memory

main: $(PROGNAME).c memory.c memory_parallel.c memory.h memory_parallel.h
    mpicc $(PROGNAME).c memory.c memory_parallel.c -o $(PROGNAME)

clean:
    rm -f $(PROGNAME) *.o


Download: ../code/check_memory_parallel/Makefile

And here is a simple SLURM script to run the program

#!/bin/bash
#SBATCH --job-name=MPI_check_memory
#SBATCH --output=slurm.out
#SBATCH --error=slurm.err
#SBATCH --partition=develop
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=2

srun ./check_memory

Download: ../code/check_memory_parallel/mvapich2.slurm

Make sure to obtain memory.c and memory.h from the previous section as well. Building and running the code yields the following result.

[araim1@maya-usr1 check_memory_parallel]$ make
mpicc check_memory.c memory.c memory_parallel.c -o check_memory
check_memory.c:
memory.c:
memory_parallel.c:
[araim1@maya-usr1 check_memory_parallel]$ sbatch mvapich2.slurm 
sbatch: Submitted batch job 1581
[araim1@maya-usr1 check_memory_parallel]$ cat slurm.out 
Number_of_processes=004, My_rank=003, processor_name=   n2
Number_of_processes=004, My_rank=001, processor_name=   n1
Number_of_processes=004, My_rank=000, processor_name=   n1
Process 000: VmRSS =  22996 KB, VmSize =  77268 KB
Process 001: VmRSS =  19624 KB, VmSize =  78048 KB
Process 002: VmRSS =  24552 KB, VmSize =  78828 KB
Process 003: VmRSS =  21196 KB, VmSize =  79612 KB
Global memory usage: VmRSS =  88752 KB, VmSize = 314052 KB
Number_of_processes=004, My_rank=002, processor_name=   n2
[araim1@maya-usr1 check_memory_parallel]$ 

Notice that the global memory usage reported in the output is somewhat higher than the sum of the per-process usages reported immediately before it. The difference comes from the MPI library itself, which allocates some memory of its own as we use it (here, during the gathers inside the first call). We could verify this in our test program: if we called both memory functions a second time, the numbers would hold steady.
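
Here is a minimal sketch of that check (hypothetical code, not part of the downloadable files), which could be placed at the end of main, just before MPI_Finalize:

/* Hypothetical check: call the memory functions twice in a row. Any
 * increase between the two calls would be memory allocated by the MPI
 * library itself during the first round of collectives. */
long vmrss1, vmsize1, vmrss2, vmsize2;
get_global_memory_usage_kb(&vmrss1, &vmsize1, np);
get_global_memory_usage_kb(&vmrss2, &vmsize2, np);

if (id == 0)
{
    printf("First call:  VmRSS = %6ld KB, VmSize = %6ld KB\n", vmrss1, vmsize1);
    printf("Second call: VmRSS = %6ld KB, VmSize = %6ld KB\n", vmrss2, vmsize2);
}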