Checking disk usage

As you conduct your work on the cluster, you will use more and more of your allocated space, and may not always be aware of how the space is being used. Inevitably, free space will run low and you will need to do some cleaning up. This page describes some tools to help you identify how your space is being used.

One very important place to keep an eye on is your home directory. Many programs use space in your home directory to store preferences, settings, etc. If you don’t check occasionally, you may not be aware of how much space is being used this way. Also, if you run out of space in your home directory, you may begin to get mysterious errors which do not appear to be related to disk space. For example, users often see “X11 connection rejected because of wrong authentication” when trying to run X Windows programs remotely.

Before reading this page, make sure to read Using your account so that you know the location of your storage areas, and how to do basic checks of quotas and partition sizes.

Compression on some filesystems

Some of our filesystems feature automatic compression – namely the attached 160 TB of central storage. Compression of your files on the storage device happens behind the scenes, without any action needed from you. This is very convenient for saving space but can also be slightly confusing when examining the sizes of your files.

For example, suppose we have a large file in the User Workspace portion of our PI’s storage.

[araim1@maya-usr1 data]$ pwd
/home/araim1/nagaraj_common/data
[araim1@maya-usr1 data]$ ls -lh
total 770M
-rw-r----- 1 araim1 pi_nagaraj 2.0G Oct 11  2010 train.dat
[araim1@maya-usr1 data]$ du -h train.dat
770M	train.dat

Notice that the output isn’t quite consistent. The “ls” command says that our file is 2 GB, but its “total” line reports 770 MB, and the “du” command reports 770 MB in use as well. The reason for this discrepancy is that the file “train.dat” has been compressed from its original size (about 2 GB) to 770 MB on disk. Some commands display the actual storage used on the device, while others report the apparent size, that is, the number of bytes in the file.

For the purposes of managing disk space, you will most likely want to refer to the actual storage used. The “du” command displays this by default, unless you use flags such as “--apparent-size” or “-b”. See the man page (“man du”) for details, stick with “du -h” to be safe, or contact us if you have questions.
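The difference between the two measurements can be reproduced with a small experiment. The sketch below is only an illustration and assumes that GNU “du” and “truncate” are available. It uses a sparse file as a stand-in for a compressed one, since both occupy less space on disk than their byte count suggests. The function name “compare_sizes” is our own and is not a command installed on the cluster.

```shell
#!/usr/bin/env bash
# Sketch: compare allocated disk usage with apparent size (GNU du assumed).
# A sparse file stands in for a compressed one: both use fewer blocks on
# disk than the number of bytes in the file would suggest.

# compare_sizes FILE: print disk usage and apparent size of FILE, in KB.
# (Hypothetical helper name, for illustration only.)
compare_sizes() {
    local file="$1"
    local used apparent
    used=$(du -k "$file" | cut -f1)                      # space actually allocated
    apparent=$(du -k --apparent-size "$file" | cut -f1)  # byte count, as "ls -l" reports
    printf 'used=%sKB apparent=%sKB\n' "$used" "$apparent"
}

demo=$(mktemp)
truncate -s 10M "$demo"   # 10 MB apparent size, very little allocated
compare_sizes "$demo"
rm -f "$demo"
```

On most filesystems the “used” figure will be far smaller than the “apparent” figure, which mirrors the 770 MB versus 2 GB gap seen for “train.dat”.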

“topdu” script

We have provided a command called “topdu” which lists the N largest files and sub-directories in a given directory. This can help you quickly identify where most of your space is being used.

[araim1@maya-usr1 ~]$ topdu

If we run the command without arguments, it will give us the 10 largest files and directories in our current working directory. Notice that the units are in KB. Let’s run this in our home directory.

[araim1@maya-usr1 ~]$ topdu
(May take some time to compute space usage)
The 10 largest files or directories (in KB) under /home/araim1/
24080	/home/araim1/.idl
14812	/home/araim1/userEstimates.txt
6448	/home/araim1/texmf.bak.tar.gz
4252	/home/araim1/beamer-3-10.tar
4244	/home/araim1/tmp
3928	/home/araim1/r-workshop
3220	/home/araim1/.matlab
2220	/home/araim1/beamer
1448	/home/araim1/out.pdf
1332	/home/araim1/2669851.pdf

It’s easy to see from the output above that the IDL program is using the most space in our home directory, at about 24 MB. We can also check directories other than the current working directory, and optionally request a different number of entries than the default 10. We can also suppress the text at the beginning, for example to use the output in other scripts. To see all the available options, run:

[araim1@maya-usr1 ~]$ topdu -h
Usage: topdu [-n <TOPN>] [-q] [<DIRNAME1>] [<DIRNAME2>] ...
	Display the TOPN largest files in DIRNAME1, ... DIRNAMEk
	Default TOPN is 10
	If no DIRNAMEs specified, default is to use current working directory
	-q suppresses any extra output
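
On a machine where “topdu” is not installed, similar output can be approximated with standard tools. The function below is a minimal sketch and is not the source of “topdu”; the name “top_usage” and its argument order are our own choices, and it assumes a “find” that supports “-mindepth” and “-maxdepth” (GNU and BSD versions do).

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for topdu, built from find, du, sort, and head.
# This is NOT the topdu source; it only mimics the documented behaviour.

# top_usage [DIR] [N]: print the N largest entries directly under DIR, in KB.
# Defaults: DIR is the current directory, N is 10.
top_usage() {
    local dir="${1:-.}"
    local n="${2:-10}"
    # find lists the immediate children of DIR, including dotfiles such as
    # ~/.idl, which a plain "du -sk DIR/*" glob would skip.
    find "$dir" -mindepth 1 -maxdepth 1 -exec du -sk {} + 2>/dev/null \
        | sort -rn \
        | head -n "$n"
}

# Example call (may take a while on a large directory):
#   top_usage "$HOME" 5
```

Like “topdu”, this reports the space actually used on disk, so compressed files count at their compressed size.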

Here’s a “topdu” example using some of these options:

[araim1@maya-usr1 ~]$ topdu -n 5 -q ~/ ~/nagaraj_user/
24080	/home/araim1/.idl
14812	/home/araim1/userEstimates.txt
6448	/home/araim1/texmf.bak.tar.gz
4252	/home/araim1/beamer-3-10.tar
4244	/home/araim1/tmp
207119	/home/araim1/nagaraj_user/work
170720	/home/araim1/nagaraj_user/doc
129705	/home/araim1/nagaraj_user/petsc-2.3.3-p15
112629	/home/araim1/nagaraj_user/mpich2
109396	/home/araim1/nagaraj_user/project
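
Because “-q” leaves only the size and path columns, the output is easy to feed into other tools. As a small sketch of that idea, the filter below adds up the KB column and prints the total in MB. The function name “sum_kb_as_mb” is hypothetical, and the input format is assumed to match the listings shown on this page.

```shell
#!/usr/bin/env bash
# Sketch: total the first column (KB) of "size path" lines, as printed by
# "topdu -q", and report the sum in MB. The function name is illustrative.
sum_kb_as_mb() {
    awk '{ total += $1 } END { printf "%.1f MB\n", total / 1024 }'
}

# Intended use on the cluster (requires topdu):
#   topdu -n 5 -q ~/ | sum_kb_as_mb

# Self-contained check using the first two entries of the listing above:
printf '24080\t/home/araim1/.idl\n14812\t/home/araim1/userEstimates.txt\n' \
    | sum_kb_as_mb   # prints "38.0 MB"
```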

If we use a particular command frequently, we can also set up a Bash alias for it:

[araim1@maya-usr1 ~]$ alias top15='topdu -n 15 /home/araim1'
[araim1@maya-usr1 ~]$ top15
(May take some time to compute space usage)
The 15 largest files or directories (in KB) under /home/araim1/
24080	/home/araim1/.idl
14812	/home/araim1/userEstimates.txt
6448	/home/araim1/texmf.bak.tar.gz
4252	/home/araim1/beamer-3-10.tar
4244	/home/araim1/tmp