As you conduct your work on the cluster, you will use more and more of your allocated space, and may not always be aware of how the space is being used. Inevitably, free space will run low and you will need to do some cleaning up. This page provides some help in identifying how your space is being used.
One very important place to keep an eye on is your home directory. Many programs use space in your home directory to store preferences, settings, etc. If you don’t check occasionally, you may not be aware of how much space is being used this way. Also, if you run out of space in your home directory, you may begin to get mysterious errors which do not appear to be related to disk space. For example, users often see “X11 connection rejected because of wrong authentication” when trying to run X Windows programs remotely.
Before reading this page, make sure to read Using your account so that you know the location of your storage areas, and how to do basic checks of quotas and partition sizes.
Compression on some filesystems
Some of our filesystems feature automatic compression – namely the attached 160 TB of central storage. Compression of your files on the storage device happens behind the scenes, without any action needed from you. This is very convenient for saving space but can also be slightly confusing when examining the sizes of your files.
For example, suppose we have a large file in the User Workspace portion of our PI’s storage.
[araim1@maya-usr1 data]$ pwd
/home/araim1/nagaraj_common/data
[araim1@maya-usr1 data]$ ls -lh
total 770M
-rw-r----- 1 araim1 pi_nagaraj 2.0G Oct 11 2010 train.dat
[araim1@maya-usr1 data]$ du -h train.dat
770M train.dat
Notice that the output isn’t quite consistent. The “ls” command lists the file as 2 GB, but its “total” line shows 770 MB, and the “du” command also reports 770 MB in use. The reason for this discrepancy is that the file “train.dat” has been compressed from its original size (about 2 GB) to 770 MB on the storage device. Some commands display the storage actually used on the device, while others count the number of bytes in the file.
For the purposes of managing disk space, in all likelihood you will want to refer to the actual storage used. The “du” command will usually display this, unless you use certain flags like “--apparent-size” or “-b”. See the man page (“man du”) for details, stick with “du -h” to be safe, or contact us if you have questions.
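The difference between the two kinds of size can be reproduced on most Linux systems, even without a compressed filesystem. The following is a minimal sketch that uses a sparse file as a stand-in for a compressed one; this is an assumption made for illustration and is not how the cluster's compression works. It also assumes the GNU versions of “truncate” and “du”, and the file name “sparse.dat” is hypothetical.

```shell
# Work in a throwaway directory so nothing in your own space is touched.
tmpdir=$(mktemp -d)

# Create a 10 MB sparse file: it has a length, but (almost) no blocks on disk.
truncate -s 10M "$tmpdir/sparse.dat"

# Apparent size: the number of bytes in the file, as "ls -l" would count them.
apparent_kb=$(du -k --apparent-size "$tmpdir/sparse.dat" | cut -f1)

# Actual usage: the storage really occupied on the device (du's default).
actual_kb=$(du -k "$tmpdir/sparse.dat" | cut -f1)

echo "apparent size: ${apparent_kb} KB"
echo "actual usage:  ${actual_kb} KB"

rm -r "$tmpdir"
```

The apparent size is 10240 KB, while the actual usage is typically close to zero. When deciding what to clean up, the second number is the one that matters.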
“topdu” script
We have provided a command called “topdu” which reports the N largest files and sub-directories in a given directory. This can help you quickly identify where most of your space is being used.
[araim1@maya-usr1 ~]$ topdu
If we run the command without arguments, it will give us the top 10 largest files and directories in our current working directory. Notice that the units are in KB. Let’s run this in our home directory:
[araim1@maya-usr1 ~]$ topdu
(May take some time to compute space usage)
The 10 largest files or directories (in KB) under /home/araim1/
24080 /home/araim1/.idl
14812 /home/araim1/userEstimates.txt
6448 /home/araim1/texmf.bak.tar.gz
4252 /home/araim1/beamer-3-10.tar
4244 /home/araim1/tmp
3928 /home/araim1/r-workshop
3220 /home/araim1/.matlab
2220 /home/araim1/beamer
1448 /home/araim1/out.pdf
1332 /home/araim1/2669851.pdf
From the output above, it’s easy to see that the IDL program’s “.idl” directory is using the most space in our home directory, at about 24 MB. We can also check directories other than the current working directory, and optionally request a different number of entries than the default 10. We can also suppress the text at the beginning, for example to use the output in other scripts. To see all the available options:
[araim1@maya-usr1 ~]$ topdu -h
Usage: topdu [-n <TOPN>] [-q] [<DIRNAME1>] [<DIRNAME2>] ...
Display the TOPN largest files in DIRNAME1, ... DIRNAMEk
Default TOPN is 10
If no DIRNAMEs specified, default is to use current working directory
-q suppresses any extra output
Here’s an example using some of the options:
[araim1@maya-usr1 ~]$ topdu -n 5 -q ~/ ~/nagaraj_user/
24080 /home/araim1/.idl
14812 /home/araim1/userEstimates.txt
6448 /home/araim1/texmf.bak.tar.gz
4252 /home/araim1/beamer-3-10.tar
4244 /home/araim1/tmp
207119 /home/araim1/nagaraj_user/work
170720 /home/araim1/nagaraj_user/doc
129705 /home/araim1/nagaraj_user/petsc-2.3.3-p15
112629 /home/araim1/nagaraj_user/mpich2
109396 /home/araim1/nagaraj_user/project
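Because “-q” leaves only lines of the form “size path”, the output is easy to feed into other tools. As a small illustration, the sketch below adds up the sizes in the first column. The helper name “sum_kb” is hypothetical and not part of “topdu”; the sample input is copied from the home directory lines of the listing above.

```shell
# Hypothetical helper: sum the first column (sizes in KB) of lines read from stdin.
sum_kb() {
    awk '{ total += $1 } END { printf "%d\n", total }'
}

# Sample lines taken from the "topdu -q" output for the home directory.
sum_kb <<'EOF'
24080 /home/araim1/.idl
14812 /home/araim1/userEstimates.txt
6448 /home/araim1/texmf.bak.tar.gz
4252 /home/araim1/beamer-3-10.tar
4244 /home/araim1/tmp
EOF
# prints 53836
```

On the cluster, the same helper could be placed at the end of a pipeline, for example “topdu -n 5 -q ~/ | sum_kb”, to see how much space the five largest items occupy together.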
We can also set up a Bash alias if we’d like, for quick access to a frequently used command:
[araim1@maya-usr1 ~]$ alias top15='topdu -n 15 /home/araim1'
[araim1@maya-usr1 ~]$ top15
(May take some time to compute space usage)
The 15 largest files or directories (in KB) under /home/araim1/
24080 /home/araim1/.idl
14812 /home/araim1/userEstimates.txt
6448 /home/araim1/texmf.bak.tar.gz
4252 /home/araim1/beamer-3-10.tar
4244 /home/araim1/tmp
...
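On a machine where “topdu” is not installed, a similar listing can be produced with standard tools. The function below is a rough approximation written for this page, not the actual “topdu” script; its name (“topdu_sketch”) and its positional arguments are assumptions, and it relies on the usual Linux versions of “find”, “du”, “sort” and “head”.

```shell
# Hypothetical approximation of topdu; NOT the cluster's actual script.
# Usage: topdu_sketch [TOPN] [DIRNAME]
topdu_sketch() {
    local topn=${1:-10}   # how many entries to show (default 10)
    local dir=${2:-.}     # directory to examine (default: current directory)

    # List every entry directly under $dir (hidden ones included),
    # measure each with du in KB, then keep the TOPN largest.
    find "$dir" -mindepth 1 -maxdepth 1 -exec du -sk {} + 2>/dev/null \
        | sort -rn \
        | head -n "$topn"
}
```

For example, “topdu_sketch 5 ~/” prints up to five lines of the form “size path”, largest first, with sizes in KB as actually used on disk.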