On this page we’ll see how to use Python on the taki cluster via the module system and anaconda. Before proceeding, make sure you’ve read the How To Run tutorial. Python is a popular, full-featured scripting language that can be used interactively or through scripts.
On taki the default python is the system python, version 2.7.5. If you need something newer, say Python 3.x.x, then see the “Change versions” section. The system python itself will never change: it is an integral part of many Linux operating systems and is therefore sensitive to changes. For this reason, no additional packages will ever be installed into the system python and its version will never be changed.
How do I know if I’m using system python?
Simply use “which” to see if you’re using system python.
[barajasc@taki-usr1 ~]$ which python
/usr/bin/python
If your python is stored at “/usr/bin/python” it is the system python.
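The same check can be done from inside a running script. The sketch below is a minimal example, assuming /usr/bin/python is the only system interpreter path you care about; the helper name is hypothetical. It compares the running interpreter (sys.executable) against that path, also resolving symlinks.

```python
import os
import sys

# Path of the system interpreter on taki, per the `which` output above.
SYSTEM_PYTHON = "/usr/bin/python"


def is_system_python(executable=None):
    """Return True if `executable` (default: the running interpreter) is the system python."""
    path = executable or sys.executable
    # realpath resolves symlinks (e.g. /usr/bin/python -> /usr/bin/python2.7),
    # so a link to the same binary also counts as the system python.
    resolved = os.path.realpath(path)
    return path == SYSTEM_PYTHON or resolved == os.path.realpath(SYSTEM_PYTHON)


if __name__ == "__main__":
    print(sys.executable, sys.version.split()[0])
    print("system python" if is_system_python() else "not the system python")
```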
There are several versions of Python installed on taki. To use the system Python, just type python:
[barajasc@taki-usr1 ~]$ python
Python 2.7.5 (default, Nov 16 2020, 22:23:17)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-44)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>>
To see the various Python distributions and libraries available through the module system, simply use
[barajasc@taki-usr1 ~]$ module avail python
ANTLR/2.7.7-intel-2018a-Python-3.6.4
GDAL/2.2.3-foss-2018b-Python-3.6.6
GEOS/3.6.2-foss-2018b-Python-3.6.6
GEOS/3.6.2-intel-2017b-Python-3.6.3
GEOS/3.6.2-intel-2018a-Python-3.6.4 (D)
Keras/2.2.4-foss-2018b-Python-3.6.6
Keras/2.2.4-intel-2018a-Python-3.6.4 (D)
Mako/1.0.7-foss-2018b-Python-2.7.15
Pillow/5.0.0-intel-2017b-Python-3.6.3
PyYAML/3.12-intel-2017b-Python-3.6.3
PyYAML/3.12-intel-2018a-Python-3.6.4
PyYAML/3.13-foss-2018b-Python-3.6.6
Python/2.7.13-foss-2017a
Python/2.7.14-GCCcore-6.4.0-bare
Python/2.7.15-foss-2018b
Python/2.7.15-fosscuda-2018b
Python/2.7.15-GCCcore-7.3.0-bare
Python/3.6.1-foss-2017a
Python/3.6.3-intel-2017b
Python/3.6.4-foss-2018a
Python/3.6.4-intel-2018a
Python/3.6.6-foss-2018b
Python/3.6.6-fosscuda-2018b
Python/3.6.6-intel-2018b (D)
... and many more!
Pay close attention to the toolchain part of each module name (for example intel-2018a or foss-2018b). It states which compiler was used to build each of these distributions, which is very important when it comes to installing local packages from source or importing additional modules. Let’s enable the intel-2019a version of Python 3.7.6.
[barajasc@taki-usr1 ~]$ module load Python/3.7.6-intel-2019a
[barajasc@taki-usr1 ~]$ python
Python 3.7.6 (default, Mar 6 2020, 19:29:58)
[GCC Intel(R) C++ gcc 8.2 mode] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy, pandas
>>> exit()
[barajasc@taki-usr1 ~]$
Notice that I did not load any Python libraries but was still able to import numpy and pandas! Many non-standard packages are bundled in these modules.
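To find out which of the packages you need are already bundled with the Python module you loaded, without importing them, you could use something like the sketch below. It relies only on importlib.util.find_spec from the standard library and needs a Python 3 module rather than the system python 2.7.5; the package names in the list are just examples.

```python
import importlib.util


def bundled(names):
    """Map each top-level package name to True if the current Python can import it."""
    # find_spec locates a top-level module on sys.path without actually importing it.
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, ok in bundled(["numpy", "pandas", "matplotlib", "keras"]).items():
        print(f"{name:12s} {'available' if ok else 'missing'}")
```

Anything reported as missing is a candidate for loading an additional module.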
Loading libraries via modules
If the packages you need can’t be imported from the base Python module, loading additional modules should help. Let’s search for matplotlib, a popular plotting and visualization library.
[barajasc@taki-usr1 ~]$ module avail matplotlib
matplotlib/3.0.2-foss-2018b-Python-3.6.6
matplotlib/3.0.3-foss-2019a-Python-3.7.2
matplotlib/3.0.0-intel-2018b-Python-3.6.6
matplotlib/3.0.3-intel-2019a-Python-3.7.2 (D)
As mentioned in Using Your taki Account, when you load a module with different dependencies, all of the module shuffling is done automatically. Notice that there is no Python 3.7.6 version of matplotlib. This isn’t a problem, because matplotlib doesn’t rely on any features that changed between Python 3.7.2 and 3.7.6. To avoid issues with packages that may rely on features that have changed, however, you should make sure that the versions line up. Unload the other Python version, load 3.6.6, load several libraries like matplotlib and keras, and confirm that they work.
[barajasc@taki-usr1 ~]$ module load Python/3.6.6-intel-2018b
... modules are reloaded
[barajasc@taki-usr1 ~]$ python
Python 3.6.6 (default, Mar 12 2019, 14:04:07)
[GCC Intel(R) C++ gcc 6.4 mode] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import matplotlib, keras
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ModuleNotFoundError: No module named 'matplotlib'
>>> exit()
[barajasc@taki-usr1 ~]$ module load matplotlib/3.0.0-intel-2018b-Python-3.6.6 Keras/2.2.4-intel-2018a-Python-3.6.4
[barajasc@taki-usr1 ~]$ python
Python 3.6.4 (default, Mar 14 2019, 08:55:30)
[GCC Intel(R) C++ gcc 6.4 mode] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import matplotlib, keras
/usr/ebuild/software/h5py/2.7.1-intel-2018a-Python-3.6.4/lib/python3.6/site-packages/h5py-2.7.1-py3.6-linux-x86_64.egg/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
Using TensorFlow backend.
>>> exit()
Note that the banner now reports Python 3.6.4: loading the Keras module, which was built for Python 3.6.4, swapped the Python module automatically.
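To check whether a library module lines up with the interpreter you are actually running, one option is to parse the “-Python-X.Y.Z” suffix that appears in the module names listed by module avail. This is a sketch that assumes that naming pattern holds for the modules you care about; the helper names are hypothetical. Setting parts=2 compares only major.minor, which matches the relaxed 3.7.2 vs 3.7.6 case discussed above.

```python
import re
import sys

# Module names in the `module avail` listing end in "-Python-X.Y.Z" when the
# package was built against that Python, e.g. "matplotlib/3.0.0-intel-2018b-Python-3.6.6".
_PY_SUFFIX = re.compile(r"-Python-(\d+)\.(\d+)\.(\d+)$")


def python_version_of(module_name):
    """Return the (major, minor, micro) Python a module was built for, or None if unsuffixed."""
    match = _PY_SUFFIX.search(module_name)
    return tuple(int(part) for part in match.groups()) if match else None


def versions_line_up(module_name, running=None, parts=3):
    """True if the first `parts` fields of the module's Python suffix match the interpreter.

    parts=3 demands an exact X.Y.Z match; parts=2 compares only X.Y.
    """
    wanted = python_version_of(module_name)
    have = tuple(running) if running is not None else tuple(sys.version_info[:3])
    return wanted is not None and wanted[:parts] == have[:parts]


if __name__ == "__main__":
    for name in ["matplotlib/3.0.0-intel-2018b-Python-3.6.6",
                 "Keras/2.2.4-intel-2018a-Python-3.6.4"]:
        print(name, "->", versions_line_up(name))
```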
Example batch script
We’ll write a simple Python program that performs a few calculations in parallel. See How to Run Programs on taki for an explanation of the basics of MPI and SLURM’s sbatch.
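The original program is not reproduced on this page. Below is a minimal sketch of what such a program could look like, assuming mpi4py is importable in the loaded Python module. Each rank multiplies an interleaved share of 1..n and rank 0 combines the partial products with a reduction; n = 20 was chosen to match the slurm.out output shown further down. The fallback to a single process when mpi4py is missing is an assumption added only so the script can be tried outside SLURM.

```python
def partial_product(n, rank, size):
    """Product of rank+1, rank+1+size, rank+1+2*size, ... up to n (this rank's share of n!)."""
    result = 1
    # Interleaved split: together the `size` ranks cover every integer in 1..n exactly once.
    for k in range(rank + 1, n + 1, size):
        result *= k
    return result


def main(n=20):
    try:
        from mpi4py import MPI
    except ImportError:
        # Assumption for local testing only: without mpi4py, run as a single process.
        rank, size = 0, 1
        total = partial_product(n, rank, size)
    else:
        comm = MPI.COMM_WORLD
        rank, size = comm.Get_rank(), comm.Get_size()
        # Lowercase reduce() pickles Python objects, so arbitrarily large ints are fine.
        total = comm.reduce(partial_product(n, rank, size), op=MPI.PROD, root=0)
    if rank == 0:
        print(f"{n}! computed with {size} processes is {total}")
    return total  # None on every rank except 0


if __name__ == "__main__":
    main()
```

In the SLURM script it would be started as mpirun python filename.py, where the filename is whatever you saved it as.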
We can launch it with a standard SLURM script.
Notice that the Python module is loaded in the run.slurm file and that we use “python filename.py” with mpirun rather than just “filename.py”. Now we launch the job and look at the output in slurm.out:
[barajasc@taki-usr1 pythonPage]$ sbatch run.slurm
Submitted batch job 814308
[barajasc@taki-usr1 pythonPage]$ cat slurm.out
20! computed with 2 processes is 2432902008176640000
Non-module packages for a Python module
Because you do not have root (administrator) credentials, you will see errors similar to this one when attempting to install your own packages on taki:
[barajasc@taki-usr1 withModules]$ pip install matplotlib
Collecting matplotlib
  Cache entry deserialization failed, entry ignored
  Downloading https://files.pythonhosted.org/packages/be/74/24d058c17b155d131359f1cd01e120b3954686bf8b7853172b279237e1dc/matplotlib-3.1.3.tar.gz (40.9MB)
    100% |################################| 40.9MB 31kB/s
    Complete output from command python setup.py egg_info:
    Beginning with Matplotlib 3.1, Python 3.6 or above is required.
    This may be due to an out of date pip.
    Make sure you have pip >= 9.0.1.
    ----------------------------------------
Command "python setup.py egg_info" failed with error code 1 in directory
You are using pip version 8.1.2, however version 20.0.2 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.
You will never, ever, be given permission to install packages to the system python or any module python. So please do not attempt to do so. The solution to this is to maintain a separate collection of packages for you and your PI group (if applicable).
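You can confirm from Python itself that the interpreter’s own package directory is off-limits. A minimal sketch using the standard sysconfig and os modules; it reports the “purelib” directory where pip would normally put pure-Python packages for the running interpreter and whether your account can write to it. On the system python or a module python you should expect it not to be writable.

```python
import os
import sysconfig


def site_packages_writable():
    """Return (path, writable) for the running interpreter's default site-packages."""
    # "purelib" is the default install location for pure-Python packages.
    path = sysconfig.get_paths()["purelib"]
    return path, os.access(path, os.W_OK)


if __name__ == "__main__":
    path, ok = site_packages_writable()
    print(path, "is writable" if ok else "is NOT writable; install with --prefix instead")
```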
First load a python module. You can see if you’ve successfully loaded one with the which command.
[barajasc@taki-usr1 withModules]$ which python
/usr/bin/python
[barajasc@taki-usr1 withModules]$ module load Python/3.7.2-GCCcore-8.2.0
The following have been reloaded with a version change:
  1) GCCcore/7.3.0 => GCCcore/8.2.0
[barajasc@taki-usr1 withModules]$ which python
/usr/ebuild/software/Python/3.7.2-GCCcore-8.2.0/bin/python
Then create a directory to store all of your personal packages in. Please note that this directory should be in your dedicated research storage area and NOT in your home folder! Installing these packages in your home folder will fill it up, which locks you out of all system commands, even ls!
[barajasc@taki-usr1 withModules]$ mkdir package_storage
Now add that directory to your Python path. This line should also go into your ~/.bashrc if you intend to have these packages available all the time.
[barajasc@taki-usr1 withModules]$ export PYTHONPATH="absolute/path/to/package_storage/lib/python3.7/site-packages:$PYTHONPATH"
Note that I have appended “lib/python3.7/site-packages” to the path. When you install packages this way, pip stores them a couple of directories deeper, in a path that identifies the type of package and the Python version it was installed for. In this case I was using the Python 3.7 module, and Pillow (installed below) is just a library, so it goes into lib/python3.7/site-packages.
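Rather than typing the versioned subdirectory by hand, you can ask Python for it. This sketch assumes the standard POSIX “posix_prefix” installation layout, which should correspond to what pip install --prefix produced above; the package_storage location in the example is hypothetical.

```python
import os
import sys
import sysconfig


def prefix_site_packages(prefix):
    """Directory where `pip install --prefix=<prefix>` should place pure-Python libraries."""
    # Ask for the standard POSIX layout explicitly: <prefix>/lib/pythonX.Y/site-packages.
    return sysconfig.get_path(
        "purelib", scheme="posix_prefix", vars={"base": prefix, "platbase": prefix}
    )


if __name__ == "__main__":
    # Hypothetical location: substitute the absolute path of your own package_storage.
    target = prefix_site_packages(os.path.abspath("package_storage"))
    print("add to PYTHONPATH:", target)
    print("already on sys.path:", target in sys.path)
```

If the last line prints False after you exported PYTHONPATH, the export and the loaded Python version probably do not match.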
Then install to that directory with pip. Let’s install a popular image manipulation library called Pillow.
[barajasc@taki-usr1 withModules]$ pip install --prefix=absolute/path/to/package_storage pillow
Collecting pillow
  Downloading https://files.pythonhosted.org/packages/f5/79/b2d5695d1a931474fa68b68ec93bdf08ba9acbc4d6b3b628eb6aac81d11c/Pillow-7.0.0-cp37-cp37m-manylinux1_x86_64.whl (2.1MB)
    100% |████████████████████████████████| 2.1MB 1.8MB/s
Installing collected packages: pillow
Successfully installed pillow-7.0.0
Lastly, just use the packages as if they were part of the Python distribution you are using.
[barajasc@taki-usr1 withModules]$ python
Python 3.7.2 (default, May 22 2019, 11:56:05)
[GCC 8.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import PIL
>>>
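A successful import does not by itself tell you which copy was picked up. The sketch below, a hypothetical helper built on importlib.util.find_spec, checks whether a package resolves to a file inside your package_storage; the path in the example is a placeholder.

```python
import importlib.util
import os


def imported_from(module_name, prefix):
    """True if `module_name` resolves to a file under `prefix` (e.g. your package_storage)."""
    spec = importlib.util.find_spec(module_name)
    if spec is None or spec.origin is None:
        return False
    origin = os.path.realpath(spec.origin)
    root = os.path.realpath(prefix)
    # commonpath avoids false matches such as /data/pkg vs /data/pkg_old.
    return os.path.commonpath([origin, root]) == root


if __name__ == "__main__":
    # Hypothetical path; replace with your real package_storage location.
    print(imported_from("PIL", "/absolute/path/to/package_storage"))
```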
To make installing packages smoother, with less typing, we suggest using an “alias” for pip. An alias, in this case, is a shortcut that cuts down on the amount you have to type to do a specific task.
$ echo "alias pipi='pip install --prefix=absolute/path/to/package_storage'" >> ~/.bashrc
Now, once you start a new shell (or run source ~/.bashrc), whenever you call pipi it will automatically tell pip to install whatever package you want to the alternate storage location.
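If you prefer to script installs, or want to be sure the pip you run belongs to the Python module you loaded rather than the old system pip seen in the error earlier, here is a small sketch that calls python -m pip install --prefix=... through the current interpreter. The storage path and function names are hypothetical; by default the command is only printed, and it is executed when run=True.

```python
import subprocess
import sys


def pip_install_cmd(packages, prefix):
    """Build a pip command that installs `packages` under `prefix` for THIS interpreter."""
    # Running pip as `<this python> -m pip` guarantees pip matches the loaded module's Python.
    return [sys.executable, "-m", "pip", "install", f"--prefix={prefix}", *packages]


def pipi(packages, prefix="/absolute/path/to/package_storage", run=False):
    cmd = pip_install_cmd(packages, prefix)
    print(" ".join(cmd))
    if run:
        # check=True raises CalledProcessError if pip fails.
        subprocess.run(cmd, check=True)
    return cmd


if __name__ == "__main__":
    pipi(["pillow"])  # dry run: prints the command without installing anything
```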
It is also highly recommended that you set up a soft link for pip’s cache. When you install a pip or conda package, it downloads a copy of it into your home folder. After a while your home directory will fill up and you will be unable to delete anything due to the lack of space; in that case you will have to file a ticket to get the problem fixed. To prevent this, we suggest you move your ~/.cache folder, if it exists, to your research storage and then create a soft link to it from your home folder. Note that we use “pi_user” here. For me, it would be “gobbert_user”; for you it may be something else.
First, move your ~/.cache folder to your user research storage:
mv ~/.cache ~/pi_user/
If you do not have a ~/.cache folder, create one in your research storage instead:
mkdir ~/pi_user/.cache
Then we simply add a soft link from your research cache to your home folder cache.
ln -s ~/pi_user/.cache ~/.cache
All done! Now when pip and conda download packages they should go to your research storage. Other software that uses the .cache folder benefits as well.
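The same move-or-create-then-link steps can be scripted. The sketch below is a hypothetical helper mirroring the mv, mkdir and ln -s commands above using shutil and os.symlink; it refuses to run if both caches already exist, to avoid nesting one inside the other. The demo function exercises it on throwaway temporary directories so you can see the effect without touching your real home folder.

```python
import os
import shutil
import tempfile
from pathlib import Path


def relocate_cache(home, research):
    """Move <home>/.cache to <research>/.cache and leave a soft link in its place."""
    home_cache = Path(home) / ".cache"
    research_cache = Path(research) / ".cache"
    if home_cache.is_symlink():
        return home_cache  # already linked, nothing to do
    if home_cache.exists():
        if research_cache.exists():
            # Refuse to nest one cache inside the other; merge them by hand first.
            raise FileExistsError(f"{research_cache} already exists")
        shutil.move(str(home_cache), str(research_cache))  # mv ~/.cache ~/pi_user/
    else:
        research_cache.mkdir(parents=True, exist_ok=True)  # mkdir ~/pi_user/.cache
    os.symlink(research_cache, home_cache, target_is_directory=True)  # ln -s ...
    return home_cache


def demo():
    """Exercise relocate_cache on throwaway directories, not your real home folder."""
    with tempfile.TemporaryDirectory() as tmp:
        home, research = Path(tmp, "home"), Path(tmp, "pi_user")
        (home / ".cache" / "pip").mkdir(parents=True)
        research.mkdir()
        link = relocate_cache(home, research)
        return link.is_symlink() and (research / ".cache" / "pip").is_dir()


if __name__ == "__main__":
    print(demo())
    # On taki (hypothetical research path): relocate_cache(Path.home(), Path.home() / "pi_user")
```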
On taki there is also an enhanced interactive Python shell called “ipython”. To use it, we must first load a Python module:
[dkelly7@taki-usr1 plot]$ module load Python/3.7.6-intel-2019a
Then we can just run ipython. A simple check is to print “Hello, World!”.
[dkelly7@taki-usr1 plot]$ ipython
Python 3.7.6 (default, Mar 6 2020, 19:29:58)
Type 'copyright', 'credits' or 'license' for more information
IPython 7.13.0 -- An enhanced Interactive Python. Type '?' for help.
In [1]: print ("Hello, World!")
Hello, World!
In [2]: exit
For additional questions please see the mpi4py documentation and other HPCF pages for help.