How to Run Python on taki

Introduction

On this page we’ll see how to use Python on the taki cluster via the module system and Anaconda. Before proceeding, make sure you’ve read the How To Run tutorial. Python is a popular, full-featured scripting language that can be used interactively or to run scripts.

System Python

On taki there is a default Python, namely Python 2.7.5. If you need something newer, such as Python 3.x.x, see the “Change versions” section. The default system Python will never change: it is an integral part of many Linux operating systems and as such is sensitive to changes. For this reason, no additional packages will ever be installed into the system Python, and its version will never change.

How do I know if I’m using system python?

Use “which” to see whether you’re using the system Python:

[barajasc@taki-usr1 ~]$ which python
/usr/bin/python

If your python is located at “/usr/bin/python”, it is the system Python.
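The same check can be made from inside Python, for example at the top of a script. This is a minimal sketch that assumes, as above, that the system interpreter is /usr/bin/python; the helper name is_system_python is illustrative, not a taki tool. It sticks to syntax that also runs under the system Python 2.7.

```python
import os
import sys

# Assumption: the system Python on taki lives at /usr/bin/python (see `which` above).
SYSTEM_PYTHON = "/usr/bin/python"


def is_system_python(executable=None):
    """Return True if `executable` (default: the running interpreter) is the system Python."""
    if executable is None:
        executable = sys.executable
    # /usr/bin/python may be a symlink (e.g. to python2.7), so compare resolved paths.
    return os.path.realpath(executable) == os.path.realpath(SYSTEM_PYTHON)


if __name__ == "__main__":
    print("Interpreter: " + sys.executable)
    print("Version: " + sys.version.split()[0])
    print("System Python: " + str(is_system_python()))
```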

Change versions

There are several versions of Python installed on taki. To use the system Python, just type python:

[barajasc@taki-usr1 ~]$ python
Python 2.7.5 (default, Nov 16 2020, 22:23:17)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-44)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>>

To see the various Python distributions and libraries available through the module system, use:

[barajasc@taki-usr1 ~]$ module avail python
   ANTLR/2.7.7-intel-2018a-Python-3.6.4
   GDAL/2.2.3-foss-2018b-Python-3.6.6
   GEOS/3.6.2-foss-2018b-Python-3.6.6
   GEOS/3.6.2-intel-2017b-Python-3.6.3
   GEOS/3.6.2-intel-2018a-Python-3.6.4           (D)
   Keras/2.2.4-foss-2018b-Python-3.6.6
   Keras/2.2.4-intel-2018a-Python-3.6.4          (D)
   Mako/1.0.7-foss-2018b-Python-2.7.15
   Pillow/5.0.0-intel-2017b-Python-3.6.3
   PyYAML/3.12-intel-2017b-Python-3.6.3
   PyYAML/3.12-intel-2018a-Python-3.6.4
   PyYAML/3.13-foss-2018b-Python-3.6.6
   Python/2.7.13-foss-2017a
   Python/2.7.14-GCCcore-6.4.0-bare
   Python/2.7.15-foss-2018b
   Python/2.7.15-fosscuda-2018b
   Python/2.7.15-GCCcore-7.3.0-bare
   Python/3.6.1-foss-2017a
   Python/3.6.3-intel-2017b
   Python/3.6.4-foss-2018a
   Python/3.6.4-intel-2018a
   Python/3.6.6-foss-2018b
   Python/3.6.6-fosscuda-2018b
   Python/3.6.6-intel-2018b                      (D)
   ... and many more!

Pay close attention to the following:

Python/3.6.6-foss-2018b

Python/3.7.6-intel-2019a

The suffix of each name (for example foss-2018b or intel-2019a) states which compiler toolchain was used to build that distribution. This is very important when it comes to installing local packages from source or loading additional modules, which should use the same toolchain. Let’s enable the intel-2019a version of Python 3.7.6.

[barajasc@taki-usr1 ~]$ module load Python/3.7.6-intel-2019a
[barajasc@taki-usr1 ~]$ python
Python 3.7.6 (default, Mar  6 2020, 19:29:58)
[GCC Intel(R) C++ gcc 8.2 mode] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy, pandas
>>> exit()
[barajasc@taki-usr1 ~]$
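The compiler shown in brackets in the startup banner can also be read from inside a running interpreter, which is handy in scripts that should confirm which module they run under. A minimal sketch using only the standard platform module; the function name build_info is illustrative, and the example values in the comments are taken from the banner above.

```python
import platform
import sys


def build_info():
    """Describe the running interpreter: version, compiler, and executable path."""
    return {
        "version": platform.python_version(),    # e.g. "3.7.6"
        "compiler": platform.python_compiler(),  # e.g. "GCC Intel(R) C++ gcc 8.2 mode"
        "executable": sys.executable,            # the python provided by the loaded module
    }


if __name__ == "__main__":
    for key, value in sorted(build_info().items()):
        print("{}: {}".format(key, value))
```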

Notice that I did not load any Python libraries but was still able to import numpy and pandas! Many non-standard packages are bundled in these modules.
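To find out in advance which of the packages you need a module already bundles, a short check like the following can help. It is a sketch built on importlib.util.find_spec, which locates a package without importing it; it needs Python 3.4 or newer, so run it under one of the Python 3 modules rather than the system Python 2.7. The function name available and the package list are only examples.

```python
import importlib.util


def available(names):
    """Map each top-level package name to True if the current Python can import it.

    find_spec only locates the package; it does not run its code,
    so this is a cheap way to see what a loaded module bundles.
    """
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    # Example package names; adjust this list to your needs.
    wanted = ["numpy", "pandas", "scipy", "matplotlib"]
    for name, ok in available(wanted).items():
        print("{:12s} {}".format(name, "available" if ok else "missing"))
```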

Loading libraries via modules

If the packages you need can’t be imported from the base Python module, loading additional modules should help. Let’s search for matplotlib, a popular plotting and visualization library.

[barajasc@taki-usr1 ~]$ module avail matplotlib

matplotlib/3.0.2-foss-2018b-Python-3.6.6
matplotlib/3.0.3-foss-2019a-Python-3.7.2
matplotlib/3.0.0-intel-2018b-Python-3.6.6
matplotlib/3.0.3-intel-2019a-Python-3.7.2 (D)

As mentioned in Using Your taki Account, when you load a module with different dependencies, all of the module shuffling is done automatically. Notice that there is no Python 3.7.6 version of matplotlib; the closest one is built for Python 3.7.2. This isn’t a problem because matplotlib doesn’t rely on any features that changed between 3.7.2 and 3.7.6. To avoid issues with packages that may rely on features that have changed, however, you should make sure that the versions line up. Switch to Python 3.6.6 (loading it automatically replaces the other Python module), load several libraries like matplotlib and Keras, and confirm that they work.
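Checking by eye that versions line up is error-prone, so here is a hypothetical helper that parses module names and flags differing Python versions or toolchains. It assumes names follow the Name/version-toolchain[-Python-X.Y.Z] pattern seen in the module avail listings above; names that do not fit (such as the -bare builds, or lines still carrying the (D) marker) are skipped. The names parse and mismatches are our own. Treat a toolchain difference as something to double-check rather than a certain error, since some toolchains are built on top of others.

```python
import re

# Assumed naming pattern, based on the `module avail` output above:
#   Name/version-toolchain               e.g. Python/3.6.6-intel-2018b
#   Name/version-toolchain-Python-X.Y.Z  e.g. matplotlib/3.0.0-intel-2018b-Python-3.6.6
MODULE_RE = re.compile(
    r"^(?P<name>[^/]+)/(?P<version>[^-]+)-(?P<toolchain>[^-]+-[^-]+)"
    r"(?:-Python-(?P<python>[0-9.]+))?$"
)


def parse(module):
    """Split a module name into its parts, or return None if it does not fit the pattern."""
    match = MODULE_RE.match(module.strip())
    if match is None:
        return None
    info = match.groupdict()
    if info["name"] == "Python":
        # For the Python module itself, the Python version is the module version.
        info["python"] = info["version"]
    return info


def mismatches(modules):
    """Return human-readable problems if the modules disagree on Python version or toolchain."""
    parsed = [p for p in (parse(m) for m in modules) if p is not None]
    problems = []
    pythons = {p["python"] for p in parsed if p["python"]}
    if len(pythons) > 1:
        problems.append("Python versions differ: " + ", ".join(sorted(pythons)))
    toolchains = {p["toolchain"] for p in parsed}
    if len(toolchains) > 1:
        problems.append("toolchains differ: " + ", ".join(sorted(toolchains)))
    return problems
```

For instance, passing the three modules used in the session that follows (Python/3.6.6-intel-2018b, matplotlib/3.0.0-intel-2018b-Python-3.6.6 and Keras/2.2.4-intel-2018a-Python-3.6.4) reports both a Python version and a toolchain difference, because that Keras module was built with intel-2018a for Python 3.6.4.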

[barajasc@taki-usr1 ~]$ module load Python/3.6.6-intel-2018b
... modules are reloaded
[barajasc@taki-usr1 ~]$ python
Python 3.6.6 (default, Mar 12 2019, 14:04:07)
[GCC Intel(R) C++ gcc 6.4 mode] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import matplotlib, keras
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ModuleNotFoundError: No module named 'matplotlib'
>>> exit()
[barajasc@taki-usr1 ~]$ module load matplotlib/3.0.0-intel-2018b-Python-3.6.6   Keras/2.2.4-intel-2018a-Python-3.6.4
[barajasc@taki-usr1 ~]$ python
Python 3.6.4 (default, Mar 14 2019, 08:55:30)
[GCC Intel(R) C++ gcc 6.4 mode] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import matplotlib, keras
/usr/ebuild/software/h5py/2.7.1-intel-2018a-Python-3.6.4/lib/python3.6/site-packages/h5py-2.7.1-py3.6-linux-x86_64.egg/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
Using TensorFlow backend.
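>>> exit()

The FutureWarning printed by h5py is only a deprecation notice; both imports succeeded. Notice, however, that the banner now reports Python 3.6.4: the Keras module was built for Python 3.6.4, so loading it swapped the Python module as well. This is exactly the kind of version mismatch to watch for.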

Example batch script

We’ll write a simple Python program that performs a few calculations in parallel. See How to Run Programs on taki for an explanation of the basics of MPI and SLURM’s sbatch.


Download: ..code-2018/python/pythonParallel.py
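The downloadable pythonParallel.py is not reproduced on this page. As an illustration only, here is a hypothetical sketch of a program in the same spirit, assuming mpi4py is available in the loaded Python module: it splits the factors of n! across the MPI processes and combines the partial products with a reduction. It is not the contents of the downloadable file, and the function names are our own.

```python
"""Hypothetical sketch: compute n! in parallel with mpi4py."""


def partial_product(n, rank, size):
    """Product of the factors this rank owns: rank+1, rank+1+size, rank+1+2*size, ... <= n."""
    result = 1
    for k in range(rank + 1, n + 1, size):
        result *= k
    return result


def main(n=20):
    # Imported here so partial_product can be used and tested without MPI.
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()
    size = comm.Get_size()

    local = partial_product(n, rank, size)
    # Multiply all partial products together on rank 0 (Python ints do not overflow).
    total = comm.reduce(local, op=MPI.PROD, root=0)

    if rank == 0:
        print("{}! computed with {} processes is {}".format(n, size, total))


if __name__ == "__main__":
    main()
```

Launched with two processes via mpirun and python from a SLURM script, this sketch would print a line in the same format as the sample output below: 20! computed with 2 processes is 2432902008176640000.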

We can launch it with a standard SLURM script.

Download: ..code-2018/python/run.slurm

Notice that the Python module is loaded in the run.slurm file, and that we use “python filename.py” with mpirun rather than just “filename.py”, so the script is run by the loaded module’s interpreter. Now we launch the job and look at the output in slurm.out:

[barajasc@taki-usr1 pythonPage]$ sbatch run.slurm
Submitted batch job 814308
[barajasc@taki-usr1 pythonPage]$ cat slurm.out
20! computed with 2 processes is 2432902008176640000

Non-module packages for a Python module

Without root (administrator) credentials, you will see errors similar to this one when attempting to install your own packages on taki:

[barajasc@taki-usr1 withModules]$ pip install matplotlib
Collecting matplotlib
  Cache entry deserialization failed, entry ignored
  Downloading https://files.pythonhosted.org/packages/be/74/24d058c17b155d131359f1cd01e120b3954686bf8b7853172b279237e1dc/matplotlib-3.1.3.tar.gz (40.9MB)
    100% |################################| 40.9MB 31kB/s
    Complete output from command python setup.py egg_info:

    Beginning with Matplotlib 3.1, Python 3.6 or above is required.

    This may be due to an out of date pip.

    Make sure you have pip >= 9.0.1.


    ----------------------------------------
Command "python setup.py egg_info" failed with error code 1 in directory
You are using pip version 8.1.2, however version 20.0.2 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.

You will never, ever be given permission to install packages into the system Python or any module Python, so please do not attempt to do so. The solution is to maintain a separate collection of packages for yourself and your PI group (if applicable).

First, load a Python module. You can check whether you’ve successfully loaded one with the which command:

[barajasc@taki-usr1 withModules]$ which python
/usr/bin/python
[barajasc@taki-usr1 withModules]$ module load Python/3.7.2-GCCcore-8.2.0

The following have been reloaded with a version change:
  1) GCCcore/7.3.0 => GCCcore/8.2.0

[barajasc@taki-usr1 withModules]$ which python
/usr/ebuild/software/Python/3.7.2-GCCcore-8.2.0/bin/python

Then create a directory to store all of your personal packages in. Please note that this directory should be in your dedicated research storage area and NOT in your home folder! Installing these packages in your home folder will fill it up, which locks you out of all system commands, even ls!

[barajasc@taki-usr1 withModules]$ mkdir package_storage

Now add that directory to your Python path. This line should also go into your ~/.bashrc if you intend to have these packages available all the time.

[barajasc@taki-usr1 withModules]$ export PYTHONPATH="absolute/path/to/package_storage/lib/python3.7/site-packages:$PYTHONPATH"

Note that I have added “lib/python3.7/site-packages” to the path. When you install packages this way, they are stored a couple of directories deeper, in a subdirectory that identifies the Python version they were installed for. In this case I was using the Python 3.7 module; if you use a different Python module, adjust the version in this path to match.
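To double-check the path you exported, a small sketch like the following builds the lib/pythonX.Y/site-packages directory for the running interpreter and reports whether it is on sys.path (which PYTHONPATH feeds into). The prefix is the same placeholder used on this page, and the helper names site_packages and on_python_path are illustrative.

```python
import os
import sys


def site_packages(prefix, version_info=sys.version_info):
    """Directory under prefix that follows the lib/pythonX.Y/site-packages layout."""
    return os.path.join(prefix, "lib",
                        "python{}.{}".format(version_info[0], version_info[1]),
                        "site-packages")


def on_python_path(prefix):
    """True if the prefix's site-packages directory appears on sys.path."""
    target = os.path.abspath(site_packages(prefix))
    # Skip the empty entry that stands for the current directory.
    return any(os.path.abspath(p) == target for p in sys.path if p)


if __name__ == "__main__":
    # Placeholder location; replace with your own package_storage directory.
    prefix = "/absolute/path/to/package_storage"
    print("Expected directory: " + site_packages(prefix))
    print("On PYTHONPATH: " + str(on_python_path(prefix)))
```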

Then install to that directory with pip. Let’s install a popular image manipulation library called Pillow.

[barajasc@taki-usr1 withModules]$ pip install --prefix=absolute/path/to/package_storage pillow
Collecting pillow
  Downloading https://files.pythonhosted.org/packages/f5/79/b2d5695d1a931474fa68b68ec93bdf08ba9acbc4d6b3b628eb6aac81d11c/Pillow-7.0.0-cp37-cp37m-manylinux1_x86_64.whl (2.1MB)
    100% |████████████████████████████████| 2.1MB 1.8MB/s
Installing collected packages: pillow
Successfully installed pillow-7.0.0

Lastly, just use the packages as if they were part of the Python distribution you loaded:

[barajasc@taki-usr1 withModules]$ python
Python 3.7.2 (default, May 22 2019, 11:56:05)
[GCC 8.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import PIL
>>>
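To confirm that an import really comes from your personal storage rather than from the module, you can ask Python where it would load the package from. A sketch, assuming Python 3.5 or newer (for os.path.commonpath) and the same placeholder prefix used on this page; package_origin and is_under are illustrative names.

```python
import importlib.util
import os


def package_origin(name):
    """Return the file a top-level package would be imported from, or None if not found."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


def is_under(path, directory):
    """True if path lies inside directory (after resolving symlinks)."""
    path = os.path.realpath(path)
    directory = os.path.realpath(directory)
    return os.path.commonpath([path, directory]) == directory


if __name__ == "__main__":
    prefix = "/absolute/path/to/package_storage"  # placeholder path
    origin = package_origin("PIL")
    print("PIL found at: " + str(origin))
    if origin is not None:
        print("From personal storage: " + str(is_under(origin, prefix)))
```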

To make installing packages smoother and cut down on typing, we suggest creating an “alias” for pip. An alias, in this case, is a shortcut that reduces the amount you have to type to do a specific task.

$ echo "alias pipi='pip install --prefix=absolute/path/to/package_storage'" >> ~/.bashrc

Now, after you log in again or run source ~/.bashrc, calling pipi will automatically tell pip to install whatever package you want to the alternate storage location.

We also strongly suggest setting up a soft link for your cache. When you install a pip or conda package, it downloads a copy into your home folder (pip, for example, keeps its downloads in ~/.cache). After a while your home directory will fill up and you will be unable to delete anything due to a lack of space; in that case you will have to file a ticket to get the problem fixed. To prevent this, move your ~/.cache folder to your research storage, if it exists, and then create a soft link to it from your home folder. Note that we use “pi_user” here. For me, it would be “gobbert_user”; for you it may be something else.

First, move your ~/.cache folder to your user research storage:

mv ~/.cache ~/pi_user/

If you do not have a ~/.cache folder, create one in your research storage instead:

mkdir ~/pi_user/.cache

Then create a soft link in your home folder that points to the cache in your research storage:

ln -s ~/pi_user/.cache ~/.cache

All done! Now when pip and conda download packages, they should go to your research storage. Other software that uses the .cache folder benefits as well.
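To verify the link, a short sketch like this one reports whether ~/.cache is a symbolic link and where it actually points. The function name cache_link_target is illustrative; it uses only os.path functions from the standard library.

```python
import os


def cache_link_target(home=None):
    """Return the resolved target of HOME/.cache if it is a symbolic link, else None."""
    if home is None:
        home = os.path.expanduser("~")
    cache = os.path.join(home, ".cache")
    if not os.path.islink(cache):
        return None  # missing, or a regular directory inside the home folder
    return os.path.realpath(cache)


if __name__ == "__main__":
    target = cache_link_target()
    if target is None:
        print("~/.cache is not a symbolic link; downloads will still land in your home folder.")
    else:
        print("~/.cache points to " + target)
```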

Interactive Mode

On taki, there is an enhanced interactive Python shell called “ipython”. To use it, we must first load a Python module:

[dkelly7@taki-usr1 plot]$ module load Python/3.7.6-intel-2019a

Then we can just run ipython. A simple check is to print “Hello, World!”.

[dkelly7@taki-usr1 plot]$ ipython
Python 3.7.6 (default, Mar  6 2020, 19:29:58)
Type 'copyright', 'credits' or 'license' for more information
IPython 7.13.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: print ("Hello, World!")
Hello, World!

In [2]: exit

For additional questions, please see the mpi4py documentation and the other HPCF pages.