PyTorch

What is PyTorch?

PyTorch is a GPU- and CPU-enabled neural network library whose core is written in C++ with native bindings to Python. This tutorial intends to teach you how to use and run PyTorch on taki. This tutorial does NOT serve as an all-purpose, all-encompassing guide to PyTorch; for more detailed information see https://pytorch.org/. Please note that until further notice PyTorch will not natively run on multiple GPUs on taki due to problems with NCCL.

Before starting this guide please read the pages on USING YOUR TAKI ACCOUNT, HOW TO RUN PYTHON ON TAKI, and HOW TO RUN ON THE GPUS.

What modules do I use?

Individual modules

There are several modules which can be used to gain access to PyTorch. First, there are the standalone modules which load PyTorch and its prerequisites.

module load PyTorch/1.3.1-foss-2019b-Python-3.7.4

or

module load pytorch/1.5.0

Bundled with Python modules

There is currently one Python module which automatically provides PyTorch and TensorFlow, specifically

module load Python/3.7.6-intel-2019a

Notice that loading the module pulls in only its prerequisite modules; there is no separate PyTorch or TensorFlow module in the list.

[barajasc@taki-usr2 ~]$ module load Python/3.7.6-intel-2019a
[barajasc@taki-usr2 ~]$ module list
Currently Loaded Modules:
1) cuDNN/7.6.2.24-CUDA-10.1.243 10) zlib/1.2.11-GCCcore-8.2.0 19) libffi/3.2.1-GCCcore-8.2.0
2) binutils/2.31.1-GCCcore-8.2.0 11) libpng/1.6.36-GCCcore-8.2.0 20) intel/2019a
3) icc/2019.1.144-GCC-8.2.0-2.31.1 12) freetype/2.9.1-GCCcore-8.2.0 21) GCCcore/8.2.0
4) ifort/2019.1.144-GCC-8.2.0-2.31.1 13) ncurses/6.1-GCCcore-8.2.0 22) NVIDIA-Drivers/396.44
5) iccifort/2019.1.144-GCC-8.2.0-2.31.1 14) libreadline/8.0-GCCcore-8.2.0 23) CUDA/10.1.243.87
6) impi/2018.4.274-iccifort-2019.1.144-GCC-8.2.0-2.31.1 15) Tcl/8.6.9-GCCcore-8.2.0 24) NCCL/2.7.6-1-intel-2019a
7) iimpi/2019a 16) SQLite/3.27.2-GCCcore-8.2.0 25) Python/3.7.6-intel-2019a
8) imkl/2019.1.144-iimpi-2019a 17) XZ/5.2.4-GCCcore-8.2.0
9) bzip2/1.0.6-GCCcore-8.2.0 18) GMP/6.1.2-GCCcore-8.2.0
Yet we can still import PyTorch, NumPy, TensorFlow, scikit-learn, and many more.
[barajasc@taki-usr2 ~]$ python
Python 3.7.6 (default, Mar 6 2020, 19:29:58)
[GCC Intel(R) C++ gcc 8.2 mode] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch, tensorflow, numpy, sklearn, pandas
2020-07-20 12:43:58.163107: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
>>>

Which one should I use?

An important difference between the Python bundle and the individual modules is that the PyTorch modules are locked to a specific version of PyTorch. If you load pytorch/1.5.0 you are going to be using PyTorch 1.5.0 and nothing else. If you load the Python bundle you are not guaranteed any specific version, because the bundle’s libraries are actively updated as newer versions are released. Today it could be PyTorch 1.5.0, but tomorrow it could be PyTorch 1.5.0-rc4 or even PyTorch 1.6.0. Use this information to decide whether you want a distribution that is constantly updated or a static version.
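
Whichever module you choose, you can quickly confirm which PyTorch build you actually received once the module is loaded. A minimal check (the values printed depend entirely on the module you loaded; the versions in the comments are only examples):

# Print the PyTorch version and the CUDA toolkit it was built against
import torch
print(torch.__version__)       # e.g. 1.5.0
print(torch.version.cuda)      # e.g. 10.1
# True only when a GPU is visible, e.g. inside a job on the gpu partition
print(torch.cuda.is_available())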

Running PyTorch code on taki

Data acquisition

For the input data we will be using the digits data set from sklearn.datasets.

from torch import tensor, float32
import torch
from sklearn import datasets

def getData(device):
    # Load the 8x8 digits data set (64 features per sample, labels 0-9)
    digits   = datasets.load_digits(return_X_y=True)
    # Place the tensors directly on the requested device (CPU or GPU)
    inputs   = tensor(digits[0], dtype=float32, device=device)
    outputs  = tensor(digits[1], dtype=torch.long, device=device)

    return inputs, outputs

Download source:
…/pytorch/helpers.py

Neural network setup

The network structure is a simple fully connected network containing two hidden layers with ReLU activations and a fully connected output layer with 10 output nodes (one per digit class).

from torch.nn import Sequential
from torch.nn import Linear, ReLU

def createNetwork():
    # Input Layer
    model = Sequential(
        # Hidden Layers
        Linear(64, 256),
        ReLU(),
        Linear(256, 256),
        ReLU(),
        # Output Layer
        Linear(256, 10)
    )
    return model

Additionally, we will be using the Adam optimizer and cross-entropy loss.

from torch.nn import CrossEntropyLoss
from torch.optim import Adam

def getOptimizer(model, learningRate=1e-3):
    return Adam(model.parameters(), lr=learningRate, betas=(0.5, 0.999))

def getLossFn():
    return CrossEntropyLoss()

Download source:
…/pytorch/network.py

Training the neural network

First we load our libraries.

from network import createNetwork, getOptimizer, getLossFn
from torch.utils.data import DataLoader, TensorDataset
from helpers import getData
import torch

Next we define some common hyperparameters, namely the learning rate, number of epochs, and batch size.

learningRate = 1e-3
epochs = 8
batchSize = 64

Now select whether to use the CPU or a GPU. It is highly encouraged that you use a GPU for training; however, this code will work on either.

if torch.cuda.is_available():
    device = torch.device('cuda:0')
else:
    device = torch.device('cpu')
model = createNetwork()
model = model.to(device)
print('Device:',next(model.parameters()).device)

Then we load in our data, get our optimizer, and get our loss function.

digits_inputs, digits_labels = getData(device=device)
digits     = TensorDataset(digits_inputs, digits_labels)
loader     = DataLoader(digits, batch_size=batchSize, shuffle=True)
optimizer  = getOptimizer(model, learningRate)
criterion  = getLossFn()

At this point we can actually train the network using a fairly standard training loop. You can find similar versions of this simple training loop in the PyTorch documentation.

for epoch in range(epochs):
    tempLoss = 0
    minibatch = 0
    for inputs, labels in loader:
        minibatch += 1
        # Zero the gradient
        optimizer.zero_grad()
        # Forward, backward, optimize
        classifications = model(inputs)
        loss = criterion(classifications, labels)
        loss.backward()
        optimizer.step()
        # Epoch loss
        tempLoss += loss.item()
    print("{:03d} :: loss = {:.3f}".format(epoch, tempLoss / minibatch))

Finally, we save our trained network to storage.

# Save the model
torch.save(model.state_dict(), "./digit_prediction.pth")

Download source:
…/pytorch/train.py
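
If you later want to reuse the trained network, you can rebuild the same architecture and load the saved weights back in. Below is a minimal sketch (not part of the downloadable tutorial files) which reuses network.py and helpers.py from above; the map_location argument is included so the weights load even on a CPU-only machine.

import torch
from network import createNetwork
from helpers import getData

# Rebuild the architecture and load the trained weights
model = createNetwork()
model.load_state_dict(torch.load("./digit_prediction.pth", map_location="cpu"))
model.eval()

# Classify the first digit in the data set as a quick sanity check
inputs, labels = getData(device=torch.device('cpu'))
with torch.no_grad():
    prediction = model(inputs[:1]).argmax(dim=1)
print("Predicted:", prediction.item(), "Actual:", labels[0].item())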

Slurm file

In the slurm file we will load the module, request the gpu partition, request a single GPU, and then run the file without MPI. Recall that you should request the same proportion of a node's CPUs as you request of its GPUs: if you request half of the GPUs on a node, you should also request half of its CPUs.

#!/bin/bash
#SBATCH --job-name=PyTorchtutorial
#SBATCH --output=slurm.out
#SBATCH --error=slurm.err
#SBATCH --partition=gpu
#SBATCH --gres=gpu:1
#SBATCH --qos=short+
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=8
#SBATCH --mem=30G

module load Python/3.7.6-intel-2019a
srun python train.py

Download source:
…/pytorch/run.slurm

Submitting the job

First sbatch the file.

[barajasc@taki-usr2 pytorchWebpage]$ sbatch run.slurm
Submitted batch job 3033853

Then when the run finishes we should see the error file, the output file, and the network’s pth file.

[barajasc@taki-usr2 pytorchWebpage]$ ls -lh
total 373K
-rw-rw---- 1 barajasc pi_gobbert 335K Jul 20 14:07 digit_prediction.pth
-rw-rw---- 1 barajasc pi_gobbert  618 Jul 20 13:08 helpers.py
-rw-rw---- 1 barajasc pi_gobbert  570 Jul 20 13:06 network.py
-rw-rw---- 1 barajasc pi_gobbert  315 Jul 20 13:25 run.slurm
-rw-rw---- 1 barajasc pi_gobbert  663 Jul 20 14:07 slurm.err
-rw-rw---- 1 barajasc pi_gobbert  175 Jul 20 14:07 slurm.out
-rw-rw---- 1 barajasc pi_gobbert 1.4K Jul 14 19:43 train.py

The slurm.err file is not empty, but this does not mean the code failed. It simply contains messages from Lmod telling us that some already-loaded modules were swapped for different versions when the Python bundle was loaded.

[barajasc@taki-usr2 pytorchWebpage]$ cat slurm.err

The following have been reloaded with a version change:
  1) CUDA/10.1.243.87 => CUDA/10.1.243
  2) GCCcore/7.3.0 => GCCcore/8.2.0
  3) binutils/2.30-GCCcore-7.3.0 => binutils/2.31.1-GCCcore-8.2.0
  4) icc/2018.3.222-GCC-7.3.0-2.30 => icc/2019.1.144-GCC-8.2.0-2.31.1
  5) iccifort/2018.3.222-GCC-7.3.0-2.30 => iccifort/2019.1.144-GCC-8.2.0-2.31.1
  6) ifort/2018.3.222-GCC-7.3.0-2.30 => ifort/2019.1.144-GCC-8.2.0-2.31.1
  7) iimpi/2018b => iimpi/2019a
  8) imkl/2018.3.222-iimpi-2018b => imkl/2019.1.144-iimpi-2019a
  9) impi/2018.3.222-iccifort-2018.3.222-GCC-7.3.0-2.30 => impi/2018.4.274-iccifort-2019.1.144-GCC-8.2.0-2.31.1
 10) intel/2018b => intel/2019a

Now our slurm.out file contains the information we want to know.

[barajasc@taki-usr2 pytorchWebpage]$ cat slurm.out
Device: cuda:0
000 :: loss = 0.726
001 :: loss = 0.169
002 :: loss = 0.083
003 :: loss = 0.077
004 :: loss = 0.040
005 :: loss = 0.033
006 :: loss = 0.019
007 :: loss = 0.020

For more information on PyTorch, please see the official documentation at https://pytorch.org/.