What is PyTorch?
PyTorch is a GPU- and CPU-enabled neural network library with a C++ core and native Python bindings. This tutorial is intended to teach you how to use and run PyTorch on taki. It does NOT serve as an all-purpose, all-encompassing guide to PyTorch; for more detailed information see https://pytorch.org/ . Please note that, until further notice, PyTorch will not natively run on multiple GPUs on taki due to problems with NCCL.
Before starting this guide, please read the pages on USING YOUR TAKI ACCOUNT, HOW TO RUN PYTHON ON TAKI, and HOW TO RUN ON THE GPUS.
What modules do I use?
Individual modules
There are several modules that provide access to PyTorch. First, there are the individual modules, which load PyTorch along with its prerequisites:
module load PyTorch/1.3.1-foss-2019b-Python-3.7.4
or
module load pytorch/1.5.0
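After loading either module, you can quickly verify that PyTorch is usable. The check below is a small sketch (not part of the tutorial files); note that torch.cuda.is_available() only returns True on a node where a GPU has been allocated, so expect False on the login node.

import torch

print(torch.__version__)          # should match the version of the module you loaded
print(torch.cuda.is_available())  # True only on a GPU node with a GPU allocated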
Bundled with Python modules
There is currently one Python module that automatically loads both PyTorch and TensorFlow, specifically
module load Python/3.7.6-intel-2019a
Notice that loading this module automatically pulls in quite a few supporting modules, including CUDA, cuDNN, and NCCL.
[barajasc@taki-usr2 ~]$ module load Python/3.7.6-intel-2019a
[barajasc@taki-usr2 ~]$ module list

Currently Loaded Modules:
  1) cuDNN/7.6.2.24-CUDA-10.1.243                          10) zlib/1.2.11-GCCcore-8.2.0      19) libffi/3.2.1-GCCcore-8.2.0
  2) binutils/2.31.1-GCCcore-8.2.0                         11) libpng/1.6.36-GCCcore-8.2.0    20) intel/2019a
  3) icc/2019.1.144-GCC-8.2.0-2.31.1                       12) freetype/2.9.1-GCCcore-8.2.0   21) GCCcore/8.2.0
  4) ifort/2019.1.144-GCC-8.2.0-2.31.1                     13) ncurses/6.1-GCCcore-8.2.0      22) NVIDIA-Drivers/396.44
  5) iccifort/2019.1.144-GCC-8.2.0-2.31.1                  14) libreadline/8.0-GCCcore-8.2.0  23) CUDA/10.1.243.87
  6) impi/2018.4.274-iccifort-2019.1.144-GCC-8.2.0-2.31.1  15) Tcl/8.6.9-GCCcore-8.2.0        24) NCCL/2.7.6-1-intel-2019a
  7) iimpi/2019a                                           16) SQLite/3.27.2-GCCcore-8.2.0    25) Python/3.7.6-intel-2019a
  8) imkl/2019.1.144-iimpi-2019a                           17) XZ/5.2.4-GCCcore-8.2.0
  9) bzip2/1.0.6-GCCcore-8.2.0                             18) GMP/6.1.2-GCCcore-8.2.0
[barajasc@taki-usr2 ~]$ python
Python 3.7.6 (default, Mar 6 2020, 19:29:58) [GCC Intel(R) C++ gcc 8.2 mode] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch, tensorflow, numpy, sklearn, pandas
2020-07-20 12:43:58.163107: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
>>>
Which one should I use?
Running PyTorch code on taki
Data acquisition
For the input data we will use the digits data set from sklearn.datasets, which contains 1,797 grayscale 8x8 images of handwritten digits, each flattened to 64 features.
from torch import tensor, float32
import torch
from numpy import zeros, arange
from sklearn import datasets
from os import environ

def getData(device):
    # Load the digits data set and place it on the requested device
    mnist = datasets.load_digits(return_X_y=True)
    inputs = tensor(mnist[0], dtype=float32, device=device)
    outputs = tensor(mnist[1], dtype=torch.long, device=device)
    return inputs, outputs
Download source:
…/pytorch/helpers.py
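As a quick sanity check (this snippet is illustrative and not part of the tutorial files), you can call getData on the CPU and inspect the shapes. The digits data set contains 1,797 samples of 64 features each, with labels 0 through 9.

import torch
from helpers import getData

# Inspect the tensors returned by getData
inputs, outputs = getData(device=torch.device('cpu'))
print(inputs.shape)   # torch.Size([1797, 64])
print(outputs.shape)  # torch.Size([1797])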
Neural network setup
The network structure is a simple fully connected network containing two hidden layers with ReLU activations and a fully connected output layer with 10 output nodes, one per digit class.
from torch.nn import Sequential, Module
from torch.nn import Linear, ReLU

def createNetwork():
    # Input Layer
    model = Sequential(
        # Hidden Layers
        Linear(64, 256),
        ReLU(),
        Linear(256, 256),
        ReLU(),
        # Output Layer
        Linear(256, 10)
    )
    return model
Additionally, we will use the Adam optimizer and cross-entropy loss.
from torch.nn import CrossEntropyLoss
from torch.optim import Adam

def getOptimizer(model, learningRate=1e-3):
    return Adam(model.parameters(), lr=learningRate, betas=(0.5, 0.999))

def getLossFn():
    return CrossEntropyLoss()
Download source:
…/pytorch/network.py
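To confirm that the layer sizes line up before training, you can run a small forward pass on random data. This is only an illustrative check, not part of the tutorial files.

import torch
from network import createNetwork

model = createNetwork()
dummyBatch = torch.randn(5, 64)  # five fake flattened 8x8 images
logits = model(dummyBatch)
print(logits.shape)              # torch.Size([5, 10]), one score per digit class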
Training the neural network
First, we load our libraries.
from network import createNetwork, getOptimizer, getLossFn
from torch.utils.data import DataLoader, TensorDataset
from helpers import getData
import torch
Next we define some common hyperparameters, namely the learning rate, number of epochs, and batch size.
learningRate = 1e-3
epochs = 8
batchSize = 64
Now we select whether to use the CPU or a GPU. Using a GPU for training is highly encouraged; however, this code will work on either.
if torch.cuda.is_available():
    device = torch.device('cuda:0')
else:
    device = torch.device('cpu')

model = createNetwork()
model = model.to(device)
print('Device:', next(model.parameters()).device)
Then we load in our data, get our optimizer, and get our loss function.
digits_inputs, digits_labels = getData(device=device)
digits = TensorDataset(digits_inputs, digits_labels)
loader = DataLoader(digits, batch_size=batchSize, shuffle=True)

optimizer = getOptimizer(model, learningRate)
criterion = getLossFn()
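If you are curious what the DataLoader yields, you can peek at a single minibatch. These lines are for debugging only and do not appear in train.py.

exampleInputs, exampleLabels = next(iter(loader))
print(exampleInputs.shape, exampleLabels.shape)  # torch.Size([64, 64]) torch.Size([64]) for a full batch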
At this point we can actually train the network using a fairly standard training loop. You can find similar versions of this simple training loop in the PyTorch documentation.
for epoch in range(epochs):
    tempLoss = 0
    minibatch = 0
    for inputs, labels in loader:
        minibatch += 1
        # Zero the gradient
        optimizer.zero_grad()
        # Forward, backward, optimize
        classifications = model(inputs)
        loss = criterion(classifications, labels)
        loss.backward()
        optimizer.step()
        # Epoch loss
        tempLoss += loss.item()
    print("{:03d} :: loss = {:.3f}".format(epoch, tempLoss / minibatch))
Finally, we save the trained network to storage.
# Save the model
torch.save(model.state_dict(), "./digit_prediction.pth")
Download source:
…/pytorch/train.py
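Later, for example when making predictions, the saved weights can be loaded back into a fresh copy of the network. The following is a minimal sketch, assuming network.py and helpers.py from above are on the Python path; it is not part of train.py.

import torch
from network import createNetwork
from helpers import getData

# Rebuild the architecture and load the saved weights
model = createNetwork()
model.load_state_dict(torch.load("./digit_prediction.pth", map_location="cpu"))
model.eval()

# Predict the class of the first digit in the data set
inputs, labels = getData(device=torch.device('cpu'))
with torch.no_grad():
    prediction = model(inputs[:1]).argmax(dim=1)
print(prediction.item(), labels[0].item())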
Slurm file
In the Slurm file we load the module, request the gpu partition, request a single GPU, and then run the script without MPI. Recall that you should request the same proportion of a node's CPUs as you request of its GPUs: if you request half of a node's GPUs, you should also request half of its CPUs.
#!/bin/bash
#SBATCH --job-name=PyTorchtutorial
#SBATCH --output=slurm.out
#SBATCH --error=slurm.err
#SBATCH --partition=gpu
#SBATCH --gres=gpu:1
#SBATCH --qos=short+
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=8
#SBATCH --mem=30G

module load Python/3.7.6-intel-2019a

srun python train.py
Download source:
…/pytorch/run.slurm
Submitting the job
First sbatch the file.
[barajasc@taki-usr2 pytorchWebpage]$ sbatch run.slurm
Submitted batch job 3033853
When the run finishes, we should see the error file, the output file, and the network's .pth file.
[barajasc@taki-usr2 pytorchWebpage]$ ls -lh
total 373K
-rw-rw---- 1 barajasc pi_gobbert 335K Jul 20 14:07 digit_prediction.pth
-rw-rw---- 1 barajasc pi_gobbert  618 Jul 20 13:08 helpers.py
-rw-rw---- 1 barajasc pi_gobbert  570 Jul 20 13:06 network.py
-rw-rw---- 1 barajasc pi_gobbert  315 Jul 20 13:25 run.slurm
-rw-rw---- 1 barajasc pi_gobbert  663 Jul 20 14:07 slurm.err
-rw-rw---- 1 barajasc pi_gobbert  175 Jul 20 14:07 slurm.out
-rw-rw---- 1 barajasc pi_gobbert 1.4K Jul 14 19:43 train.py
The slurm.err file is not empty, but this does not mean the code failed. It simply contains messages from Lmod telling us that it had to reload some modules with different versions when loading the Python bundle.
[barajasc@taki-usr2 pytorchWebpage]$ cat slurm.err

The following have been reloaded with a version change:
  1) CUDA/10.1.243.87 => CUDA/10.1.243
  2) GCCcore/7.3.0 => GCCcore/8.2.0
  3) binutils/2.30-GCCcore-7.3.0 => binutils/2.31.1-GCCcore-8.2.0
  4) icc/2018.3.222-GCC-7.3.0-2.30 => icc/2019.1.144-GCC-8.2.0-2.31.1
  5) iccifort/2018.3.222-GCC-7.3.0-2.30 => iccifort/2019.1.144-GCC-8.2.0-2.31.1
  6) ifort/2018.3.222-GCC-7.3.0-2.30 => ifort/2019.1.144-GCC-8.2.0-2.31.1
  7) iimpi/2018b => iimpi/2019a
  8) imkl/2018.3.222-iimpi-2018b => imkl/2019.1.144-iimpi-2019a
  9) impi/2018.3.222-iccifort-2018.3.222-GCC-7.3.0-2.30 => impi/2018.4.274-iccifort-2019.1.144-GCC-8.2.0-2.31.1
 10) intel/2018b => intel/2019a
Now our slurm.out file contains the information we want to know.
[barajasc@taki-usr2 pytorchWebpage]$ cat slurm.out
Device: cuda:0
000 :: loss = 0.726
001 :: loss = 0.169
002 :: loss = 0.083
003 :: loss = 0.077
004 :: loss = 0.040
005 :: loss = 0.033
006 :: loss = 0.019
007 :: loss = 0.020
For more information on PyTorch, please see the documentation at https://pytorch.org/ .