
Deep Learning with PyTorch: Zero to GANs

This is a summary of the course Deep Learning with PyTorch: Zero to GANs at Jovian.

  1. PyTorch Basics: basic operations of PyTorch.
  2. Linear Regression: train a linear regression model from scratch and using PyTorch built-in functions.
  3. Logistic Regression: classify handwritten digits using the MNIST handwritten digit database as the training dataset.
  4. Insurance Cost Prediction: predict the price of yearly medical bills based on personal information.
  5. Training Deep Neural Networks on a GPU: identify handwritten digits from the MNIST dataset using a neural network.
  6. Classifying Images of Everyday Objects: build a neural network with multiple hidden layers to classify images of objects using the CIFAR10 dataset.

Certificate


Completion Certificate for JovianML's Deep Learning with PyTorch: Zero to GANs

I. PyTorch Basics

Tensor

  • Create a tensor
t1 = torch.tensor(4.)  # single number
t2 = torch.tensor([1, 2, 3, 4])  # vector
t3 = torch.tensor([[5, 6],
                   [7, 8],
                   [9, 10]])  # matrix
  • Tensor attributes:
    • t.dtype : the data type of a tensor, like float32, float64, etc.
    • t.shape : the size of a tensor like torch.Size([4]), torch.Size([3, 2])
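As a quick check, these attributes can be inspected directly (a minimal sketch reusing the tensors created above):
print(t1.dtype)  # torch.float32, created from a Python float
print(t2.dtype)  # torch.int64, created from Python ints
print(t2.shape)  # torch.Size([4])
print(t3.shape)  # torch.Size([3, 2])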

Tensor operations and gradients

  1. Operations
x = torch.tensor(3.)
w = torch.tensor(4., requires_grad = True)
b = torch.tensor(5., requires_grad = True)
y = w*x + b
  • requires_grad = True indicates that we want to compute gradients with respect to this tensor later.
  2. Compute gradients
y.backward()
print('dy/dx = ', x.grad)
print('dy/dw = ', w.grad)
print('dy/db = ', b.grad)
  • y.backward() : computes the derivatives of y with respect to the input tensors x, w, b.
  • the tensor attribute grad stores the derivative of y with respect to the corresponding tensor.
    • x.grad stores dy/dx. In this case, it is None since x doesn't have requires_grad = True.
    • w.grad stores dy/dw. In this case, it is the value of x, tensor(3.).
    • b.grad stores dy/db. In this case, it is tensor(1.).
  3. Interoperability with NumPy
  • Convert a Numpy array to a PyTorch tensor, using torch.from_numpy
x = np.array([[1, 2],
              [3, 4]
            ])
y = torch.from_numpy(x)
type(x), type(y)
  • Convert a tensor to a NumPy array, using the method numpy()
z = y.numpy()
type(z)

II. Linear Regression

This part shows how to train a linear regression model in PyTorch in two ways: from scratch, and using PyTorch built-ins.

Linear regression supposes that there's a linear relation between inputs and outputs (targets).

A. Linear Regression from scratch

The figure below presents the workflow of this section.

  • Convert inputs & targets to tensors: convert data (inputs & targets) from numpy arrays to tensors.
  • Initialize parameters: identify the numbers of samples, features, and targets. Initialize the weights and bias used to predict the targets. These parameters will be optimized during the training process.
  • Define functions: create the hypothesis function (model) to predict targets from inputs, and the cost function (loss function) to compute the difference between the predictions and the targets.
  • Train model: find the optimal values of the parameters (weights & bias) using the gradient descent algorithm.

⚠️ Make sure to reset the gradients to zero before the next iteration.

  • Predict: use the optimal parameters to predict the target for a given input.

Import libraries

import numpy as np
import torch

1. Prepare data

X = torch.from_numpy(inputs)
Y = torch.from_numpy(targets)

2. Initialize parameters

# get number of samples (m) and features (n)
m, n = X.shape

# get number of outputs
_, a = Y.shape

# initialize parameters
W = torch.randn(a, n, requires_grad=True)  # weights
b = torch.randn(a, requires_grad=True)  # bias
  • the function torch.randn() creates a tensor of the given shape with elements drawn from a normal distribution with mean 0 and standard deviation 1.
  • In the following steps we will optimize the parameters W and b using gradient descent, so we will need the derivatives of the cost with respect to W and b. That's why W and b are created with requires_grad=True.

3. Define functions

Hypothesis function (model)

Predicts y from x and parameters W, b.

def model(X, W, b):
    Y_hat = X @ W.t() + b
    return Y_hat
  • the operator @ indicates that we want to do matrix multiplication.
  • the method t() returns the transpose of a tensor.
Cost function (loss function)

Computes the difference between predicted values Y_hat and output values Y.

def cost_fn(Y_hat, Y):
    diff = Y_hat - Y
    return torch.sum(diff * diff)/diff.numel()
  • the function torch.sum() returns the sum of all the elements in a tensor.
  • the method numel() returns the number of elements in a tensor.

4. Train model

✍️ Gradient Descent: this algorithm repeats the process of adjusting the weights and biases using the gradients multiple times to reduce the loss.

  • Each iteration is called an epoch.
epochs = 100  # number of iterations
lr = 1e-5  # learning rate
for i in range(epochs):
    Y_hat = model(X, W, b)
    cost = cost_fn(Y_hat, Y)
    cost.backward()  # compute derivatives
    # update parameters
    with torch.no_grad():
        W -= W.grad * lr
        b -= b.grad * lr
        W.grad.zero_()
        b.grad.zero_()
  • the method cost.backward() computes the derivatives of cost with respect to W and b.
  • the function torch.no_grad() indicates to PyTorch that we shouldn't track, calculate, or modify gradients while updating parameters W and b
  • the method grad.zero_() resets the gradients to zero. As PyTorch accumulates gradients, we need to reset them before the next time we invoke backward() on the loss.
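To see why this matters, here is a minimal sketch (the tensor is illustrative): calling backward() twice without zeroing accumulates the gradients.
w = torch.tensor(2., requires_grad=True)
y = w * 3
y.backward()
print(w.grad)  # tensor(3.)
y2 = w * 3
y2.backward()
print(w.grad)  # tensor(6.), gradients accumulated
w.grad.zero_()
print(w.grad)  # tensor(0.)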

5. Predict

x = torch.tensor([[75, 63, 44.]])
y_hat = model(x, W, b)
print(y_hat.tolist())
  • the method tolist() converts a tensor to a (possibly nested) Python list.
  • the method item() returns the value of a single-element tensor.
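A minimal sketch of the difference between the two methods:
v = torch.tensor([1., 2., 3.])
print(v.tolist())  # [1.0, 2.0, 3.0]
s = torch.tensor(5.)
print(s.item())  # 5.0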

B. Linear Regression using PyTorch built-ins

The figure below presents the workflow of this section.

  • Convert inputs & targets to tensors: convert data (inputs & targets) from numpy arrays to tensors.

⚠️ Make sure that the numpy arrays are of data type float32.

  • Define dataset & dataloader:
    • the dataset is a set of tuples of inputs & targets.
    • the dataloader shuffles the dataset and divides it into batches.
  • Define functions:
    • identify the numbers of features and targets, and set the model to a linear function.
    • set the cost function to the mean squared error loss function.
  • Define optimizer: specifies the algorithm used to adjust the model parameters. Set the optimizer to use the stochastic gradient descent algorithm.
  • Train model: find the optimal values of the model parameters by repeating the optimization process.

⚠️ Make sure to reset the gradients to zero before the next iteration.

  • Predict: use the optimal parameters to predict the target for a given input.

Import libraries

import torch.nn as nn
from torch.utils.data import TensorDataset
from torch.utils.data import DataLoader
import torch.nn.functional as F

1. Prepare data

Convert inputs & targets to tensors
X = torch.from_numpy(inputs)
Y = torch.from_numpy(targets)
Define dataset & data loader
dataset = TensorDataset(X, Y)
batch_size = 5
dataloader = DataLoader(dataset, batch_size, shuffle=True)
  • TensorDataset returns tuples of two elements in which the first one contains the inputs and the second one contains the targets.
    • It allows accessing a small section of the dataset using the array indexing notation.
  • DataLoader splits the dataset into batches of a predefined size while training.
    • batch_size indicates how many samples are in a batch. For example, if the dataset contains 15 samples and batch_size = 5, the dataloader will point to 3 batches, each batch containing 5 samples.
    • shuffle=True means the dataset will be shuffled before creating batches. It helps randomize the input to the optimization algorithm, leading to a faster reduction in the loss.
    • Access the elements of the data loader using a for loop.
for batch in dataloader:
    print(batch)
    xs, ys = batch
    print(xs.data); print(ys.data)
  • The idea of the data loader is that if the dataset is too big, it takes a long time to process the whole dataset multiple times. Therefore, instead of training on the whole dataset at once, we divide it into batches, and at each batch iteration (for batch in dataloader) we only train on the samples of one batch. We need len(dataset)/batch_size iterations to go through the whole dataset, as the sketch below shows.
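For example (a minimal sketch, assuming the dataset above has 15 samples):
print(len(dataset))     # 15 samples (assumed)
print(len(dataloader))  # 3 batches, since batch_size = 5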

2. Define functions

Hypothesis function (model)
# get number of samples (m) and of features (n)
m, n = X.shape

# get number of outputs
_, a = Y.shape

# define hypothesis function
model = nn.Linear(n, a)

print(model.weight)
print(model.bias)
print(list(model.parameters()))
  • The model attributes weight and bias contain the weights and bias of the model.
  • the method parameters() returns a generator yielding the weights and bias of the model.
Cost function (loss function)
cost_fn = F.mse_loss
  • the function mse_loss() measures the element-wise mean squared error. It takes two obligatory inputs: input and target, as in the sketch below.
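A quick sanity check on two small tensors (the values are illustrative):
a = torch.tensor([1., 2., 3.])
b = torch.tensor([2., 2., 5.])
print(F.mse_loss(a, b))  # tensor(1.6667) = (1**2 + 0**2 + 2**2) / 3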

3. Define optimizer

The optimizer specifies the algorithm used to adjust the model parameters.

opt = torch.optim.SGD(model.parameters(), lr=1e-5)  # use the algorithm stochastic gradient descent
  • the function torch.optim.SGD optimizes the parameters passed in model.parameters() with the learning rate passed in the parameter lr.
  • SGD stands for stochastic gradient descent. The term stochastic indicates that samples are selected in random batches instead of as a single group.

4. Train model

def fit(epochs, model, cost_fn, opt, dataloader):
    for i in range(epochs):
        for batch in dataloader:
            xs, ys = batch
            ys_hat = model(xs)
            cost = cost_fn(ys_hat, ys)
            cost.backward()
            opt.step()  # adjust model parameters
            opt.zero_grad()  # reset gradients to zero

fit(100, model, cost_fn, opt, dataloader)
  • the optimizer method step() updates parameters (weights and bias).
  • the optimizer method zero_grad() resets the gradients to zero.

5. Predict

x = torch.tensor([[75, 63, 44.]])
y_hat = model(x)
print(y_hat.data)

The complete code of this part is in the notebook linear regression.ipynb.

III. Logistic Regression

This part shows how to train a model to classify handwritten digits. We will use the famous MNIST handwritten digit database as our training dataset. It consists of 28x28 pixel grayscale images of handwritten digits (0 to 9) and labels for each image indicating which digit it represents. The trained model is saved to a file after the training process.

Here are some sample images from the dataset:

(image source: researchgate.net)

We suppose that there are linear boundaries separating the digit groups.

Workflow

Import libraries

import torch
import torchvision
from torchvision.datasets import MNIST
import torchvision.transforms as transforms
from torch.utils.data import random_split
from torch.utils.data import DataLoader
import torch.nn as nn
import torch.nn.functional as F
import matplotlib.pyplot as plt

1. Prepare data

1.1 Convert train & test images to tensors

dataset = MNIST(root='data/', train=True, transform=transforms.ToTensor(), download=True)
test_ds = MNIST(root='data/', train=False, transform=transforms.ToTensor())
  • the first line downloads images from the MNIST handwritten digit database to the directory data and creates a PyTorch dataset. This dataset contains 60,000 images. We will use it to train the model.
  • the second line creates a PyTorch dataset containing 10,000 images. We use this dataset to evaluate models. We don't need to download the images again since they're already downloaded.

1.2 Split train images to train & validation sets

data_size = len(dataset)
train_size = round(data_size*0.8)
val_size = data_size - train_size
train_ds, val_ds = random_split(dataset, [train_size, val_size])
  • the PyTorch method random_split chooses a random sample of size val_size to create a validation dataset and a random sample of size train_size to create a training dataset. The two datasets have no samples in common.
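A quick check of the split sizes (a minimal sketch):
print(len(train_ds), len(val_ds))  # 48000 12000, with data_size = 60000 and a 0.8 split
print(len(train_ds) + len(val_ds) == data_size)  # True, the two sets cover the whole dataset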

1.3 Define batch size and dataloaders

batch_size = 128
train_loader = DataLoader(train_ds, batch_size, shuffle=True)
val_loader = DataLoader(val_ds, batch_size*2)
test_loader = DataLoader(test_ds, batch_size*2)

2. Create model

In this section we build a model class containing four methods:

  • forward() computes linear predictions of the outputs from the input tensors.
  • predict() predicts labels from the linear predictions.
  • cost_func() measures the difference between the predicted and real labels.
  • evaluate_batch() evaluates a batch on two criteria: the cost and the accuracy.
class MnistModel(nn.Module):
    def __init__(self, in_features, out_classes):
        super().__init__()
        self.linear = nn.Linear(in_features, out_classes)

    def forward(self, X):
        X = X.reshape(-1, self.linear.in_features)
        Y_linear = self.linear(X)
        return Y_linear

    # predict label
    def predict(self, X):
        Y_linear = self(X)
        probs = F.softmax(Y_linear.detach(), dim=1)
        _, Y_hat = torch.max(probs, dim=1)
        return Y_hat
  • the method reshape() indicates to PyTorch that we want a view of X with two dimensions. Passing -1 for the first dimension lets PyTorch figure it out automatically based on the shape of the original tensor.
  • self(X) calls the method forward(). Therefore, its result is the result of self.forward(X).
  • the function F.softmax() converts the results of the linear computations into probabilities.
  • the function torch.max() returns each row's largest element and the corresponding index. dim=1 indicates to PyTorch that we want to find the maximal values along each row.
  • the method detach() returns a tensor that is detached from the computation graph, so automatic differentiation is disabled for it.
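A minimal sketch of how F.softmax() and torch.max() work together on a small tensor of linear outputs (the values are illustrative):
logits = torch.tensor([[1.0, 2.0, 0.5],
                       [0.1, 0.2, 3.0]])
probs = F.softmax(logits, dim=1)
print(probs.sum(dim=1))  # each row sums to 1
max_probs, labels = torch.max(probs, dim=1)
print(labels)  # tensor([1, 2]), index of the largest element in each row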
# compute cost
def cost_func(self, batch):
    images, labels = batch
    Y_linear = self(images)
    cost = F.cross_entropy(Y_linear, labels)
    return cost
  • the function cross_entropy() is a continuous and differentiable function. It performs softmax internally, so we can directly pass Y_linear into this function without converting the values into probabilities, as the sketch below illustrates.
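A minimal sketch showing that cross_entropy() applies softmax internally (the values are illustrative): the result equals the negative log of the softmax probability of the true class.
logits = torch.tensor([[1.0, 2.0, 0.5]])
label = torch.tensor([1])
print(F.cross_entropy(logits, label))
print(-torch.log(F.softmax(logits, dim=1)[0, 1]))  # same value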
# evaluate a batch
def evaluate_batch(self, batch):
    images, labels = batch
    Y_hat = self.predict(images)
    acc = torch.sum(Y_hat == labels).item()/len(Y_hat)
    Y_linear = self(images)
    cost = F.cross_entropy(Y_linear.detach(), labels).item()
    res = {
        'cost': cost,
        'accuracy': acc
    }
    return res
  • torch.sum(Y_hat == labels) computes the number of correct predictions.

3. Define optimizer

We use gradient descent to adjust model parameters.

lr = 1e-3  # learning rate
optimizer = torch.optim.SGD(model.parameters(), lr)

4. Train model

4.1. Training phase

# training phase
for batch in train_loader:
    cost = model.cost_func(batch)  # compute cost
    cost.backward()  # compute gradients
    optimizer.step()  # adjust model parameters
    optimizer.zero_grad()  # reset gradients to zero

4.2 Validation phase

# evaluate a batch
def evaluate_batch(self, batch):
    images, labels = batch
    Y_hat = self.predict(images)
    acc = torch.sum(Y_hat == labels).item()/len(Y_hat)
    Y_linear = self(images)
    cost = F.cross_entropy(Y_linear.detach(), labels).item()
    log = {
        'cost': cost,
        'accuracy': acc
    }
    return log
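The method evaluate_batch() is called on every batch of the validation loader; a minimal sketch of the validation loop (the aggregation is an assumption, averaging the batch results):
batch_logs = [model.evaluate_batch(batch) for batch in val_loader]
avg_cost = sum(log['cost'] for log in batch_logs) / len(batch_logs)
avg_acc = sum(log['accuracy'] for log in batch_logs) / len(batch_logs)
print(avg_cost, avg_acc)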

5. Save model

filename = 'mnist_logistic.pth'
torch.save(model.state_dict(), filename)
  • the method state_dict() returns an OrderedDict containing all the weights and bias matrices mapped to the right attributes of the model.
  • to load the model we can instantiate a new object of the class MnistModel and use the method load_state_dict()
# load model from file
model2 = MnistModel(in_features, out_classes)
model2.load_state_dict(torch.load(filename))

The complete code of this part is in the notebook logistic regression.ipynb.

IV. Insurance Cost Prediction

In this part, we're going to use information like a person's age, sex, BMI, number of children and smoking habit to predict the price of yearly medical bills. This kind of model is useful for insurance companies to determine the yearly insurance premium for a person.

The dataset for this problem is taken from Kaggle.

Workflow

The figure below presents the workflow of the training process.

  • Prepare data: download the data as a CSV file from the source URL, customize the data, convert categorical data into numbers, convert numpy arrays to tensors, define datasets & data loaders, and explore the data.
  • Create model: since the target is continuous, we'll use the linear regression model for this problem.
  • Define optimizer: we'll use gradient descent to adjust model parameters.
  • Train model: train model on training set and perform evaluation on validation set.
  • Make predictions: carry out some predictions on validation set.

1. Prepare data

  • Download data from URL source
DATASET_URL = "https://raw.githubusercontent.com/stedy/Machine-Learning-with-R-datasets/master/insurance.csv"
DATA_FILENAME = "insurance.csv"
download_url(DATASET_URL, '.')
  • Customize data: create a particular dataset from the raw data. This is to test the ability to build a model that works well with features in different ranges and with targets at a large scale.
rand_str = "anh-tuan"
df = customize_data(df_raw, rand_str)
df.head()
  • Convert categorical data to numbers: since there are categorical columns in the inputs, we need to convert them to numbers before training.
input_cols = ['age', 'sex', 'bmi', 'children', 'smoker']
cat_cols = ['sex', 'smoker']
output_cols = ['charges']

inputs, targets = df_to_arrays(df, input_cols, cat_cols, output_cols)
  • Convert numpy arrays to tensors: before converting numpy arrays to tensors, we need to make sure they are in the data type float32.
X = torch.from_numpy(inputs.astype('float32'))
Y = torch.from_numpy(targets.astype('float32'))
X.dtype, Y.dtype
  • Define datasets & data loaders: split the dataset into training set & validation set, then create corresponding train loader & validation loader.
dataset = TensorDataset(X, Y)
train_ds, val_ds = random_split(dataset, [train_size, val_size])
train_loader = DataLoader(train_ds, batch_size*2, shuffle=True)
val_loader = DataLoader(val_ds, batch_size)
  • Explore data: show brief statistics on inputs & targets.

2. Create model

In this section, we create a model class consisting of four methods:

  • forward(): estimates the output from the input.
  • cost_func(): computes the error of estimation.
  • predict(): predicts the output from the input.
  • evaluate_batch(): computes the error of the estimation of a batch.

Based on the statistics on inputs & targets, we found that the features are in different ranges, so we need to perform feature normalization in the method forward().

def forward(self, X):
    X_norm = normalize_features(X)
    Y_hat = self.linear(X_norm)
    return Y_hat

The target range is too large, so we scale down the targets in cost_func() and scale up the estimations in predict() with the same scale ratio to reduce the cost. A possible sketch of these helper functions is given after the code below.

def cost_func(self, batch):
    X, Y = batch
    Y_hat = self(X)
    cost = F.mse_loss(Y_hat, scale_down(Y))
    return cost

def predict(self, X):
    Y_hat = self(X)
    return scale_up(Y_hat.detach())
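The helpers normalize_features(), scale_down(), and scale_up() are not shown above; the following is a possible sketch, assuming min-max normalization of the features and a fixed scale ratio for the targets (both the approach and the constant are assumptions, not necessarily the notebook's exact implementation):
SCALE_RATIO = 10000.  # assumed constant, the notebook may use a different value

def normalize_features(X):
    # rescale each feature column to the range [0, 1] (assumed min-max normalization)
    X_min, _ = torch.min(X, dim=0)
    X_max, _ = torch.max(X, dim=0)
    return (X - X_min) / (X_max - X_min)

def scale_down(Y):
    # bring the large target values into a smaller range
    return Y / SCALE_RATIO

def scale_up(Y):
    # map scaled predictions back to the original target range
    return Y * SCALE_RATIO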

3. Define optimizer

Specify that gradient descent is used to adjust the model parameters.

# define optimizer
lr = 1e-2
optimizer = torch.optim.SGD(model.parameters(), lr=lr)

4. Train model

  • Optimize the model parameters on training set
for batch in train_loader:
    cost = model.cost_func(batch)
    cost.backward()  # compute gradients
    optimizer.step()  # adjust model parameters
    optimizer.zero_grad()  # reset gradients to zero
  • Evaluate the model on the validation set
batch_logs = [model.evaluate_batch(batch) for batch in val_loader]
epoch_log = evaluate_epoch(batch_logs)
logs.append(epoch_log)

5. Make predictions

Perform some predictions on the validation set.

x, y = val_ds[0]
yh = model.predict(x)
cost = model.evaluate_batch((x, y))['cost']
print("x = {}, y = {}".format(x.tolist(), round(y.item(),2)))
print("- yh = {}\n- cost = {}".format(round(yh.item(),2), round(cost.item(),2)))

The complete code of this part is in the notebook insurance cost prediction.ipynb.

V. Train Deep Neural Networks on a GPU

In this part, we're going to build a neural network of three layers (an input layer, a hidden layer, and an output layer) to identify handwritten digits from the MNIST dataset. We also use a GPU to train our models if available.

The workflow to predict output class from input units is presented in the figure below.

  1. The input units are passed through a linear layer and an activation function to compute the activation units of the hidden layer. In this problem, we choose ReLU (Rectified Linear Unit) as the activation function.
  2. The activation units are passed to a linear function to compute the linear predictions. These values are then passed to the softmax function to calculate the probabilities of belonging to each output class. The class with the maximal probability is taken as the prediction.

⚠️ NOTE:

  • If we choose a linear function as the activation function, the neural network reduces to logistic regression, since the composition of two linear functions is again a linear function.
  • ReLU is a non-linear function. By using ReLU as the activation function, we suppose that for each input certain activation units are activated (those with non-zero values) while others are turned off (those with zero values).
  • The softmax function rescales an n-dimensional vector so that its elements lie in the range [0, 1] and sum to 1, as the sketch below shows.
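A small sketch of both properties (the values are illustrative):
z = torch.tensor([-1.0, 0.5, 2.0])
print(F.relu(z))  # tensor([0.0000, 0.5000, 2.0000]), negative units are turned off
p = F.softmax(z, dim=0)
print(p)          # elements lie in [0, 1]
print(p.sum())    # tensor(1.), and sum to 1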

The code of this part is similar to the one in the part Logistic Regression, except two methods __init__() and forward() of the class MnistModel.

def __init__(self, in_features:int, hidden_size:int,
             out_classes:int):
    super().__init__()
    self.linear1 = nn.Linear(in_features, hidden_size)
    self.linear2 = nn.Linear(hidden_size, out_classes)

def forward(self, X:torch.tensor) -> torch.tensor:
    # flatten image(s)
    X = X.reshape(-1, self.linear1.in_features)
    # compute activation units
    Z = self.linear1(X)
    A = F.relu(Z)
    # compute probabilities
    Y_linear = self.linear2(A)
    return Y_linear

As the sizes of our models and datasets increase, we need to use GPUs to train our models within a reasonable amount of time. GPUs contain hundreds of cores optimized for performing expensive matrix operations on floating-point numbers quickly, making them ideal for training deep neural networks.

def getDefaultDevice():
    """Pick GPU if available, else CPU
    """
    if torch.cuda.is_available():
        return torch.device("cuda")
    return torch.device("cpu")


def toDevice(data, device:torch.device):
    """Move tensor(s) to chosen device

    Args:
        device (torch.device): device to move tensor(s) to
    """
    if isinstance(data, (list, tuple)):
        return [toDevice(x, device) for x in data]
    return data.to(device, non_blocking=True)

class DeviceDataLoader():
    """This class is to wrap data loader and move batches of data
    to the selected device
    """
    def __init__(self, dataloader:DataLoader, device:torch.device):
        self.dataloader = dataloader
        self.device = device

    def __iter__(self):
        """This method is to retrieve batches of data
        """
        for batch in self.dataloader:
            yield toDevice(batch, self.device)

    def __len__(self):
        """This method is to get the number of batches
        """
        return len(self.dataloader)

Move model, data loaders to chosen device.

device = getDefaultDevice()
toDevice(model, device)
train_loader = DeviceDataLoader(train_loader, device)
val_loader = DeviceDataLoader(val_loader, device)

The evaluation on the test set shows that this neural network gives a slightly better result than logistic regression (accuracy 0.95 vs 0.92).

The complete code of this part is in the notebook deep neural networks with gpu.ipynb.

VI. Classifying Images of Everyday Objects

In this part, we'll build a neural network with many hidden layers to classify images of the CIFAR10 dataset. This dataset consists of 60,000 32x32 colour images in 10 classes, with 6,000 images per class. There are 50,000 training images and 10,000 test images.

The whole classification process is similar to the one of Training Deep Neural Networks on a GPU, except that the neural network class Cifar10Model has a dynamic number of hidden layers, which depends on the given hidden sizes.

We also use GPU to train model if available.

class Cifar10Model(nn.Module):
    def __init__(self, in_features:int, out_classes:int,
                 hidden_sizes:list):
        super().__init__()
        self.linear1 = nn.Linear(in_features, hidden_sizes[0])
        self.nb_hidden_layers = len(hidden_sizes)
        for i in range(self.nb_hidden_layers):
            func_name = "linear%s" % (i+2)
            if i+1 == self.nb_hidden_layers:
                func = nn.Linear(hidden_sizes[i], out_classes)
            else:
                func = nn.Linear(hidden_sizes[i], hidden_sizes[i+1])
            setattr(self, func_name, func)

    def forward(self, X:torch.tensor) -> torch.tensor:
        """Compute linear prediction of image(s)

        Args:
            X (torch.tensor): input image(s)

        Returns:
            torch.tensor: linear prediction of image(s)
        """
        # flatten image
        X = X.reshape(-1, self.linear1.in_features)
        # compute activation units of hidden layers
        A = X
        for i in range(self.nb_hidden_layers):
            func = getattr(self, "linear%s" % (i+1))
            Z = func(A)
            A = F.relu(Z)
        # compute linear prediction
        func = getattr(self, "linear%s" % (self.nb_hidden_layers+1))
        Y_linear = func(A)
        return Y_linear
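For example, a CIFAR10 model with two hidden layers could be created as follows (the hidden sizes are illustrative, not necessarily the notebook's exact choice):
in_features = 3 * 32 * 32  # each CIFAR10 image has 3 colour channels of 32x32 pixels
out_classes = 10
model = Cifar10Model(in_features, out_classes, hidden_sizes=[256, 128])
# this creates linear1 (3072 -> 256), linear2 (256 -> 128), and linear3 (128 -> 10)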

The complete code of this part is in the notebook classifying images of everyday objects.ipynb.