
paniabhisek/maxout


This is an attempt to replicate the following paper, since the hyperparameter link given in the paper no longer works.

Maxout Networks, arXiv:1302.4389 [stat.ML]

Dataset and Device Info

The models are trained on MNIST: the 60,000 training images are split into 50,000 for training and 10,000 for validation, and final evaluation uses the standard 10,000-image test set.

The diagram below shows the maxout module with multilayer perceptrons.

[figure: maxout module with multilayer perceptrons]
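For reference, a maxout unit takes the elementwise maximum over k affine pieces, h(x) = max_j (x W_j + b_j), instead of applying a fixed nonlinearity. Below is a minimal sketch of such a layer; it assumes PyTorch and is illustrative only, not the code in mnist.py.

.. code:: python

    import torch
    import torch.nn as nn

    class Maxout(nn.Module):
        """Maxout layer: k parallel affine maps reduced by an elementwise max."""

        def __init__(self, in_features, out_features, k):
            super().__init__()
            self.k = k
            self.out_features = out_features
            # One linear map produces all k pieces at once: k * out_features units.
            self.linear = nn.Linear(in_features, k * out_features)

        def forward(self, x):
            z = self.linear(x)                          # (batch, k * out_features)
            z = z.view(-1, self.k, self.out_features)   # (batch, k, out_features)
            return z.max(dim=1).values                  # max over the k pieces

Dropout is then applied between maxout layers during training, as in the paper.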

MLP + Dropout

How to Run

  • Train (first 50,000 training examples): python mnist.py --mlp 1 --train true
  • Validation (remaining 10,000 training examples): python mnist.py --mlp 1 --valid true
  • Train continuation (whole training set, resuming from the previous run): python mnist.py --mlp 1 --train_cont true
  • Testing: python mnist.py --mlp 1 --test true
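
The flags above suggest a command-line interface along the following lines. This is a hypothetical reconstruction for readability; the actual argument handling in mnist.py may differ.

.. code:: python

    import argparse

    parser = argparse.ArgumentParser(description="Maxout networks on MNIST")
    parser.add_argument("--mlp", type=int, default=0, help="use the MLP + Dropout model")
    parser.add_argument("--conv", type=int, default=0, help="use the 3 Conv + MLP model")
    parser.add_argument("--train", default="false", help="'true': train on the first 50,000 examples")
    parser.add_argument("--valid", default="false", help="'true': evaluate on the held-out 10,000 examples")
    parser.add_argument("--train_cont", default="false", help="'true': continue training on all 60,000 examples")
    parser.add_argument("--test", default="false", help="'true': evaluate on the test set")
    args = parser.parse_args()

    if args.mlp and args.train == "true":
        pass  # build the maxout MLP and run the training loop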

For the complete hyperparameter tuning, see the hyper-tuning.rst file.

  • Learning rate: 0.005

Training

+--------+------------+-------------------------+-------------------------+---------+--------+
|        |            | Layer1                  | Layer2                  |         |        |
| Epochs | Batch size +------------+------------+------------+------------+ Accuracy| Loss   |
|        |            | Number of  | Number of  | Number of  | Number of  | (%)     |        |
|        |            | layers     | Neurons    | layers     | Neurons    |         |        |
+========+============+============+============+============+============+=========+========+
| 5      | 64         | 4          | 2048       | 2          | 10         | 97.79   | 1.5060 |
+--------+------------+------------+------------+------------+------------+---------+--------+
| 5      | 64         | 4          | 1024       | 2          | 10         | 97.44   | 1.5107 |
+--------+------------+------------+------------+------------+------------+---------+--------+

Validation

+---------+------------+-------------------------+-------------------------+---------+--------+
|         |            | Layer1                  | Layer2                  |         |        |
| Epochs  | Batch size +------------+------------+------------+------------+ Accuracy| Loss   |
|         |            | Number of  | Number of  | Number of  | Number of  | (%)     |        |
|         |            | layers     | Neurons    | layers     | Neurons    |         |        |
+=========+============+============+============+============+============+=========+========+
| 5       | 64         | 4          | 2048       | 2          | 10         | 96.94   | 1.5097 |
+---------+------------+------------+------------+------------+------------+---------+--------+
| 5       | 64         | 4          | 1024       | 2          | 10         | 96.83   | 1.5108 |
+---------+------------+------------+------------+------------+------------+---------+--------+

The model was then trained further on the whole training dataset, with the following accuracy and loss.
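
Continuing training is just a matter of restoring the saved weights (and optimizer state) before running more epochs on the full training set. A minimal sketch, assuming PyTorch and a hypothetical checkpoint name:

.. code:: python

    import torch

    CHECKPOINT = "mlp.pt"  # hypothetical path; the repository may use another name

    def save_checkpoint(model, optimizer, path=CHECKPOINT):
        torch.save({"model": model.state_dict(),
                    "optimizer": optimizer.state_dict()}, path)

    def load_checkpoint(model, optimizer, path=CHECKPOINT):
        state = torch.load(path)
        model.load_state_dict(state["model"])
        optimizer.load_state_dict(state["optimizer"])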

Training with pretrained weights

+--------+------------+-------------------------+-------------------------+---------+----------+
|        |            | Layer1                  | Layer2                  |         |          |
| Epochs | Batch size +------------+------------+------------+------------+ Accuracy| Loss     |
|        |            | Number of  | Number of  | Number of  | Number of  | (%)     |          |
|        |            | layers     | Neurons    | layers     | Neurons    |         |          |
+========+============+============+============+============+============+=========+==========+
| 5      | 64         | 4          | 2048       | 2          | 10         |         | 1.4827   |
+--------+------------+------------+------------+------------+------------+---------+----------+

Testing

+------------+-------------------------+-------------------------+---------+----------+
|            | Layer1                  | Layer2                  |         |          |
| Batch size +------------+------------+------------+------------+ Accuracy| Loss     |
|            | Number of  | Number of  | Number of  | Number of  | (%)     |          |
|            | layers     | Neurons    | layers     | Neurons    |         |          |
+============+============+============+============+============+=========+==========+
| 64         | 4          | 2048       | 2          | 10         |         | 1.5007   |
+------------+------------+------------+------------+------------+---------+----------+

3 Conv + MLP

[figure: 3 Conv + MLP architecture]

How to Run

  • Train (50,000 shuffled training examples): python mnist.py --conv 1 --train true
  • Validation (remaining 10,000 training examples): python mnist.py --conv 1 --valid true
  • Train continuation (whole training set, resuming from the previous run): python mnist.py --conv 1 --train_cont true
  • Testing: python mnist.py --conv 1 --test true

Learning Rate

The learning rate is first set to 0.01 and halved at epoch 5 when training on the 50,000 shuffled examples. The configuration with the lowest validation error is then retrained from the pretrained weights, this time starting at 0.001 and again halved at epoch 5.
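
In PyTorch terms, that schedule could be written as the sketch below; this is an illustration of the described schedule, not the repository's exact code. For the retraining pass the starting rate would be 0.001 instead of 0.01.

.. code:: python

    import torch
    import torch.nn as nn

    model = nn.Linear(784, 10)  # stand-in model for illustration
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)  # 0.001 when retraining

    # Halve the learning rate once training reaches epoch 5.
    scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[5], gamma=0.5)

    for epoch in range(10):
        # ... one pass over the 50,000 shuffled training examples ...
        scheduler.step()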


The architecture presented in the paper is: conv -> maxpool -> conv -> maxpool -> conv -> maxpool -> MLP -> softmax. The MLP's output size is 10 (one unit per digit class), and its input size is whatever comes out of the third maxpool. The only things I had to adjust were the kernels and paddings of the convolutional layers, since those are the only free parameters of the network.
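
A sketch of that stack, assuming PyTorch and using the kernels and paddings of the best row in the tables below (5 x 5 with padding 2, then two 3 x 3 with padding 2, each followed by 2 x 2 max pooling with stride 1). The channel counts are hypothetical, since the tables do not record them, and plain ReLU stands in for the maxout activations.

.. code:: python

    import torch
    import torch.nn as nn

    # Channel counts (8, 16, 32) are made up for illustration.
    model = nn.Sequential(
        nn.Conv2d(1, 8, kernel_size=5, padding=2), nn.ReLU(),
        nn.MaxPool2d(kernel_size=2, stride=1),
        nn.Conv2d(8, 16, kernel_size=3, padding=2), nn.ReLU(),
        nn.MaxPool2d(kernel_size=2, stride=1),
        nn.Conv2d(16, 32, kernel_size=3, padding=2), nn.ReLU(),
        nn.MaxPool2d(kernel_size=2, stride=1),
        nn.Flatten(),
        nn.LazyLinear(10),  # MLP input is whatever the third maxpool produces
    )

    logits = model(torch.randn(1, 1, 28, 28))  # -> shape (1, 10)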

Training

+--------+-------+--------------+---------------+--------------+---------------+--------------+---------------+----------+---------+------+
|        |       | Conv1        | Maxpool1      | Conv2        | Maxpool2      | Conv3        | Maxpool3      | MLP      |         |      |
| Epochs | Batch +--------+-----+------+--------+--------+-----+------+--------+--------+-----+------+--------+----+-----+ Acc %   | Loss |
|        |       | kernel | pad | pool | stride | kernel | pad | pool | stride | kernel | pad | pool | stride | in | out |         |      |
+========+=======+========+=====+======+========+========+=====+======+========+========+=====+======+========+====+=====+=========+======+
| 10     | 64    | 7 x 7  | 3   | 2    | 1      | 5 x 5  | 2   | 2    | 1      | 5 x 5  | 2   | 2    | 1      |    | 10  |         |1.4921|
+--------+-------+--------+-----+------+--------+--------+-----+------+--------+--------+-----+------+--------+----+-----+---------+------+
| 10     | 64    | 5 x 5  | 3   | 2    | 1      | 5 x 5  | 2   | 2    | 1      | 5 x 5  | 2   | 2    | 1      |    | 10  | 87.62   |      |
+--------+-------+--------+-----+------+--------+--------+-----+------+--------+--------+-----+------+--------+----+-----+---------+------+
| 10     | 64    | 5 x 5  | 3   | 2    | 1      | 3 x 3  | 2   | 2    | 1      | 3 x 3  | 2   | 2    | 1      |    | 10  | 95.43   |      |
+--------+-------+--------+-----+------+--------+--------+-----+------+--------+--------+-----+------+--------+----+-----+---------+------+
| 10     | 64    | 5 x 5  | 2   | 2    | 1      | 3 x 3  | 2   | 2    | 1      | 3 x 3  | 2   | 2    | 1      |    | 10  | 95.96   |      |
+--------+-------+--------+-----+------+--------+--------+-----+------+--------+--------+-----+------+--------+----+-----+---------+------+

Validation

+-------+--------------+---------------+--------------+---------------+--------------+---------------+----------+---------+------+
|       | Conv1        | Maxpool1      | Conv2        | Maxpool2      | Conv3        | Maxpool3      | MLP      |         |      |
| Batch +--------+-----+------+--------+--------+-----+------+--------+--------+-----+------+--------+----+-----+ Acc %   | Loss |
|       | kernel | pad | pool | stride | kernel | pad | pool | stride | kernel | pad | pool | stride | in | out |         |      |
+=======+========+=====+======+========+========+=====+======+========+========+=====+======+========+====+=====+=========+======+
| 64    | 7 x 7  | 3   | 2    | 1      | 5 x 5  | 2   | 2    | 1      | 5 x 5  | 2   | 2    | 1      |    | 10  |         |1.4928|
+-------+--------+-----+------+--------+--------+-----+------+--------+--------+-----+------+--------+----+-----+---------+------+
| 64    | 5 x 5  | 3   | 2    | 1      | 5 x 5  | 2   | 2    | 1      | 5 x 5  | 2   | 2    | 1      |    | 10  | 87.76   |      |
+-------+--------+-----+------+--------+--------+-----+------+--------+--------+-----+------+--------+----+-----+---------+------+
| 64    | 5 x 5  | 3   | 2    | 1      | 3 x 3  | 2   | 2    | 1      | 3 x 3  | 2   | 2    | 1      |    | 10  | 95.16   |      |
+-------+--------+-----+------+--------+--------+-----+------+--------+--------+-----+------+--------+----+-----+---------+------+
| 64    | 5 x 5  | 2   | 2    | 1      | 3 x 3  | 2   | 2    | 1      | 3 x 3  | 2   | 2    | 1      |    | 10  | 96.15   |      |
+-------+--------+-----+------+--------+--------+-----+------+--------+--------+-----+------+--------+----+-----+---------+------+

Training Continuation

+--------+-------+--------------+---------------+--------------+---------------+--------------+---------------+----------+---------+------+
|        |       | Conv1        | Maxpool1      | Conv2        | Maxpool2      | Conv3        | Maxpool3      | MLP      |         |      |
| Epochs | Batch +--------+-----+------+--------+--------+-----+------+--------+--------+-----+------+--------+----+-----+ Acc %   | Loss |
|        |       | kernel | pad | pool | stride | kernel | pad | pool | stride | kernel | pad | pool | stride | in | out |         |      |
+========+=======+========+=====+======+========+========+=====+======+========+========+=====+======+========+====+=====+=========+======+
| 10     | 64    | 7 x 7  | 3   | 2    | 1      | 5 x 5  | 2   | 2    | 1      | 5 x 5  | 2   | 2    | 1      |    | 10  |         |1.4874|
+--------+-------+--------+-----+------+--------+--------+-----+------+--------+--------+-----+------+--------+----+-----+---------+------+
| 10     | 64    | 5 x 5  | 3   | 2    | 1      | 5 x 5  | 2   | 2    | 1      | 5 x 5  | 2   | 2    | 1      |    | 10  | 88.04   |      |
+--------+-------+--------+-----+------+--------+--------+-----+------+--------+--------+-----+------+--------+----+-----+---------+------+
| 10     | 64    | 5 x 5  | 3   | 2    | 1      | 3 x 3  | 2   | 2    | 1      | 3 x 3  | 2   | 2    | 1      |    | 10  | 96.25   |      |
+--------+-------+--------+-----+------+--------+--------+-----+------+--------+--------+-----+------+--------+----+-----+---------+------+
| 10     | 64    | 5 x 5  | 2   | 2    | 1      | 3 x 3  | 2   | 2    | 1      | 3 x 3  | 2   | 2    | 1      |    | 10  | 96.75   |      |
+--------+-------+--------+-----+------+--------+--------+-----+------+--------+--------+-----+------+--------+----+-----+---------+------+

Testing

+-------+--------------+---------------+--------------+---------------+--------------+---------------+----------+---------+------+
|       | Conv1        | Maxpool1      | Conv2        | Maxpool2      | Conv3        | Maxpool3      | MLP      |         |      |
| Batch +--------+-----+------+--------+--------+-----+------+--------+--------+-----+------+--------+----+-----+ Acc %   | Loss |
|       | kernel | pad | pool | stride | kernel | pad | pool | stride | kernel | pad | pool | stride | in | out |         |      |
+=======+========+=====+======+========+========+=====+======+========+========+=====+======+========+====+=====+=========+======+
| 64    | 7 x 7  | 3   | 2    | 1      | 5 x 5  | 2   | 2    | 1      | 5 x 5  | 2   | 2    | 1      |    | 10  |         |1.4929|
+-------+--------+-----+------+--------+--------+-----+------+--------+--------+-----+------+--------+----+-----+---------+------+
| 64    | 5 x 5  | 3   | 2    | 1      | 5 x 5  | 2   | 2    | 1      | 5 x 5  | 2   | 2    | 1      |    | 10  | 87.39   |      |
+-------+--------+-----+------+--------+--------+-----+------+--------+--------+-----+------+--------+----+-----+---------+------+
| 64    | 5 x 5  | 3   | 2    | 1      | 3 x 3  | 2   | 2    | 1      | 3 x 3  | 2   | 2    | 1      |    | 10  | 95.52   |      |
+-------+--------+-----+------+--------+--------+-----+------+--------+--------+-----+------+--------+----+-----+---------+------+
| 64    | 5 x 5  | 2   | 2    | 1      | 3 x 3  | 2   | 2    | 1      | 3 x 3  | 2   | 2    | 1      |    | 10  | 96.30   |      |
+-------+--------+-----+------+--------+--------+-----+------+--------+--------+-----+------+--------+----+-----+---------+------+
