This is an attempt to replicate the following paper, since the link to the hyperparameters given in the paper is no longer working.
arXiv:1302.4389 [stat.ML] (Maxout Networks)
- dataset: THE MNIST DATABASE
- GPU: 1x GM204GL [Tesla M60], 8 GB
- CPU: 4 cores, 30.5 GiB RAM
- logs and model: here
The following diagram shows the maxout module with multilayer perceptrons.
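Alongside the diagram, the maxout unit can be written down directly: each output takes the maximum over ``k`` affine "pieces" of the input. The following is a minimal NumPy sketch; the shapes are illustrative and are not the repository's actual layer sizes.

```python
import numpy as np

def maxout(x, W, b):
    """Maxout layer: elementwise max over k affine pieces.

    x: (batch, d_in); W: (k, d_in, d_out); b: (k, d_out)
    returns: (batch, d_out)
    """
    # Compute all k affine projections at once: (batch, k, d_out)
    z = np.einsum("bi,kio->bko", x, W) + b
    # The maxout activation is the max over the k pieces
    return z.max(axis=1)

# Illustrative shapes only (not the configuration used in the tables below)
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))
W = rng.standard_normal((2, 8, 16))
b = rng.standard_normal((2, 16))
y = maxout(x, W, b)
print(y.shape)  # (4, 16)
```

With ``k = 2`` pieces, the unit can represent any convex piecewise-linear function of two affine maps, which is what lets maxout act as a learned activation function.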
- Train (first 50,000 training samples): ``python mnist.py --mlp 1 --train true``
- Validation (remaining 10,000 training samples): ``python mnist.py --mlp 1 --valid true``
- Train continuation (whole training set, continuing from the previous training): ``python mnist.py --mlp 1 --train_cont true``
- Testing: ``python mnist.py --mlp 1 --test true``
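The flags above could be wired up with ``argparse`` roughly as follows. This is a hypothetical sketch based only on the commands shown; the actual ``mnist.py`` may parse its arguments differently.

```python
import argparse

def build_parser():
    # Flag names are taken from the commands above; defaults are assumptions.
    p = argparse.ArgumentParser(description="Maxout MNIST replication")
    p.add_argument("--mlp", type=int, default=0, help="run the maxout MLP model")
    p.add_argument("--conv", type=int, default=0, help="run the maxout convnet model")
    p.add_argument("--train", type=str, default="false", help="train on first 50,000 samples")
    p.add_argument("--valid", type=str, default="false", help="evaluate on held-out 10,000 samples")
    p.add_argument("--train_cont", type=str, default="false", help="continue training on full set")
    p.add_argument("--test", type=str, default="false", help="evaluate on the test set")
    return p

args = build_parser().parse_args(["--mlp", "1", "--train", "true"])
print(args.mlp, args.train)  # 1 true
```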
For the complete hyperparameter tuning results, see the ``hyper-tuning.rst`` file.
- Learning rate: 0.005
+--------+------------+-------------------------+-------------------------+---------+--------+
|        |            | Layer1                  | Layer2                  |         |        |
| Epochs | Batch size +------------+------------+------------+------------+ Accuracy| Loss   |
|        |            | Number of  | Number of  | Number of  | Number of  | (%)     |        |
|        |            | layers     | Neurons    | layers     | Neurons    |         |        |
+========+============+============+============+============+============+=========+========+
| 5      | 64         | 4          | 2048       | 2          | 10         | 97.79   | 1.5060 |
+--------+------------+------------+------------+------------+------------+---------+--------+
| 5      | 64         | 4          | 1024       | 2          | 10         | 97.44   | 1.5107 |
+--------+------------+------------+------------+------------+------------+---------+--------+
+--------+------------+-------------------------+-------------------------+---------+--------+
|        |            | Layer1                  | Layer2                  |         |        |
| Epochs | Batch size +------------+------------+------------+------------+ Accuracy| Loss   |
|        |            | Number of  | Number of  | Number of  | Number of  | (%)     |        |
|        |            | layers     | Neurons    | layers     | Neurons    |         |        |
+========+============+============+============+============+============+=========+========+
| 5      | 64         | 4          | 2048       | 2          | 10         | 96.94   | 1.5097 |
+--------+------------+------------+------------+------------+------------+---------+--------+
| 5      | 64         | 4          | 1024       | 2          | 10         | 96.83   | 1.5108 |
+--------+------------+------------+------------+------------+------------+---------+--------+
The model was then trained further on the whole training dataset, with the following accuracy and loss.
+--------+------------+-------------------------+-------------------------+---------+--------+
|        |            | Layer1                  | Layer2                  |         |        |
| Epochs | Batch size +------------+------------+------------+------------+ Accuracy| Loss   |
|        |            | Number of  | Number of  | Number of  | Number of  | (%)     |        |
|        |            | layers     | Neurons    | layers     | Neurons    |         |        |
+========+============+============+============+============+============+=========+========+
| 5      | 64         | 4          | 2048       | 2          | 10         |         | 1.4827 |
+--------+------------+------------+------------+------------+------------+---------+--------+
+------------+-------------------------+-------------------------+---------+--------+
|            | Layer1                  | Layer2                  |         |        |
| Batch size +------------+------------+------------+------------+ Accuracy| Loss   |
|            | Number of  | Number of  | Number of  | Number of  | (%)     |        |
|            | layers     | Neurons    | layers     | Neurons    |         |        |
+============+============+============+============+============+=========+========+
| 64         | 4          | 2048       | 2          | 10         |         | 1.5007 |
+------------+------------+------------+------------+------------+---------+--------+
- Train (50,000 shuffled training samples): ``python mnist.py --conv 1 --train true``
- Validation (remaining 10,000 training samples): ``python mnist.py --conv 1 --valid true``
- Train continuation (whole training set, continuing from the previous training): ``python mnist.py --conv 1 --train_cont true``
- Testing: ``python mnist.py --conv 1 --test true``
The learning rate is first set to 0.01 and halved at epoch 5 while training on the 50,000 shuffled samples. The model with the lowest validation error is then retrained from those pretrained weights, this time starting from a learning rate of 0.001, again halved at epoch 5.
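The schedule described above can be sketched as a single halving step. This assumes one halving at epoch 5 and nothing else; the actual script may implement the schedule differently.

```python
def lr_at_epoch(epoch, base_lr=0.01, halve_at=5):
    """Return the learning rate for a given epoch: halved once at `halve_at`."""
    return base_lr / 2 if epoch >= halve_at else base_lr

# Initial training run, base rate 0.01
print([lr_at_epoch(e) for e in range(8)])
# [0.01, 0.01, 0.01, 0.01, 0.01, 0.005, 0.005, 0.005]

# Fine-tuning pass restarts from the pretrained weights at a lower base rate
print(lr_at_epoch(0, base_lr=0.001), lr_at_epoch(6, base_lr=0.001))
# 0.001 0.0005
```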
The architecture presented in the paper is: conv -> maxpool -> conv -> maxpool -> conv -> maxpool -> MLP -> softmax. The MLP's output size is 10 (one unit per digit class), and its input size is whatever the third maxpool produces. The only adjustments needed were the kernel sizes and paddings of the convolutional layers, since those are the only architectural parameters left unspecified.
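The MLP input size follows mechanically from those kernel and padding choices via the standard output-size formula, floor((size + 2*pad - kernel) / stride) + 1. The configuration traced below (5x5 conv with pad 2, then two 3x3 convs with pad 2, each followed by a 2x2 maxpool of stride 2) is an assumption for illustration only; the tables that follow list the configurations actually tried.

```python
def out_size(size, kernel, pad=0, stride=1):
    """Spatial output size of a conv or pool layer on a square input."""
    return (size + 2 * pad - kernel) // stride + 1

size = 28  # MNIST images are 28 x 28
for kernel, pad in [(5, 2), (3, 2), (3, 2)]:
    size = out_size(size, kernel, pad)  # convolution, stride 1
    size = out_size(size, 2, 0, 2)      # 2 x 2 maxpool, stride 2
print(size)  # 5 -> MLP input is 5 * 5 * (channels of Conv3)
```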
+--------+-------+--------------+---------------+--------------+---------------+--------------+---------------+----------+---------+--------+
|        |       | Conv1        | Maxpool1      | Conv2        | Maxpool2      | Conv3        | Maxpool3      | MLP      |         |        |
| Epochs | Batch +--------+-----+------+--------+--------+-----+------+--------+--------+-----+------+--------+----+-----+ Acc %   | Loss   |
|        |       | kernel | pad | pool | stride | kernel | pad | pool | stride | kernel | pad | pool | stride | in | out |         |        |
+========+=======+========+=====+======+========+========+=====+======+========+========+=====+======+========+====+=====+=========+========+
| 10     | 64    | 7 x 7  | 3   |      | 1      | 5 x 5  | 2   | 2    | 1      | 5 x 5  | 2   | 2    | 1      |    | 10  |         | 1.4921 |
+--------+-------+--------+-----+------+--------+--------+-----+------+--------+--------+-----+------+--------+----+-----+---------+--------+
| 10     | 64    | 5 x 5  | 3   |      | 1      | 5 x 5  | 2   | 2    | 1      | 5 x 5  | 2   | 2    | 1      |    | 10  | 87.62   |        |
+--------+-------+--------+-----+------+--------+--------+-----+------+--------+--------+-----+------+--------+----+-----+---------+--------+
| 10     | 64    | 5 x 5  | 3   |      | 1      | 3 x 3  | 2   | 2    | 1      | 3 x 3  | 2   | 2    | 1      |    | 10  | 95.43   |        |
+--------+-------+--------+-----+------+--------+--------+-----+------+--------+--------+-----+------+--------+----+-----+---------+--------+
| 10     | 64    | 5 x 5  | 2   |      | 1      | 3 x 3  | 2   | 2    | 1      | 3 x 3  | 2   | 2    | 1      |    | 10  | 95.96   |        |
+--------+-------+--------+-----+------+--------+--------+-----+------+--------+--------+-----+------+--------+----+-----+---------+--------+
+-------+--------------+---------------+--------------+---------------+--------------+---------------+----------+---------+--------+
|       | Conv1        | Maxpool1      | Conv2        | Maxpool2      | Conv3        | Maxpool3      | MLP      |         |        |
| Batch +--------+-----+------+--------+--------+-----+------+--------+--------+-----+------+--------+----+-----+ Acc %   | Loss   |
|       | kernel | pad | pool | stride | kernel | pad | pool | stride | kernel | pad | pool | stride | in | out |         |        |
+=======+========+=====+======+========+========+=====+======+========+========+=====+======+========+====+=====+=========+========+
| 64    | 7 x 7  | 3   |      | 1      | 5 x 5  | 2   | 2    | 1      | 5 x 5  | 2   | 2    | 1      |    | 10  |         | 1.4928 |
+-------+--------+-----+------+--------+--------+-----+------+--------+--------+-----+------+--------+----+-----+---------+--------+
| 64    | 5 x 5  | 3   |      | 1      | 5 x 5  | 2   | 2    | 1      | 5 x 5  | 2   | 2    | 1      |    | 10  | 87.76   |        |
+-------+--------+-----+------+--------+--------+-----+------+--------+--------+-----+------+--------+----+-----+---------+--------+
| 64    | 5 x 5  | 3   |      | 1      | 3 x 3  | 2   | 2    | 1      | 3 x 3  | 2   | 2    | 1      |    | 10  | 95.16   |        |
+-------+--------+-----+------+--------+--------+-----+------+--------+--------+-----+------+--------+----+-----+---------+--------+
| 64    | 5 x 5  | 2   |      | 1      | 3 x 3  | 2   | 2    | 1      | 3 x 3  | 2   | 2    | 1      |    | 10  | 96.15   |        |
+-------+--------+-----+------+--------+--------+-----+------+--------+--------+-----+------+--------+----+-----+---------+--------+
+--------+-------+--------------+---------------+--------------+---------------+--------------+---------------+----------+---------+--------+
|        |       | Conv1        | Maxpool1      | Conv2        | Maxpool2      | Conv3        | Maxpool3      | MLP      |         |        |
| Epochs | Batch +--------+-----+------+--------+--------+-----+------+--------+--------+-----+------+--------+----+-----+ Acc %   | Loss   |
|        |       | kernel | pad | pool | stride | kernel | pad | pool | stride | kernel | pad | pool | stride | in | out |         |        |
+========+=======+========+=====+======+========+========+=====+======+========+========+=====+======+========+====+=====+=========+========+
| 10     | 64    | 7 x 7  | 3   |      | 1      | 5 x 5  | 2   | 2    | 1      | 5 x 5  | 2   | 2    | 1      |    | 10  |         | 1.4874 |
+--------+-------+--------+-----+------+--------+--------+-----+------+--------+--------+-----+------+--------+----+-----+---------+--------+
| 10     | 64    | 5 x 5  | 3   |      | 1      | 5 x 5  | 2   | 2    | 1      | 5 x 5  | 2   | 2    | 1      |    | 10  | 88.04   |        |
+--------+-------+--------+-----+------+--------+--------+-----+------+--------+--------+-----+------+--------+----+-----+---------+--------+
| 10     | 64    | 5 x 5  | 3   |      | 1      | 3 x 3  | 2   | 2    | 1      | 3 x 3  | 2   | 2    | 1      |    | 10  | 96.25   |        |
+--------+-------+--------+-----+------+--------+--------+-----+------+--------+--------+-----+------+--------+----+-----+---------+--------+
| 10     | 64    | 5 x 5  | 2   |      | 1      | 3 x 3  | 2   | 2    | 1      | 3 x 3  | 2   | 2    | 1      |    | 10  | 96.75   |        |
+--------+-------+--------+-----+------+--------+--------+-----+------+--------+--------+-----+------+--------+----+-----+---------+--------+
+-------+--------------+---------------+--------------+---------------+--------------+---------------+----------+---------+--------+
|       | Conv1        | Maxpool1      | Conv2        | Maxpool2      | Conv3        | Maxpool3      | MLP      |         |        |
| Batch +--------+-----+------+--------+--------+-----+------+--------+--------+-----+------+--------+----+-----+ Acc %   | Loss   |
|       | kernel | pad | pool | stride | kernel | pad | pool | stride | kernel | pad | pool | stride | in | out |         |        |
+=======+========+=====+======+========+========+=====+======+========+========+=====+======+========+====+=====+=========+========+
| 64    | 7 x 7  | 3   |      | 1      | 5 x 5  | 2   | 2    | 1      | 5 x 5  | 2   | 2    | 1      |    | 10  |         | 1.4929 |
+-------+--------+-----+------+--------+--------+-----+------+--------+--------+-----+------+--------+----+-----+---------+--------+
| 64    | 5 x 5  | 3   |      | 1      | 5 x 5  | 2   | 2    | 1      | 5 x 5  | 2   | 2    | 1      |    | 10  | 87.39   |        |
+-------+--------+-----+------+--------+--------+-----+------+--------+--------+-----+------+--------+----+-----+---------+--------+
| 64    | 5 x 5  | 3   |      | 1      | 3 x 3  | 2   | 2    | 1      | 3 x 3  | 2   | 2    | 1      |    | 10  | 95.52   |        |
+-------+--------+-----+------+--------+--------+-----+------+--------+--------+-----+------+--------+----+-----+---------+--------+
| 64    | 5 x 5  | 2   |      | 1      | 3 x 3  | 2   | 2    | 1      | 3 x 3  | 2   | 2    | 1      |    | 10  | 96.30   |        |
+-------+--------+-----+------+--------+--------+-----+------+--------+--------+-----+------+--------+----+-----+---------+--------+