This is an attempt to replicate the following paper, since the link to the hyperparameters given in the paper is no longer working.
arXiv:1302.4389 [stat.ML] (Maxout Networks)
- dataset: THE MNIST DATABASE
- GPU: 1x GM204GL [Tesla M60], 8 GB
- CPU: 4 cores, 30.5 GiB RAM
- logs and model: here
The following diagram shows the maxout module with multilayer perceptrons.
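Alongside the diagram, the maxout unit can be written down directly: each output takes the maximum over ``k`` affine "pieces" of the input. The following is a minimal NumPy sketch; the shapes are illustrative and are not the repository's actual layer sizes.

```python
import numpy as np

def maxout(x, W, b):
    """Maxout layer: elementwise max over k affine pieces.

    x: (batch, d_in); W: (k, d_in, d_out); b: (k, d_out)
    returns: (batch, d_out)
    """
    # Compute all k affine projections at once: (batch, k, d_out)
    z = np.einsum("bi,kio->bko", x, W) + b
    # The maxout activation is the max over the k pieces
    return z.max(axis=1)

# Illustrative shapes only (not the configuration used in the tables below)
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))
W = rng.standard_normal((2, 8, 16))
b = rng.standard_normal((2, 16))
y = maxout(x, W, b)
print(y.shape)  # (4, 16)
```

With ``k = 2`` pieces, the unit can represent any convex piecewise-linear function of two affine maps, which is what lets maxout act as a learned activation function.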
- Train (first 50,000 training samples): ``python mnist.py --mlp 1 --train true``
- Validation (remaining 10,000 training samples): ``python mnist.py --mlp 1 --valid true``
- Train continuation (whole training set, continuing from the previous training): ``python mnist.py --mlp 1 --train_cont true``
- Testing: ``python mnist.py --mlp 1 --test true``
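The flags above could be wired up with ``argparse`` roughly as follows. This is a hypothetical sketch based only on the commands shown; the actual ``mnist.py`` may parse its arguments differently.

```python
import argparse

def build_parser():
    # Flag names are taken from the commands above; defaults are assumptions.
    p = argparse.ArgumentParser(description="Maxout MNIST replication")
    p.add_argument("--mlp", type=int, default=0, help="run the maxout MLP model")
    p.add_argument("--conv", type=int, default=0, help="run the maxout convnet model")
    p.add_argument("--train", type=str, default="false", help="train on first 50,000 samples")
    p.add_argument("--valid", type=str, default="false", help="evaluate on held-out 10,000 samples")
    p.add_argument("--train_cont", type=str, default="false", help="continue training on full set")
    p.add_argument("--test", type=str, default="false", help="evaluate on the test set")
    return p

args = build_parser().parse_args(["--mlp", "1", "--train", "true"])
print(args.mlp, args.train)  # 1 true
```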
For the complete hyperparameter tuning results, see the ``hyper-tuning.rst`` file.
- Learning rate: 0.005
+--------+------------+-------------------------+-------------------------+---------+--------+
|        |            | Layer1                  | Layer2                  |         |        |
| Epochs | Batch size +------------+------------+------------+------------+ Accuracy| Loss   |
|        |            | Number of  | Number of  | Number of  | Number of  | (%)     |        |
|        |            | layers     | Neurons    | layers     | Neurons    |         |        |
+========+============+============+============+============+============+=========+========+
| 5      | 64         | 4          | 2048       | 2          | 10         | 97.79   | 1.5060 |
+--------+------------+------------+------------+------------+------------+---------+--------+
| 5      | 64         | 4          | 1024       | 2          | 10         | 97.44   | 1.5107 |
+--------+------------+------------+------------+------------+------------+---------+--------+
+--------+------------+-------------------------+-------------------------+---------+--------+
|        |            | Layer1                  | Layer2                  |         |        |
| Epochs | Batch size +------------+------------+------------+------------+ Accuracy| Loss   |
|        |            | Number of  | Number of  | Number of  | Number of  | (%)     |        |
|        |            | layers     | Neurons    | layers     | Neurons    |         |        |
+========+============+============+============+============+============+=========+========+
| 5      | 64         | 4          | 2048       | 2          | 10         | 96.94   | 1.5097 |
+--------+------------+------------+------------+------------+------------+---------+--------+
| 5      | 64         | 4          | 1024       | 2          | 10         | 96.83   | 1.5108 |
+--------+------------+------------+------------+------------+------------+---------+--------+
The model was then trained further on the whole training dataset, with the following accuracy and loss.
+--------+------------+-------------------------+-------------------------+---------+--------+
|        |            | Layer1                  | Layer2                  |         |        |
| Epochs | Batch size +------------+------------+------------+------------+ Accuracy| Loss   |
|        |            | Number of  | Number of  | Number of  | Number of  | (%)     |        |
|        |            | layers     | Neurons    | layers     | Neurons    |         |        |
+========+============+============+============+============+============+=========+========+
| 5      | 64         | 4          | 2048       | 2          | 10         |         | 1.4827 |
+--------+------------+------------+------------+------------+------------+---------+--------+
+------------+-------------------------+-------------------------+---------+--------+
|            | Layer1                  | Layer2                  |         |        |
| Batch size +------------+------------+------------+------------+ Accuracy| Loss   |
|            | Number of  | Number of  | Number of  | Number of  | (%)     |        |
|            | layers     | Neurons    | layers     | Neurons    |         |        |
+============+============+============+============+============+=========+========+
| 64         | 4          | 2048       | 2          | 10         |         | 1.5007 |
+------------+------------+------------+------------+------------+---------+--------+
- Train (50,000 shuffled training samples): ``python mnist.py --conv 1 --train true``
- Validation (remaining 10,000 training samples): ``python mnist.py --conv 1 --valid true``
- Train continuation (whole training set, continuing from the previous training): ``python mnist.py --conv 1 --train_cont true``
- Testing: ``python mnist.py --conv 1 --test true``
The learning rate is first set to 0.01 and halved at epoch 5 while training on the 50,000 shuffled samples. The model with the lowest validation error is then retrained from those pretrained weights, this time starting from a learning rate of 0.001, again halved at epoch 5.
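The schedule described above can be sketched as a single halving step. This assumes one halving at epoch 5 and nothing else; the actual script may implement the schedule differently.

```python
def lr_at_epoch(epoch, base_lr=0.01, halve_at=5):
    """Return the learning rate for a given epoch: halved once at `halve_at`."""
    return base_lr / 2 if epoch >= halve_at else base_lr

# Initial training run, base rate 0.01
print([lr_at_epoch(e) for e in range(8)])
# [0.01, 0.01, 0.01, 0.01, 0.01, 0.005, 0.005, 0.005]

# Fine-tuning pass restarts from the pretrained weights at a lower base rate
print(lr_at_epoch(0, base_lr=0.001), lr_at_epoch(6, base_lr=0.001))
# 0.001 0.0005
```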
The architecture presented in the paper is: conv -> maxpool -> conv -> maxpool -> conv -> maxpool -> MLP -> softmax. The MLP's output size is 10 (one unit per digit class), and its input size is whatever the third maxpool produces. The only adjustments needed were the kernel sizes and paddings of the convolutional layers, since those are the only architectural parameters left unspecified.
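The MLP input size follows mechanically from those kernel and padding choices via the standard output-size formula, floor((size + 2*pad - kernel) / stride) + 1. The configuration traced below (5x5 conv with pad 2, then two 3x3 convs with pad 2, each followed by a 2x2 maxpool of stride 2) is an assumption for illustration only; the tables that follow list the configurations actually tried.

```python
def out_size(size, kernel, pad=0, stride=1):
    """Spatial output size of a conv or pool layer on a square input."""
    return (size + 2 * pad - kernel) // stride + 1

size = 28  # MNIST images are 28 x 28
for kernel, pad in [(5, 2), (3, 2), (3, 2)]:
    size = out_size(size, kernel, pad)  # convolution, stride 1
    size = out_size(size, 2, 0, 2)      # 2 x 2 maxpool, stride 2
print(size)  # 5 -> MLP input is 5 * 5 * (channels of Conv3)
```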
+--------+-------+--------------+---------------+--------------+---------------+--------------+---------------+----------+---------+--------+
|        |       | Conv1        | Maxpool1      | Conv2        | Maxpool2      | Conv3        | Maxpool3      | MLP      |         |        |
| Epochs | Batch +--------+-----+------+--------+--------+-----+------+--------+--------+-----+------+--------+----+-----+ Acc %   | Loss   |
|        |       | kernel | pad | pool | stride | kernel | pad | pool | stride | kernel | pad | pool | stride | in | out |         |        |
+========+=======+========+=====+======+========+========+=====+======+========+========+=====+======+========+====+=====+=========+========+
| 10     | 64    | 7 x 7  | 3   |      | 1      | 5 x 5  | 2   | 2    | 1      | 5 x 5  | 2   | 2    | 1      |    | 10  |         | 1.4921 |
+--------+-------+--------+-----+------+--------+--------+-----+------+--------+--------+-----+------+--------+----+-----+---------+--------+
| 10     | 64    | 5 x 5  | 3   |      | 1      | 5 x 5  | 2   | 2    | 1      | 5 x 5  | 2   | 2    | 1      |    | 10  | 87.62   |        |
+--------+-------+--------+-----+------+--------+--------+-----+------+--------+--------+-----+------+--------+----+-----+---------+--------+
| 10     | 64    | 5 x 5  | 3   |      | 1      | 3 x 3  | 2   | 2    | 1      | 3 x 3  | 2   | 2    | 1      |    | 10  | 95.43   |        |
+--------+-------+--------+-----+------+--------+--------+-----+------+--------+--------+-----+------+--------+----+-----+---------+--------+
| 10     | 64    | 5 x 5  | 2   |      | 1      | 3 x 3  | 2   | 2    | 1      | 3 x 3  | 2   | 2    | 1      |    | 10  | 95.96   |        |
+--------+-------+--------+-----+------+--------+--------+-----+------+--------+--------+-----+------+--------+----+-----+---------+--------+
+-------+--------------+---------------+--------------+---------------+--------------+---------------+----------+---------+--------+
|       | Conv1        | Maxpool1      | Conv2        | Maxpool2      | Conv3        | Maxpool3      | MLP      |         |        |
| Batch +--------+-----+------+--------+--------+-----+------+--------+--------+-----+------+--------+----+-----+ Acc %   | Loss   |
|       | kernel | pad | pool | stride | kernel | pad | pool | stride | kernel | pad | pool | stride | in | out |         |        |
+=======+========+=====+======+========+========+=====+======+========+========+=====+======+========+====+=====+=========+========+
| 64    | 7 x 7  | 3   |      | 1      | 5 x 5  | 2   | 2    | 1      | 5 x 5  | 2   | 2    | 1      |    | 10  |         | 1.4928 |
+-------+--------+-----+------+--------+--------+-----+------+--------+--------+-----+------+--------+----+-----+---------+--------+
| 64    | 5 x 5  | 3   |      | 1      | 5 x 5  | 2   | 2    | 1      | 5 x 5  | 2   | 2    | 1      |    | 10  | 87.76   |        |
+-------+--------+-----+------+--------+--------+-----+------+--------+--------+-----+------+--------+----+-----+---------+--------+
| 64    | 5 x 5  | 3   |      | 1      | 3 x 3  | 2   | 2    | 1      | 3 x 3  | 2   | 2    | 1      |    | 10  | 95.16   |        |
+-------+--------+-----+------+--------+--------+-----+------+--------+--------+-----+------+--------+----+-----+---------+--------+
| 64    | 5 x 5  | 2   |      | 1      | 3 x 3  | 2   | 2    | 1      | 3 x 3  | 2   | 2    | 1      |    | 10  | 96.15   |        |
+-------+--------+-----+------+--------+--------+-----+------+--------+--------+-----+------+--------+----+-----+---------+--------+
+--------+-------+--------------+---------------+--------------+---------------+--------------+---------------+----------+---------+--------+
|        |       | Conv1        | Maxpool1      | Conv2        | Maxpool2      | Conv3        | Maxpool3      | MLP      |         |        |
| Epochs | Batch +--------+-----+------+--------+--------+-----+------+--------+--------+-----+------+--------+----+-----+ Acc %   | Loss   |
|        |       | kernel | pad | pool | stride | kernel | pad | pool | stride | kernel | pad | pool | stride | in | out |         |        |
+========+=======+========+=====+======+========+========+=====+======+========+========+=====+======+========+====+=====+=========+========+
| 10     | 64    | 7 x 7  | 3   |      | 1      | 5 x 5  | 2   | 2    | 1      | 5 x 5  | 2   | 2    | 1      |    | 10  |         | 1.4874 |
+--------+-------+--------+-----+------+--------+--------+-----+------+--------+--------+-----+------+--------+----+-----+---------+--------+
| 10     | 64    | 5 x 5  | 3   |      | 1      | 5 x 5  | 2   | 2    | 1      | 5 x 5  | 2   | 2    | 1      |    | 10  | 88.04   |        |
+--------+-------+--------+-----+------+--------+--------+-----+------+--------+--------+-----+------+--------+----+-----+---------+--------+
| 10     | 64    | 5 x 5  | 3   |      | 1      | 3 x 3  | 2   | 2    | 1      | 3 x 3  | 2   | 2    | 1      |    | 10  | 96.25   |        |
+--------+-------+--------+-----+------+--------+--------+-----+------+--------+--------+-----+------+--------+----+-----+---------+--------+
| 10     | 64    | 5 x 5  | 2   |      | 1      | 3 x 3  | 2   | 2    | 1      | 3 x 3  | 2   | 2    | 1      |    | 10  | 96.75   |        |
+--------+-------+--------+-----+------+--------+--------+-----+------+--------+--------+-----+------+--------+----+-----+---------+--------+
+-------+--------------+---------------+--------------+---------------+--------------+---------------+----------+---------+--------+
|       | Conv1        | Maxpool1      | Conv2        | Maxpool2      | Conv3        | Maxpool3      | MLP      |         |        |
| Batch +--------+-----+------+--------+--------+-----+------+--------+--------+-----+------+--------+----+-----+ Acc %   | Loss   |
|       | kernel | pad | pool | stride | kernel | pad | pool | stride | kernel | pad | pool | stride | in | out |         |        |
+=======+========+=====+======+========+========+=====+======+========+========+=====+======+========+====+=====+=========+========+
| 64    | 7 x 7  | 3   |      | 1      | 5 x 5  | 2   | 2    | 1      | 5 x 5  | 2   | 2    | 1      |    | 10  |         | 1.4929 |
+-------+--------+-----+------+--------+--------+-----+------+--------+--------+-----+------+--------+----+-----+---------+--------+
| 64    | 5 x 5  | 3   |      | 1      | 5 x 5  | 2   | 2    | 1      | 5 x 5  | 2   | 2    | 1      |    | 10  | 87.39   |        |
+-------+--------+-----+------+--------+--------+-----+------+--------+--------+-----+------+--------+----+-----+---------+--------+
| 64    | 5 x 5  | 3   |      | 1      | 3 x 3  | 2   | 2    | 1      | 3 x 3  | 2   | 2    | 1      |    | 10  | 95.52   |        |
+-------+--------+-----+------+--------+--------+-----+------+--------+--------+-----+------+--------+----+-----+---------+--------+
| 64    | 5 x 5  | 2   |      | 1      | 3 x 3  | 2   | 2    | 1      | 3 x 3  | 2   | 2    | 1      |    | 10  | 96.30   |        |
+-------+--------+-----+------+--------+--------+-----+------+--------+--------+-----+------+--------+----+-----+---------+--------+