tau-ResNet

This repo implements the experiments presented in "Convergence Theory of Learning Over-parameterized ResNet: A Full Characterization" (https://arxiv.org/abs/1903.07120).

The code is tested with Python 3.6 and PyTorch 1.0.

Experiments on CIFAR10

First, go to the cifar folder.

You can train a ResNet110 baseline model with the following command. Other depths are supported: 20/32/44/56/1202. The program downloads the dataset automatically on the first run.

CUDA_VISIBLE_DEVICES=0 python cifar_train.py --arch resnet110 --sess baseline

The network architecture and hyperparameters are the same as in "Deep Residual Learning for Image Recognition". The results can be found in the result folder.

Beyond the original ResNet, we suggest scaling the output of each residual block by a factor tau = O(1/sqrt(L)), where L is the number of residual blocks (54 for resnet110).
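As an illustration of the idea, here is a minimal PyTorch sketch of a basic residual block with the tau scaling (illustrative only, not the repo's actual block implementation):

import torch
import torch.nn as nn
import torch.nn.functional as F

class TauBasicBlock(nn.Module):
    """Basic residual block whose residual branch is scaled by tau before the skip addition."""

    def __init__(self, channels, tau):
        super().__init__()
        self.tau = tau
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        # Scale the residual branch by tau; tau = 1 recovers the original ResNet block.
        return F.relu(x + self.tau * out)

# Example: one block of a 110-layer CIFAR ResNet (L = 54 residual blocks).
block = TauBasicBlock(channels=16, tau=1.0 / 54 ** 0.5)
y = block(torch.randn(2, 16, 32, 32))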

The following command trains the ResNet110 model with tau = 1/sqrt(54) ≈ 0.136.

CUDA_VISIBLE_DEVICES=0 python cifar_train.py --arch resnet110 --tau 0.136 --sess tau0.136

The following chart compares tau-ResNet with the original ResNet at different depths. tau-ResNet improves over the original ResNet by a considerable margin, especially for deeper ResNets.

You can also train ResNet models without batch normalization. The following command trains a ResNet110 model without any BN layers.

CUDA_VISIBLE_DEVICES=0 python cifar_train.py --arch resnet110_nobn --tau 0.136 --sess tau0.136_nobn

In practice, one can fine-tune tau (e.g., tau = 0.5/sqrt(L)) to achieve slightly better performance.
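For illustration, a hypothetical helper (not part of this repo) that computes tau from the depth of a CIFAR ResNet could look like this:

import math

def cifar_tau(depth, scale=1.0):
    """Compute tau = scale / sqrt(L) for a CIFAR ResNet of the given depth.

    A depth-d CIFAR ResNet has (d - 2) / 2 basic residual blocks,
    e.g. 54 blocks for resnet110.
    """
    num_blocks = (depth - 2) // 2
    return scale / math.sqrt(num_blocks)

print(round(cifar_tau(110), 3))             # 0.136  (tau = 1/sqrt(54))
print(round(cifar_tau(110, scale=0.5), 3))  # 0.068  (fine-tuned tau = 0.5/sqrt(54))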

Experiments on ImageNet

Go to the imagenet folder. You need to download the ImageNet classification dataset from http://www.image-net.org/challenges/LSVRC/2012/ first.

For ImageNet, we set L to the largest number of blocks over all stages and choose tau = 2/sqrt(L). The following command trains the ResNet101 model with tau=0.4. Other depths are supported: 50/152.

python imagenet_train.py --arch resnet101 --tau 0.4 --sess imagenet_tau0.4 --data_dir data_folder_path
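For reference, assuming the standard stage block counts (3/4/6/3 for ResNet50, 3/4/23/3 for ResNet101, 3/8/36/3 for ResNet152), the tau = 2/sqrt(L) rule works out as follows:

import math

# Largest number of blocks over all stages for the standard ImageNet ResNets (illustrative).
max_blocks = {"resnet50": 6, "resnet101": 23, "resnet152": 36}

for arch, num_blocks in max_blocks.items():
    print(arch, round(2 / math.sqrt(num_blocks), 3))
# resnet50 0.816
# resnet101 0.417  (rounded to 0.4 in the command above)
# resnet152 0.333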

The following command trains the ResNet101 model without batch normalization.

python imagenet_train.py --arch resnet101_nobn --tau 0.4 --sess imagenet_tau0.4_nobn --data_dir data_folder_path
