Beyond BatchNorm: Towards a Unified Understanding of Normalization in Deep Learning

Codebase for the paper "Beyond BatchNorm: Towards a Unified Understanding of Normalization in Deep Learning."

Requirements

The code requires:

  • Python 3.6 or higher

  • PyTorch 1.7 or higher

Other dependencies can be installed with the following command (uses pip):

./requirements.sh

Organization

The provided modules serve the following purposes:

  • main.py: Provides functions for training models with different normalization layers.

  • layer_defs.py: Contains definitions for different normalization layers.

  • models.py: Contains definitions for different model architectures.

  • config.py: Training hyperparameters and progress bar definition.

Example execution

To train a model (e.g., ResNet-56) using a particular normalization layer (e.g., BatchNorm), run the following command:

python main.py --arch=resnet-56 --norm_type=BatchNorm
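
For reference, the sketch below shows how these flags could be wired up with argparse; the actual parser in main.py may define more options and different defaults, so treat this as an illustration only.

    import argparse

    # Hypothetical parser covering a subset of the documented flags; main.py's
    # real parser may differ in option names, types, and defaults.
    parser = argparse.ArgumentParser()
    parser.add_argument("--arch", default="resnet-56", choices=["vgg", "resnet-56"])
    parser.add_argument("--norm_type", default="BatchNorm")
    parser.add_argument("--p_grouping", type=float, default=32)
    parser.add_argument("--batch_size", type=int, default=256)
    parser.add_argument("--dataset", default="CIFAR-100", choices=["CIFAR-10", "CIFAR-100"])
    parser.add_argument("--init_lr", type=float, default=1.0)
    parser.add_argument("--seed", type=int, default=0)
    args = parser.parse_args()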

Summary of basic options

--arch=<architecture>

  • Options: vgg / resnet-56.
  • Since our non-residual CNNs follow a VGG-like design, we refer to their architecture as VGG.

--p_grouping=<amount_of_grouping_in_GroupNorm>

  • Options: float; Default: 32.
  • If p_grouping < 1: defines a group size of 1/p_grouping channels. E.g., p_grouping=0.5 implies a group size of 2.
  • If p_grouping >= 1: defines the number of groups as layer_width/p_grouping, i.e., p_grouping acts as the group size. E.g., p_grouping=32 implies each group contains 32 channels (see the sketch below).
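
A minimal sketch of how p_grouping could map to a torch.nn.GroupNorm layer under the interpretation above; the helper make_group_norm is hypothetical, and layer_defs.py may implement this differently.

    import torch.nn as nn

    def make_group_norm(num_channels, p_grouping=32):
        # Hypothetical helper: p_grouping < 1 means a group size of 1/p_grouping
        # channels; p_grouping >= 1 means a group size of p_grouping channels.
        if p_grouping < 1:
            group_size = int(round(1.0 / p_grouping))
        else:
            group_size = int(p_grouping)
        group_size = min(group_size, num_channels)       # cap at the layer width
        num_groups = max(num_channels // group_size, 1)  # assumes even divisibility
        return nn.GroupNorm(num_groups, num_channels)

    # Example: a 64-channel layer with p_grouping=0.5 -> 32 groups of 2 channels.
    gn = make_group_norm(64, p_grouping=0.5)
    print(gn.num_groups)  # 32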

--skipinit=<use_skipinit_initialization>

  • Options: True/False; Default: False.

--preact=<use_preactivation_resnet>

  • Options: True/False; Default: False.

--probe_layers=<probe_activations_and_gradients>

  • Options: True/False; Default: True.
  • Layer-wise properties (activation norm, stable rank, standard deviation, cosine similarity, and gradient norm) are computed every iteration and stored as a dict every 5 epochs of training; see the sketch after this option.
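
The sketch below shows one way such probes can be computed for a single layer's activations; it illustrates the quantities listed above and is not the repository's actual probing code (gradient norms would additionally be read from parameter .grad tensors after the backward pass).

    import torch
    import torch.nn.functional as F

    def probe_stats(acts):
        # `acts` is assumed to be a (batch, features) activation matrix.
        stats = {}
        stats["act_norm"] = acts.norm().item()              # Frobenius norm
        s = torch.linalg.svdvals(acts)                      # singular values (descending)
        stats["stable_rank"] = ((s ** 2).sum() / s[0] ** 2).item()  # ||A||_F^2 / ||A||_2^2
        stats["std_dev"] = acts.std().item()
        normed = F.normalize(acts, dim=1)                   # unit-norm rows
        cos = normed @ normed.t()                           # pairwise cosine similarities
        n = cos.shape[0]
        stats["cosine_sim"] = ((cos.sum() - n) / (n * (n - 1))).item()  # mean off-diagonal
        return stats

    stats = probe_stats(torch.randn(256, 64))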

--init_lr=<init_lr>

  • Options: float; Default: 1.
  • A multiplicative factor that rescales the learning rate schedule (e.g., if the default learning rate is 0.1, init_lr=0.1 makes the initial learning rate 0.01).

--lr_warmup=<lr_warmup>

  • Options: True/False; Default: False.
  • Enables learning rate warmup; used for Filter Response Normalization.

--batch_size=<batch_size>

  • Options: integer; Default: 256.

--dataset=<dataset>

  • Options: CIFAR-10/CIFAR-100; Default: CIFAR-100.

--download=<download_dataset>

  • Options: True/False; Default: False.
  • Set to True if CIFAR-10 or CIFAR-100 needs to be downloaded.

--cfg=<number_of_layers>

  • Options: cfg_10/cfg_20/cfg_40; Default: cfg_10.
  • Number of layers for non-residual architectures.

--seed=<change_random_seed>

  • Options: integer; Default: 0.

Training Settings: To change the number of epochs or the learning rate schedule, edit the hyperparameters in config.py. By default, models are trained using SGD with momentum (0.9); a sketch of this setup is given below.
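
A minimal sketch of the training setup this implies (SGD with momentum 0.9, the initial learning rate scaled by --init_lr, and a step-wise schedule); the concrete values below are illustrative assumptions, and the real ones live in config.py.

    import torch

    base_lr, init_lr, epochs = 0.1, 1.0, 160   # illustrative; see config.py
    model = torch.nn.Linear(10, 10)            # stand-in for a model from models.py

    optimizer = torch.optim.SGD(
        model.parameters(),
        lr=base_lr * init_lr,                  # --init_lr rescales the whole schedule
        momentum=0.9,
        weight_decay=5e-4,                     # assumed; check config.py
    )
    scheduler = torch.optim.lr_scheduler.MultiStepLR(
        optimizer, milestones=[80, 120], gamma=0.1   # assumed milestones
    )

    for epoch in range(epochs):
        # ... one training epoch over the dataset ...
        scheduler.step()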
