Beyond BatchNorm: Towards a Unified Understanding of Normalization in Deep Learning

Codebase for the paper "Beyond BatchNorm: Towards a Unified Understanding of Normalization in Deep Learning."

Requirements

The code requires:

  • Python 3.6 or higher

  • PyTorch 1.7 or higher

Other dependencies can be installed with the following command (uses pip):

./requirements.sh

Organization

The provided modules serve the following purposes:

  • main.py: Provides functions for training models with different normalization layers.

  • layer_defs.py: Contains definitions for different normalization layers.

  • models.py: Contains definitions for different model architectures.

  • config.py: Training hyperparameters and progress bar definition.

Example execution

To train a model (e.g., ResNet-56) using a particular normalization layer (e.g., BatchNorm), run the following command:

python main.py --arch=resnet-56 --norm_type=BatchNorm
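
For reference, the sketch below shows how these flags could be wired up with argparse; the actual parser in main.py may define more options and different defaults, so treat this as an illustration only.

    import argparse

    # Hypothetical parser covering a subset of the documented flags; main.py's
    # real parser may differ in option names, types, and defaults.
    parser = argparse.ArgumentParser()
    parser.add_argument("--arch", default="resnet-56", choices=["vgg", "resnet-56"])
    parser.add_argument("--norm_type", default="BatchNorm")
    parser.add_argument("--p_grouping", type=float, default=32)
    parser.add_argument("--batch_size", type=int, default=256)
    parser.add_argument("--dataset", default="CIFAR-100", choices=["CIFAR-10", "CIFAR-100"])
    parser.add_argument("--init_lr", type=float, default=1.0)
    parser.add_argument("--seed", type=int, default=0)
    args = parser.parse_args()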

Summary of basic options

--arch=<architecture>

  • Options: vgg / resnet-56.
  • Since our non-residual CNNs follow a VGG-like design, we refer to their architecture as VGG.

--p_grouping=<amount_of_grouping_in_GroupNorm>

  • Options: float; Default: 32.
  • If p_grouping < 1: defines a group size of 1/p_grouping channels. E.g., p_grouping=0.5 implies a group size of 2.
  • If p_grouping >= 1: defines the number of groups as layer_width/p_grouping, i.e., p_grouping acts as the group size. E.g., p_grouping=32 implies each group contains 32 channels (see the sketch below).
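
A minimal sketch of how p_grouping could map to a torch.nn.GroupNorm layer under the interpretation above; the helper make_group_norm is hypothetical, and layer_defs.py may implement this differently.

    import torch.nn as nn

    def make_group_norm(num_channels, p_grouping=32):
        # Hypothetical helper: p_grouping < 1 means a group size of 1/p_grouping
        # channels; p_grouping >= 1 means a group size of p_grouping channels.
        if p_grouping < 1:
            group_size = int(round(1.0 / p_grouping))
        else:
            group_size = int(p_grouping)
        group_size = min(group_size, num_channels)       # cap at the layer width
        num_groups = max(num_channels // group_size, 1)  # assumes even divisibility
        return nn.GroupNorm(num_groups, num_channels)

    # Example: a 64-channel layer with p_grouping=0.5 -> 32 groups of 2 channels.
    gn = make_group_norm(64, p_grouping=0.5)
    print(gn.num_groups)  # 32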

--skipinit=<use_skipinit_initialization>

  • Options: True/False; Default: False.

--preact=<use_preactivation_resnet>

  • Options: True/False; Default: False.

--probe_layers=<probe_activations_and_gradients>

  • Options: True/False; Default: True.
  • Layer-wise properties (activation norm, stable rank, standard deviation, cosine similarity, and gradient norm) are computed every iteration and stored as a dict every 5 epochs of training; see the sketch after this option.
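
The sketch below shows one way such probes can be computed for a single layer's activations; it illustrates the quantities listed above and is not the repository's actual probing code (gradient norms would additionally be read from parameter .grad tensors after the backward pass).

    import torch
    import torch.nn.functional as F

    def probe_stats(acts):
        # `acts` is assumed to be a (batch, features) activation matrix.
        stats = {}
        stats["act_norm"] = acts.norm().item()              # Frobenius norm
        s = torch.linalg.svdvals(acts)                      # singular values (descending)
        stats["stable_rank"] = ((s ** 2).sum() / s[0] ** 2).item()  # ||A||_F^2 / ||A||_2^2
        stats["std_dev"] = acts.std().item()
        normed = F.normalize(acts, dim=1)                   # unit-norm rows
        cos = normed @ normed.t()                           # pairwise cosine similarities
        n = cos.shape[0]
        stats["cosine_sim"] = ((cos.sum() - n) / (n * (n - 1))).item()  # mean off-diagonal
        return stats

    stats = probe_stats(torch.randn(256, 64))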

--init_lr=<init_lr>

  • Options: float; Default: 1.
  • A multiplicative factor that rescales the learning rate schedule (e.g., if the default learning rate is 0.1, init_lr=0.1 makes the initial learning rate 0.01).

--lr_warmup=<lr_warmup>

  • Options: True/False; Default: False.
  • Enables learning rate warmup; used for Filter Response Normalization.

--batch_size=<batch_size>

  • Options: integer; Default: 256.

--dataset=<dataset>

  • Options: CIFAR-10/CIFAR-100; Default: CIFAR-100.

--download=<download_dataset>

  • Options: True/False; Default: False.
  • Set to True if CIFAR-10 or CIFAR-100 needs to be downloaded.

--cfg=<number_of_layers>

  • Options: cfg_10/cfg_20/cfg_40; Default: cfg_10.
  • Number of layers for non-residual architectures.

--seed=<change_random_seed>

  • Options: integer; Default: 0.

Training Settings: To change the number of epochs or the learning rate schedule, edit the hyperparameters in config.py. By default, models are trained using SGD with momentum (0.9); a sketch of this setup is given below.
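
A minimal sketch of the training setup this implies (SGD with momentum 0.9, the initial learning rate scaled by --init_lr, and a step-wise schedule); the concrete values below are illustrative assumptions, and the real ones live in config.py.

    import torch

    base_lr, init_lr, epochs = 0.1, 1.0, 160   # illustrative; see config.py
    model = torch.nn.Linear(10, 10)            # stand-in for a model from models.py

    optimizer = torch.optim.SGD(
        model.parameters(),
        lr=base_lr * init_lr,                  # --init_lr rescales the whole schedule
        momentum=0.9,
        weight_decay=5e-4,                     # assumed; check config.py
    )
    scheduler = torch.optim.lr_scheduler.MultiStepLR(
        optimizer, milestones=[80, 120], gamma=0.1   # assumed milestones
    )

    for epoch in range(epochs):
        # ... one training epoch over the dataset ...
        scheduler.step()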
