Hyperplane Arrangements of Trained ConvNets Are Biased

This repository contains the source code to reproduce the experiments of the paper "Hyperplane Arrangements of Trained ConvNets Are Biased". The code is based on Pytorch.

Requirements

To install the dependencies for the project, run

  pip install -r requirements.txt

Introduction

In this work, we take a geometrical perspective and look for statistical bias in the weights of trained convolutional networks, in terms of oriented hyperplane arrangements induced by convolutional layers with ReLU activations. Notably, for networks combining linear (affine) layers with piece-wise linear activations, oriented hyperplane arrangements define the function computed by the network and characterize how data is transformed non-linearly by the model.

Our main message is summarized as follows.

For many layers of trained ConvNets, the orientation of hyperplanes induced by each filter exhibit strong regularities, that emerge from training and correlate with learning. Furthermore, for low-complexity datasets, layers presenting hyperplanes with biased orientation are interestingly critical to the network's performance -- our measure correlates with the notion of critical layers introduced in the recent intriguing work of Zhang et al.[1], i.e. when the bias is not observed, the corresponding layers can be reset to their initialization without considerable loss in performance.

We refer the reader to the paper for detailed explanation and motivation for our methodology.

Training models

To train a model, run

train.py [-h] [--arch ARCH] [--dataset DATASET]
              [--subsample-classes SUBSAMPLE_CLASSES]
              [--class-sample-seed CLASS_SAMPLE_SEED] [--noise NOISE]
              [--noise-seed NOISE_SEED] [--augmentation] [--upscale]
              [--upscale-padding] [--pretrained] [--split SPLIT]
              [--stratified] [--split-seed SPLIT_SEED] [--shuffle]
              [--evaluate] [--evaluate-train] [--eval-regularization-loss]
              [--cuda] [--workers WORKERS] [--data-path DATA_PATH]
              [--epochs EPOCHS] [--start-epoch START_EPOCH]
              [--batch-size BATCH_SIZE] [--optimizer OPTIMIZER] [--lr LR]
              [--lr-step LR_STEP] [--lr-decay LR_DECAY]
              [--weight-decay WEIGHT_DECAY] [--momentum MOMENTUM]
              [--override] [--models-path MODELS_PATH] [--seed SEED]
              [--tb-logdir TB_LOGDIR] [--snapshot-every SNAPSHOT_EVERY]
              [--snapshot-all-until SNAPSHOT_ALL_UNTIL]
              [--resume-from RESUME_FROM] [--kill-plateaus] [--train-acc]
              [--unnormalize] [--log LOG]

Model training/finetuning.

optional arguments:
-h, --help            show this help message and exit
--arch ARCH           Network architecture to be trained. Run without this
                      option to see a list of all supported archs.
--dataset DATASET     Dataset to train the network on. Run without this
                      option to see a list of supported datasets.
--subsample-classes SUBSAMPLE_CLASSES
                      Subsample only SUBSAMPLE_CLASSES classes from DATASET.
                      If set to 0 (default) all classes of DATASET are used.
--class-sample-seed CLASS_SAMPLE_SEED
                      Numpy random seed used for sampling classes.
--noise NOISE         Ratio of corrupted labels, in [0., 1.]. Set to -1 to
                      enable pixel shuffle in place of label noise.
--noise-seed NOISE_SEED
                      Numpy seed for corrupting labels.
--augmentation        Enable data augmentation.
--upscale             Upscale image data to 244x224 pixels.
--upscale-padding     Upscale image data to 112x112 pixels, then pad with
                      zeros to get a final resolution of 224x224.
--pretrained          Load pretrained architecture on ImageNet from model
                      zoo.
--split SPLIT         Validation split size. The resutling split is class-
                      unbalanced [default=0].
--stratified          Validation split with equal balance of samples per
                      class.
--split-seed SPLIT_SEED
                      Seed used for shuffling the data before making the
                      validation split.
--shuffle             Shuffle the training data before making the validation
                      split [default=False].
--evaluate            Evaluate model on the validation set.
--evaluate-train      Evaluate model on the training set.
--eval-regularization-loss
                      Compute the regularization loss and quit The
                      WEIGHT_DECAY option should be set as well.
--cuda                Enable GPU support.
--workers WORKERS, -j WORKERS
                      Number of parallel data processing jobs.
--data-path DATA_PATH
                      Path to local ImageNet folder.
--epochs EPOCHS       The number of epochs used for training [default = 20].
--start-epoch START_EPOCH
                      Starting epoch for training.
--batch-size BATCH_SIZE
                      The minibatch size for training [default = 128].
--optimizer OPTIMIZER
                      Supported optimizers: sgd, adam [default = sgd].
--lr LR               The base learning rate for SGD optimization [default =
                      0.001].
--lr-step LR_STEP     The step size (# iterations) of the learning rate
                      decay [default = off].
--lr-decay LR_DECAY   The decay factor of the learning rate decay [default =
                      off].
--weight-decay WEIGHT_DECAY
                      The weight decay coefficient [default = 1e-4].
--momentum MOMENTUM   The momentum coefficient for SGD [default = 0.9].
--override            When resuming from a snapshot with different
                      hyperparameters, overrides the values restored from
                      the snapshot with the command line arguments.
--models-path MODELS_PATH
                      The dirname where to store/load models [default =
                      './models'].
--seed SEED           Pytorch PRNG seed.
--tb-logdir TB_LOGDIR
                      The tensorboard log folder [default =
                      './tensorboard'].
--snapshot-every SNAPSHOT_EVERY
                      Snapshot the model state every E epochs [default = 0].
--snapshot-all-until SNAPSHOT_ALL_UNTIL
                      Optional. Snapshot every epoch until the specified
                      one, then snapshot according to the --snapshot-every
                      argument.
--resume-from RESUME_FROM
                      Path to a model snapshot [default = None].
--kill-plateaus       Quit training if the model validation accuracy
                      plateaus in the first 10 epochs.
--train-acc           (Optional) compute train accuracy and report it during
                      training.
--unnormalize         Disable data normalization and represent instead pixel
                      values in [0,1]. [Default = False].
--log LOG             Logfile name [default = 'train.log'].

To see a list of supported architectures run train.py with no --arch argument, e.g.

  python train.py --dataset imagenet

Similarly, to see a list of supported datasets, run train.py with no --dataset argument, e.g.

  python train.py --arch vgg19

Training supports logging to tensorboard.

Example -- Train VGG19 on ImageNet

To train VGG19 on ImageNet on GPU(s) for 40 epochs, with base learning rate 0.1 and batch size 128, run

  python train.py --arch vgg19 --dataset imagenet --cuda --data-path /path/to/imagenet --epochs 40 --lr 0.01 --batch-size 128

Computing Projection Statistics

When training a model, several snapshots of the model at different epochs can be saved. For each snapshot, our projection statistics can be computed by running

compute_projections.py [-h] [--arch ARCH]
                            [--subsample-classes SUBSAMPLE_CLASSES]
                            [--dataset DATASET] [--load-from LOAD_FROM]
                            [--init-from INIT_FROM] [--pretrained] [--cuda]
                            [--normalize] [--with-linear]
                            [--results RESULTS] [--log LOG]
                            [--seed SEED]

Compute projection statistics.

optional arguments:
-h, --help            show this help message and exit
--arch ARCH           Network architecture to be trained. Run without this
                      option to see a list of all available pretrained
                      archs.
--subsample-classes SUBSAMPLE_CLASSES
                      Number of classes in the prediction head of the
                      network. Specify 0 (default) for using the standard
                      number of classes of DATASET.
--dataset DATASET     Dataset used to train the model. Used to specify the
                      number of classes of the prediction layer of the
                      model.
--load-from LOAD_FROM
                      Load trained network from file.
--init-from INIT_FROM
                      Network initialization. Specify a model snapshot to
                      measure distance from initialization for each layer.
--pretrained          Load pretrained architecture on ImageNet
                      [default=False].
--cuda                Enable GPU support.
--normalize           Normalize projections.
--with-linear         Compute statistics also for fully connected layers.
--results RESULTS     Path to store results [default = './results'].
--log LOG             Logfile name [default = 'positive_orthant.log'].
--seed SEED           Pytorch seed. (Optional)If using --load-from, specify
                      the seed the model was trained with.

If no model snapshots are available, it is possible to compute the statistics for off-the-shelf pretrained Pytorch models.

  python compute_projections.py --arch alexnet --dataset imagenet --cuda --normalize --pretrained

Finally, when no model snapshots or --pretrained are specified, the statistics for one random initialization of the specified architecture can be computed:

  python compute_projections.py --arch alexnet --dataset imagenet --cuda --normalize

For each snapshot or pretrained model, the computed statistics will be stored as a JSON file, which can later be loaded for plotting.

Example

Compute projection statistics from a model snasphot of AlexNet trained on Imagenet:

  python compute_projections.py --arch alexnet --dataset imagenet --cuda --normalize --load-from /path/to/model/snapshot

Weight Reinitialization

To reproduce the experiments of Zhang et al., we reimplemented the methodology described in section 2 of [1].

Given a snapshot of a trained network and a list of snapshots of initial weights to use for weight reinitialization, the drop in validation accuracy can be computed by running

reinit_layers.py [-h] [--arch ARCH] [--dataset DATASET] [--upscale]
                      [--upscale-padding]
                      [--subsample-classes SUBSAMPLE_CLASSES]
                      [--class-sample-seed CLASS_SAMPLE_SEED]
                      [--load-from LOAD_FROM] [--inits-from INITS_FROM]
                      [--cuda] [--results RESULTS] [--plots PLOTS]
                      [--log LOG] [--seed SEED] [--workers WORKERS]
                      [--data-path DATA_PATH] [--batch-size BATCH_SIZE]
                      [--rand]

Reinit network weights and compute drop in performance.

optional arguments:
-h, --help            show this help message and exit
--arch ARCH           Network architecture. Run without this option to see a
                      list of all available pretrained archs.
--dataset DATASET     Dataset used to train the model. Used to specify the
                      number of classes of the prediction layer of the
                      model.
--upscale             Upscale image data to 244x224 pixels.
--upscale-padding     Upscale image data to 112x112 pixels and then zero-pad
                      to 224x224.
--subsample-classes SUBSAMPLE_CLASSES
                      Subsample only SUBSAMPLE_CLASSES classes from DATASET.
                      If set to 0 (default) all classes of DATASET are used.
--class-sample-seed CLASS_SAMPLE_SEED
                      Numpy random seed used for sampling classes.
--load-from LOAD_FROM
                      Load trained network from file.
--inits-from INITS_FROM
                      Network initialization. Specify a model snapshot to
                      measure distance from initialization for each layer.
--cuda                Enable GPU support.
--results RESULTS     Path to store results [default = './results'].
--plots PLOTS         Path to store plots [default = './plots'].
--log LOG             Logfile name [default = 'reinit_weights.log'].
--seed SEED           Pytorch seed. (Optional)If using --load-from, specify
                      the seed the model was trained with.
--workers WORKERS, -j WORKERS
                      Number of parallel data processing jobs.
--data-path DATA_PATH
                      Path to local dataset folder.
--batch-size BATCH_SIZE
                      The minibatch size for training [default = 128].
--rand                Perform weight randomization test.

The list of weights INITS_FROM to use for reinitialization must be specified as a plaintext list of paths to model snapshots -- one per line -- with the file ending with an empty line.

If the flag --rand is specified, the drop in accuracy is computed also for weights sampled from a (independent) random initialization.

All results are stored as JSON for easy post-processing (e.g. plotting heatmaps).

Example

Weight Reinitialization for VGG19 trained on ImageNet:

  python reinit_weights.py --arch vgg19 --dataset imagenet --data-path /path/to/imagenet --load-from /path/to/trained/model/snapshot --inits-from /path/to/file/list/of/initializations.txt --rand --batch-size 128 --cuda

References

"Are All Layers Created Equal?." Chiyuan Zhang and Samy Bengio and Yoram Singer. 2019. arXiv preprint 1902.01996.

Name		Name	Last commit message	Last commit date
Latest commit History 15 Commits
mlp		mlp
test		test
.gitignore		.gitignore
LICENSE		LICENSE
Makefile		Makefile
README.md		README.md
compute_projections.py		compute_projections.py
data_loader.py		data_loader.py
models.py		models.py
reinit_layers.py		reinit_layers.py
requirements.txt		requirements.txt
scores.py		scores.py
setup.py		setup.py
snapshot.py		snapshot.py
train.py		train.py
utils.py		utils.py

License

magamba/hp_arrangements

Folders and files

Latest commit

History

Repository files navigation

Hyperplane Arrangements of Trained ConvNets Are Biased

Requirements

Introduction

Training models

Example -- Train VGG19 on ImageNet

Computing Projection Statistics

Example

Weight Reinitialization

Example

References

About

Resources

License

Stars

Watchers

Forks

Languages