Convolutional Neural Networks Reading List

A list of papers I used for my thesis on convolutional neural networks with a focus on batch normalization. The papers are mostly ordered chronologically by publication date and divided into the sections Early work, Pre-AlexNet, Post-AlexNet, and Batch Normalization.

Early work

Invention of the backpropagation algorithm.

First paper on convolutional neural networks trained with backpropagation.

Overview of training end-to-end systems such as convolutional neural networks with gradient-based optimization.

Efficient Backprop (LeCun et al 1998)

Gives many practical recommendations for training multi-layer (convolutional) neural networks; a sketch of the normalization and initialization recommendations follows the list below.

  • Motivates stochastic gradient descent with mini-batches
  • Shows benefits of mean subtraction, normalization, and decorrelation
  • Shows drawbacks of sigmoid activation function and motivates hyperbolic tangent (tanh)
  • Proposes weight initialization scheme (LeCun initialization)
  • Motivates use of adaptive optimization techniques and momentum
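
A minimal NumPy sketch of two of these recommendations, assuming a fully connected tanh layer: inputs are standardized per feature, and weights are drawn with variance 1/fan_in (LeCun initialization). Layer sizes and names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Mean subtraction and normalization of the inputs (per feature).
X = rng.normal(loc=5.0, scale=3.0, size=(256, 100))      # toy input batch
X = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-8)

# LeCun initialization: weights with variance 1 / fan_in.
fan_in, fan_out = 100, 50
W = rng.normal(0.0, np.sqrt(1.0 / fan_in), size=(fan_in, fan_out))

# One tanh layer, as recommended over the sigmoid.
h = np.tanh(X @ W)
```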

Pre-AlexNet

Introduces unsupervised pre-training and shows significant improvements in convergence and generalization performance.

Shows why training deep neural networks is difficult and gives pointers for improvements; a sketch of the proposed initialization follows the list below.

  • Gradient propagation study with sigmoid, tanh, and softsign
  • New initialization scheme for these activations (Xavier initialization)
  • Motivates the cross entropy loss function instead of mean squared error (MSE)
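
A minimal sketch of the Xavier (Glorot) initialization, assuming the common formulation with variance 2 / (fan_in + fan_out); layer sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
fan_in, fan_out = 100, 50

# Xavier/Glorot initialization keeps activation and gradient variances
# roughly constant across layers: Var(W) = 2 / (fan_in + fan_out).
std = np.sqrt(2.0 / (fan_in + fan_out))
W = rng.normal(0.0, std, size=(fan_in, fan_out))

# Uniform variant with the same variance: U(-limit, limit).
limit = np.sqrt(6.0 / (fan_in + fan_out))
W_uniform = rng.uniform(-limit, limit, size=(fan_in, fan_out))
```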

Shows the advantages of rectified activation functions (ReLU) for convergence speed.

Introduces adagrad, an adaptive optimization technique.
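
A minimal sketch of the adagrad update rule, assuming a single parameter vector and a toy squared-error objective; learning rate and epsilon values are illustrative.

```python
import numpy as np

def adagrad_step(w, grad, cache, lr=0.01, eps=1e-8):
    """One adagrad update: scale the step by the accumulated squared gradients."""
    cache += grad ** 2
    w -= lr * grad / (np.sqrt(cache) + eps)
    return w, cache

# Toy usage: minimize ||w||^2, whose gradient is 2 * w.
w = np.ones(3)
cache = np.zeros_like(w)
for _ in range(100):
    w, cache = adagrad_step(w, 2.0 * w, cache)
```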

Practical recommendations for setting hyperparameters such as the learning rate, learning rate decay, batch size, momentum, weight decay, and nonlinearity.

Post-AlexNet

Breakthrough paper that popularized convolutional neural networks with the AlexNet architecture and made the following contributions.

  • The use of local response normalization
  • Extensive use of regularizers such as data augmentation and dropout

Describes dropout in detail.
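
A minimal sketch of the inverted-dropout variant applied to a layer's activations, assuming drop probability p; the test-time scaling of the original formulation is folded into training here, so the layer is used unchanged at test time.

```python
import numpy as np

def dropout(h, p=0.5, train=True, rng=None):
    """Inverted dropout: zero each activation with probability p during training."""
    if rng is None:
        rng = np.random.default_rng(0)
    if not train or p == 0.0:
        return h
    mask = (rng.random(h.shape) >= p) / (1.0 - p)   # rescale the kept units
    return h * mask

h = np.ones((4, 5))
h_train = dropout(h, p=0.5, train=True)
h_test = dropout(h, p=0.5, train=False)
```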

Introduces adadelta, an improved version of the adagrad adaptive optimization technique.

Maxout Networks (Goodfellow et al 2013)

Introduces the maxout neuron, a companion to dropout that can approximate activation functions such as ReLU and the absolute value.
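
A minimal sketch of a maxout unit with k linear pieces, assuming a fully connected formulation; with k = 2 and suitably chosen weights it can recover ReLU or the absolute value.

```python
import numpy as np

def maxout(x, W, b):
    """Maxout: take the maximum over k affine transformations of the input.

    W has shape (k, fan_in, fan_out), b has shape (k, fan_out).
    """
    z = np.einsum('ni,kio->nko', x, W) + b      # (batch, k, fan_out)
    return z.max(axis=1)                        # (batch, fan_out)

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 10))
W = rng.normal(scale=0.1, size=(2, 10, 5))
b = np.zeros((2, 5))
h = maxout(x, W, b)
```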

Theoretical analysis of the dynamics in deep neural networks and proposal of the orthogonal initialization scheme.
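
A minimal sketch of orthogonal initialization for a dense weight matrix, obtained from the QR decomposition of a Gaussian matrix; the optional gain factor is an assumption for illustration.

```python
import numpy as np

def orthogonal_init(fan_in, fan_out, gain=1.0, rng=None):
    """Return a (fan_in, fan_out) matrix with orthonormal columns (or rows)."""
    if rng is None:
        rng = np.random.default_rng(0)
    a = rng.normal(size=(max(fan_in, fan_out), min(fan_in, fan_out)))
    q, _ = np.linalg.qr(a)                        # q has orthonormal columns
    q = q if fan_in >= fan_out else q.T
    return gain * q[:fan_in, :fan_out]

W = orthogonal_init(100, 50)
print(np.allclose(W.T @ W, np.eye(50), atol=1e-6))   # columns are orthonormal
```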

Shows why careful weight initialization and (Nesterov) momentum accelerated SGD are crucial for training deep neural networks.

Introduces dropconnect, a generalization of dropout that drops random weights instead of entire neurons.
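
A minimal sketch contrasting dropconnect with dropout: the random mask is applied to the weight matrix rather than to the layer output. The training-time rescaling here is a simplification; the paper describes a moment-matching approximation at inference.

```python
import numpy as np

def dropconnect_forward(x, W, p=0.5, rng=None):
    """Drop individual weights (not whole units) with probability p during training."""
    if rng is None:
        rng = np.random.default_rng(0)
    mask = rng.random(W.shape) >= p
    return x @ (W * mask) / (1.0 - p)   # rescale to keep the expected pre-activation

rng = np.random.default_rng(1)
x = rng.normal(size=(8, 10))
W = rng.normal(scale=0.1, size=(10, 5))
h = dropconnect_forward(x, W)
```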

Introduces a novel visualization technique for convolutional filters using a method called deconvolution that maps layer activations back to the input pixel space.

Introduces adam and adamax, improved versions of the adadelta adaptive optimization technique.
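
A minimal sketch of one adam step with the commonly used default hyperparameters; bias-corrected first and second moment estimates scale the update.

```python
import numpy as np

def adam_step(w, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One adam update with bias-corrected moment estimates (t starts at 1)."""
    m = beta1 * m + (1 - beta1) * grad          # first moment (mean of gradients)
    v = beta2 * v + (1 - beta2) * grad ** 2     # second moment (uncentered variance)
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v

# Toy usage: minimize ||w||^2.
w = np.ones(3)
m = np.zeros_like(w)
v = np.zeros_like(w)
for t in range(1, 201):
    w, m, v = adam_step(w, 2.0 * w, m, v, t)
```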

Going Deeper with Convolutions (Szegedy et al 2014)

Describes the inception architecture (GoogLeNet) that reduces the amount of learnable parameters significantly while improving accuracy.

Motivates the use of architectures with smaller convolutional filters such as 1 x 1 and 3 x 3 (VGGNet).

Introduces a novel parametric rectifier (PReLU) and a weight initialization scheme tailored to rectified activations (Kaiming initialization).
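
A minimal sketch of the Kaiming (He) initialization for rectified layers, assuming the fan-in variant with variance 2 / fan_in; for a convolution, fan_in is input channels × kernel height × kernel width. Layer sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Dense layer: Var(W) = 2 / fan_in compensates for ReLU zeroing half the inputs.
fan_in, fan_out = 512, 256
W_fc = rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(fan_in, fan_out))

# 3x3 convolution with 64 input and 128 output channels.
in_ch, out_ch, k = 64, 128, 3
fan_in_conv = in_ch * k * k
W_conv = rng.normal(0.0, np.sqrt(2.0 / fan_in_conv), size=(out_ch, in_ch, k, k))
```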

Describes a network architecture with residual connections (ResNet) that enable deeper architectures and are easier to optimize.
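
A minimal NumPy sketch of the identity shortcut in a residual block, assuming equal input and output widths and omitting batch normalization for brevity; the block only has to learn the residual F(x), since y = x + F(x).

```python
import numpy as np

def residual_block(x, W1, W2):
    """y = x + F(x): the identity shortcut carries the input past the branch."""
    h = np.maximum(0.0, x @ W1)       # first ReLU layer of the residual branch
    f = h @ W2                        # residual branch F(x)
    return np.maximum(0.0, x + f)     # identity shortcut, then ReLU

rng = np.random.default_rng(0)
d = 64
x = rng.normal(size=(8, d))
W1 = rng.normal(scale=np.sqrt(2.0 / d), size=(d, d))
W2 = rng.normal(scale=np.sqrt(2.0 / d), size=(d, d))
y = residual_block(x, W1, W2)
```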

Batch Normalization

Introduces batch normalization, a method to accelerate deep network training by reducing internal covariate shift. The authors claim batch normalization has the following properties; a forward-pass sketch follows the list.

  • Enables higher learning rates and faster learning rate decay without the risk of divergence
  • Regularizes the model by stabilizing the parameter growth
  • Reduces the need for dropout, weight regularization, and local response normalization
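
A minimal sketch of the batch normalization forward pass at training time, assuming a fully connected layer (normalization over the batch dimension); the running statistics used at inference and the backward pass are omitted.

```python
import numpy as np

def batch_norm_forward(x, gamma, beta, eps=1e-5):
    """Normalize each feature over the mini-batch, then scale and shift.

    x: (batch, features); gamma, beta: learnable (features,) parameters.
    """
    mu = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mu) / np.sqrt(var + eps)   # zero mean, unit variance per feature
    return gamma * x_hat + beta

rng = np.random.default_rng(0)
x = rng.normal(loc=3.0, scale=2.0, size=(32, 10))
gamma = np.ones(10)
beta = np.zeros(10)
y = batch_norm_forward(x, gamma, beta)
```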
