Notes on some important deep learning topics and paper summaries.
Feel free to contribute.
Each document has a corresponding LaTeX file you can edit.
- Information, Entropy, Cross-Entropy (an ML perspective): basics of information theory and why the logarithm is used to measure information; entropy, cross-entropy, KL divergence, likelihood, and why cross-entropy loss is used in machine learning (see the cross-entropy sketch after this list).
- Gradient Descent Optimizations: the three gradient descent variants, challenges with vanilla gradient descent, momentum, Nesterov accelerated gradient, Adagrad, RMSprop, Adadelta, and Adam (see the optimizer sketch after this list).
- Common activation functions used in neural networks: why activation functions are needed, desirable properties of an activation, sigmoid, tanh, ReLU, PReLU, and ELU (see the activation sketch after this list).
- Why ReLU (instead of sigmoid/tanh)?: why ReLU is better suited for deep networks than sigmoid or tanh, some potential problems with ReLU (e.g., dying ReLUs), and how to mitigate them.
- How transferable are features in deep neural networks?
- Learning and Transferring Mid-Level Image Representations using Convolutional Neural Networks
- Distilling the Knowledge in a Neural Network
- Sequence to Sequence Learning with Neural Networks
- Distributed Representations of Sentences and Documents
- VGG
- ResNet
- Deep Sparse Rectifier Neural Networks
- Network in Network
- GoogLeNet
- MobileNets
- AlexNet
- Inception-V2
- Inception-V4
- Dropout
- Efficient Estimation of Word Representations in Vector Space
- A Convolutional Neural Network for Modelling Sentences
- Effective Approaches to Attention-based Neural Machine Translation
- Neural Machine Translation by Jointly Learning to Align and Translate
- Large-scale Video Classification with Convolutional Neural Networks
- Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation
- Curriculum Learning
- Maxout Networks
- Visualizing and Understanding Convolutional Networks
- R-CNN
- Fast R-CNN
- Faster R-CNN
- SSD: Single Shot MultiBox Detector
- Regularization
- Different types of losses
- Activation functions: swish (also included in the activation sketch below)
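
To make the cross-entropy notes concrete, here is a minimal NumPy sketch; the function names and the toy distributions are illustrative, not taken from the notes:

```python
import numpy as np

def entropy(p):
    """H(p) = -sum(p * log p): the average information content of p."""
    p = np.asarray(p, dtype=float)
    return -np.sum(p * np.log(p))

def cross_entropy(p, q):
    """H(p, q) = -sum(p * log q): expected code length when the data
    follow p but we encode (predict) with q."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return -np.sum(p * np.log(q))

def kl_divergence(p, q):
    """KL(p || q) = H(p, q) - H(p): the extra cost of using q instead of p."""
    return cross_entropy(p, q) - entropy(p)

# Toy example: (almost) one-hot true label vs. a softmax-like prediction.
p = np.array([1e-12, 1.0, 1e-12])  # tiny epsilon avoids log(0)
q = np.array([0.1, 0.7, 0.2])
print(cross_entropy(p, q))   # ~0.357 = -log(0.7), the usual classification loss
print(kl_divergence(p, q))   # ~ the same here, since H(p) ~ 0 for a one-hot p
```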
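
The optimizer notes compare update rules; below is a minimal sketch of two of them (momentum and Adam) on a toy quadratic loss. The loss, hyperparameters, and function names are my own illustrative choices:

```python
import numpy as np

def grad(w):
    # Gradient of a toy quadratic loss f(w) = 0.5 * w^T A w (minimum at 0).
    A = np.array([[3.0, 0.0], [0.0, 1.0]])
    return A @ w

def sgd_momentum(w, steps=100, lr=0.1, beta=0.9):
    v = np.zeros_like(w)
    for _ in range(steps):
        v = beta * v + grad(w)      # accumulate a velocity of past gradients
        w = w - lr * v
    return w

def adam(w, steps=100, lr=0.1, b1=0.9, b2=0.999, eps=1e-8):
    m, v = np.zeros_like(w), np.zeros_like(w)
    for t in range(1, steps + 1):
        g = grad(w)
        m = b1 * m + (1 - b1) * g        # first moment (running mean of gradients)
        v = b2 * v + (1 - b2) * g * g    # second moment (running uncentered variance)
        m_hat = m / (1 - b1 ** t)        # bias correction for the zero initialization
        v_hat = v / (1 - b2 ** t)
        w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w

w0 = np.array([5.0, 5.0])
print(sgd_momentum(w0), adam(w0))  # both should approach the minimum [0, 0]
```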
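
For the activation-function notes (including the ReLU-vs-sigmoid/tanh discussion and swish), a small sketch of the common activations; here leaky ReLU stands in for the PReLU family as one mitigation of dying ReLUs:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))      # saturates for large |x| -> vanishing gradients

def tanh(x):
    return np.tanh(x)                    # zero-centered, but still saturates

def relu(x):
    return np.maximum(0.0, x)            # cheap, non-saturating for x > 0

def leaky_relu(x, alpha=0.01):
    return np.where(x > 0, x, alpha * x) # small negative slope mitigates dying ReLUs

def elu(x, alpha=1.0):
    return np.where(x > 0, x, alpha * (np.exp(x) - 1.0))

def swish(x, beta=1.0):
    return x * sigmoid(beta * x)         # smooth, non-monotonic, self-gated

x = np.linspace(-3, 3, 7)
for f in (sigmoid, tanh, relu, leaky_relu, elu, swish):
    print(f.__name__, np.round(f(x), 3))
```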