Notes on some important deep learning topics and paper summaries.
Feel free to contribute.
Each document has a corresponding LaTeX file you can edit.
- Information, Entropy, Cross-Entropy (an ML perspective): basics of information theory and why the logarithm is used to measure information; entropy, cross-entropy, KL divergence, likelihood, and why cross-entropy loss is used in machine learning (see the cross-entropy sketch after this list).
- Gradient Descent Optimizations: the three gradient descent variants, challenges with vanilla gradient descent, momentum, Nesterov accelerated gradient, Adagrad, RMSprop, Adadelta, and Adam (see the optimizer sketch after this list).
- Common activation functions used in neural networks: why activation functions are needed, desirable properties of an activation, sigmoid, tanh, ReLU, PReLU, and ELU (see the activation sketch after this list).
- Why ReLU (instead of sigmoid/tanh)?: why ReLU is better suited for deep networks than sigmoid or tanh, some potential problems with ReLU (e.g., dying ReLUs), and how to mitigate them.
- How transferable are features in deep neural networks?
- Learning and Transferring Mid-Level Image Representations using Convolutional Neural Networks
- Distilling the Knowledge in a Neural Network
- Sequence to Sequence Learning with Neural Networks
- Distributed Representations of Sentences and Documents
- VGG
- ResNet
- Deep Sparse Rectifier Neural Networks
- Network in Network
- GoogLeNet
- MobileNets
- AlexNet
- Inception-V2
- Inception-V4
- Dropout
- Efficient Estimation of Word Representations in Vector Space
- A Convolutional Neural Network for Modelling Sentences
- Effective Approaches to Attention-based Neural Machine Translation
- Neural Machine Translation by Jointly Learning to Align and Translate
- Large-scale Video Classification with Convolutional Neural Networks
- Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation
- Curriculum Learning
- Maxout Networks
- Visualizing and Understanding Convolutional Networks
- R-CNN
- Fast R-CNN
- Faster R-CNN
- SSD: Single Shot MultiBox Detector
- Regularization
- Different types of losses
- Activation functions: swish (also included in the activation sketch below)
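
To make the cross-entropy notes concrete, here is a minimal NumPy sketch; the function names and the toy distributions are illustrative, not taken from the notes:

```python
import numpy as np

def entropy(p):
    """H(p) = -sum(p * log p): the average information content of p."""
    p = np.asarray(p, dtype=float)
    return -np.sum(p * np.log(p))

def cross_entropy(p, q):
    """H(p, q) = -sum(p * log q): expected code length when the data
    follow p but we encode (predict) with q."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return -np.sum(p * np.log(q))

def kl_divergence(p, q):
    """KL(p || q) = H(p, q) - H(p): the extra cost of using q instead of p."""
    return cross_entropy(p, q) - entropy(p)

# Toy example: (almost) one-hot true label vs. a softmax-like prediction.
p = np.array([1e-12, 1.0, 1e-12])  # tiny epsilon avoids log(0)
q = np.array([0.1, 0.7, 0.2])
print(cross_entropy(p, q))   # ~0.357 = -log(0.7), the usual classification loss
print(kl_divergence(p, q))   # ~ the same here, since H(p) ~ 0 for a one-hot p
```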
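
The optimizer notes compare update rules; below is a minimal sketch of two of them (momentum and Adam) on a toy quadratic loss. The loss, hyperparameters, and function names are my own illustrative choices:

```python
import numpy as np

def grad(w):
    # Gradient of a toy quadratic loss f(w) = 0.5 * w^T A w (minimum at 0).
    A = np.array([[3.0, 0.0], [0.0, 1.0]])
    return A @ w

def sgd_momentum(w, steps=100, lr=0.1, beta=0.9):
    v = np.zeros_like(w)
    for _ in range(steps):
        v = beta * v + grad(w)      # accumulate a velocity of past gradients
        w = w - lr * v
    return w

def adam(w, steps=100, lr=0.1, b1=0.9, b2=0.999, eps=1e-8):
    m, v = np.zeros_like(w), np.zeros_like(w)
    for t in range(1, steps + 1):
        g = grad(w)
        m = b1 * m + (1 - b1) * g        # first moment (running mean of gradients)
        v = b2 * v + (1 - b2) * g * g    # second moment (running uncentered variance)
        m_hat = m / (1 - b1 ** t)        # bias correction for the zero initialization
        v_hat = v / (1 - b2 ** t)
        w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w

w0 = np.array([5.0, 5.0])
print(sgd_momentum(w0), adam(w0))  # both should approach the minimum [0, 0]
```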
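
For the activation-function notes (including the ReLU-vs-sigmoid/tanh discussion and swish), a small sketch of the common activations; here leaky ReLU stands in for the PReLU family as one mitigation of dying ReLUs:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))      # saturates for large |x| -> vanishing gradients

def tanh(x):
    return np.tanh(x)                    # zero-centered, but still saturates

def relu(x):
    return np.maximum(0.0, x)            # cheap, non-saturating for x > 0

def leaky_relu(x, alpha=0.01):
    return np.where(x > 0, x, alpha * x) # small negative slope mitigates dying ReLUs

def elu(x, alpha=1.0):
    return np.where(x > 0, x, alpha * (np.exp(x) - 1.0))

def swish(x, beta=1.0):
    return x * sigmoid(beta * x)         # smooth, non-monotonic, self-gated

x = np.linspace(-3, 3, 7)
for f in (sigmoid, tanh, relu, leaky_relu, elu, swish):
    print(f.__name__, np.round(f(x), 3))
```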