Deep Learning Papers TLDR

My collection of notes on deep learning papers.

To take a look at some of my projects and notes on deep learning that are not directly related to literature research, go here: @episodeyang/deep_learning_notes

This repository is motivated by Andrew Ng's The Saturday Story, in the hope that I will eventually become a good deep learning researcher.

Current Week

I will keep this todo list short. This is what I'm working on this week.

Neural Programmer-Interpreter implementation (PyTorch)

See the deep_learning_notes repo for the code.

Information Theory Notes (WIP)

  • Boltzmann, Entropy, and Kullback-Leibler Divergence

    I was inspired to understand the physics foundation of the Restricted Boltzmann Machine. In the first installment of a series of posts, I take a physicist's approach and derive Shannon's entropy from statistical mechanics, then go on to derive various information-theoretic quantities. (Work in progress; a quick numerical check of the two quantities follows below.)
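
A quick numerical companion to that note: a minimal sketch assuming discrete distributions stored as numpy arrays. The toy Boltzmann distribution and the function names are illustrative, not taken from the post itself.

```python
import numpy as np

def shannon_entropy(p):
    """H(p) = -sum_i p_i log p_i, in nats."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]                      # convention: 0 * log 0 = 0
    return -np.sum(p * np.log(p))

def kl_divergence(p, q):
    """D_KL(p || q) = sum_i p_i log(p_i / q_i), in nats."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    mask = p > 0
    return np.sum(p[mask] * np.log(p[mask] / q[mask]))

# Toy Boltzmann distribution over four energy levels at inverse temperature beta.
E = np.array([0.0, 1.0, 2.0, 3.0])
beta = 1.0
p = np.exp(-beta * E) / np.exp(-beta * E).sum()

print("entropy of p:", shannon_entropy(p))
print("KL(p || uniform):", kl_divergence(p, np.full(4, 0.25)))
```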

Bayesian Optimization

Bayesian optimization is closely related to the statistical ideas of:

  • optimal design of experiments, dating back to Kirstine Smith in 1918.
  • response surface methods, dating back to Box and Wilson in 1951.
  • Bayesian optimization, studied first by Kushner in 1964 and then Mockus in 1978.

Methodologically, it touches on several important machine learning areas: active learning, contextual bandits, and Bayesian nonparametrics.

  • It started receiving serious attention in ML in 2007:
    • Brochu, de Freitas & Ghosh, NIPS 2007 [preference learning]
    • Krause, Singh & Guestrin, JMLR 2008 [optimal sensor placement]
    • Srinivas, Krause, Kakade & Seeger, ICML 2010 [regret bounds]
    • Brochu, Hoffman & de Freitas, UAI 2011 [portfolios]
  • Interest exploded when it was realized that Bayesian optimization provides an excellent tool for finding good ML hyperparameters (a minimal sketch of the loop appears right after this list).
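
A minimal sketch of that hyperparameter-search loop, assuming a scikit-learn Gaussian-process surrogate and an expected-improvement acquisition function. The 1-D toy objective and every name below are illustrative stand-ins, not a specific method from the references above.

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

# Hypothetical expensive objective, e.g. validation loss as a function of one hyperparameter.
def objective(x):
    return np.sin(3 * x) + 0.1 * x ** 2

bounds = (-2.0, 2.0)
rng = np.random.default_rng(0)

# Seed the surrogate with a few random evaluations.
X = rng.uniform(bounds[0], bounds[1], size=(3, 1))
y = objective(X).ravel()

gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)

for _ in range(15):
    gp.fit(X, y)
    # Expected Improvement over a dense grid of candidate points.
    cand = np.linspace(bounds[0], bounds[1], 500).reshape(-1, 1)
    mu, sigma = gp.predict(cand, return_std=True)
    best = y.min()
    z = (best - mu) / (sigma + 1e-9)
    ei = (best - mu) * norm.cdf(z) + sigma * norm.pdf(z)
    # Evaluate the expensive objective where EI is highest.
    x_next = cand[np.argmax(ei)]
    X = np.vstack([X, x_next])
    y = np.append(y, objective(x_next)[0])

print("best x:", X[y.argmin()].item(), "best value:", y.min())
```

Each round fits the surrogate to every evaluation made so far and spends the next expensive evaluation where expected improvement is highest, trading off exploration against exploitation.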

ICLR 2017 Best Papers

Attention

Table of Contents

ICLR 2017 Best Papers

Neural Compression and Techniques

  • 2015, Han et al., Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding [pdf]

    gist: A three-stage pipeline:

    1. Zero out small weights (pruning): 9x-13x compression
    2. Trained quantization: 27x-31x
    3. Huffman coding: 35x-49x

    without suffering any loss in accuracy (a toy sketch of the first two stages appears after this list).

  • ICLR 2017, Han et al., Dense-Sparse-Dense Training for Deep Neural Networks [pdf]

    gist: Sparse training followed by dense retraining improves network performance:

    1. Train normally (dense).
    2. Mask out small weights (the weight distribution is bimodal), then retrain the sparse network.
    3. Remove the mask, re-initialize the pruned weights to zero, and retrain the dense network.

    profit: 12% abs. imprv. across vision, speech and caption tasks, 413% rel. imprv. (a sketch of the masking cycle appears after this list).
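
For the Deep Compression entry above, a toy sketch of stages 1-2 on a single weight matrix, assuming plain numpy weights and scikit-learn's k-means; the paper itself prunes and quantizes per layer and retrains after each stage, which is omitted here.

```python
import numpy as np
from sklearn.cluster import KMeans

# Toy weight matrix standing in for a single trained layer.
W = np.random.randn(256, 256).astype(np.float32)

# Stage 1 (pruning): zero out the smallest-magnitude weights.
sparsity = 0.9
threshold = np.quantile(np.abs(W), sparsity)
W_pruned = np.where(np.abs(W) > threshold, W, 0.0)

# Stage 2 (trained quantization): cluster the surviving weights into 2**bits shared values;
# the paper additionally fine-tunes the shared centroids with gradient descent.
bits = 4
nonzero = W_pruned[W_pruned != 0].reshape(-1, 1)
km = KMeans(n_clusters=2 ** bits, n_init=5, random_state=0).fit(nonzero)
codebook = km.cluster_centers_.ravel()   # 2**bits shared float values
codes = km.labels_                       # per-weight cluster indices, Huffman-coded in stage 3

print("fraction of weights kept:", float((W_pruned != 0).mean()))
print("number of distinct shared values:", codebook.size)
```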

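For the Dense-Sparse-Dense entry above, a minimal sketch of the masking logic, assuming a small PyTorch model; the three training phases themselves are only indicated in comments, and the toy model and sparsity level are illustrative.

```python
import torch
import torch.nn as nn

def magnitude_masks(model: nn.Module, sparsity: float = 0.5):
    """Keep only the largest-magnitude weights in each weight matrix (step 2)."""
    masks = {}
    for name, p in model.named_parameters():
        if p.dim() < 2:                               # leave biases and norm parameters dense
            continue
        k = max(1, int(sparsity * p.numel()))
        threshold = p.detach().abs().flatten().kthvalue(k).values
        masks[name] = (p.detach().abs() > threshold).float()
    return masks

def apply_masks(model: nn.Module, masks: dict):
    """Zero the pruned weights; call after every optimizer step during the sparse phase."""
    with torch.no_grad():
        for name, p in model.named_parameters():
            if name in masks:
                p.mul_(masks[name])

model = nn.Sequential(nn.Linear(784, 300), nn.ReLU(), nn.Linear(300, 10))

# 1. Dense phase: ordinary training (loop omitted here).
# 2. Sparse phase: build masks, then retrain while re-applying them after each step.
masks = magnitude_masks(model, sparsity=0.5)
apply_masks(model, masks)
# 3. Dense phase again: discard the masks (pruned weights restart from zero) and retrain.
```
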
Sources

The sources of this repo are mostly:

  • various online reading lists
  • conferences
  • courses: probably the most important source, because they are structured
  • friends' and colleagues' recommendations

Other Repos

My lab-mate Nelson's notes can be seen here.
