Visualizing Loss Functions

In DeepMind's recent paper *Neural Arithmetic Logic Units*, the authors show that NALUs and NACs (neural accumulators) perform better than RNNs, LSTMs, and vanilla MLPs on tasks that involve some notion of counting, keeping track of numbers, and approximating mathematical functions. While their effectiveness at numerical tasks follows from their internal architecture, I was (and still am) wondering how much of it can be explained by the shape of the loss function graphed against a few randomly chosen weights.
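
For reference, here is a minimal NumPy sketch of a NAC layer as defined in the paper: the effective weight matrix is W = tanh(Ŵ) ⊙ σ(M̂), which biases every entry toward -1, 0, or 1 and lets the layer represent exact addition and subtraction. This is illustrative code, not the code used to produce the plots in this repository.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def nac_weights(W_hat, M_hat):
    """Effective NAC weights: W = tanh(W_hat) * sigmoid(M_hat).

    The element-wise product pushes each entry toward {-1, 0, 1},
    which is why a NAC can represent exact addition/subtraction.
    """
    return np.tanh(W_hat) * sigmoid(M_hat)

def nac_forward(x, W_hat, M_hat):
    # A NAC layer is a plain linear layer with the constrained weights above.
    return nac_weights(W_hat, M_hat) @ x
```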

In the case of a simple MLP (2 hidden units) versus an equally simple NAC (also 2 hidden units), both trained to learn addition, the MLP loss surface has subtle local minima, and the MLP needs multiple training samples to fully learn the task. The NAC loss surface, however, slopes monotonically toward the correct weights even with a single training sample! This is because the NAC architecture constrains it to a much smaller family of functions than an MLP can represent, allowing it to converge with fewer training samples.
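
To illustrate the kind of visualization this repository is about, here is a rough sketch (not the actual code in this repo) that sweeps two of a NAC's unconstrained parameters over a grid, computes the squared error on a single addition sample, and plots the resulting loss surface as a contour map. The frozen values (`M_hat`, the read-out vector, and the sample itself) are arbitrary choices made purely for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One training sample for the addition task (values chosen arbitrarily).
x = np.array([3.0, 5.0])
y_true = x.sum()

# Freeze everything except the two W_hat entries of the first hidden unit.
M_hat = np.full((2, 2), 10.0)       # sigmoid(10) ~ 1: gates essentially open
W_hat_rest = np.array([2.0, 2.0])   # second hidden unit's W_hat, held fixed
w_out = np.array([1.0, 0.0])        # fixed read-out: output = first hidden unit

grid = np.linspace(-4.0, 4.0, 200)
loss = np.empty((grid.size, grid.size))
for i, a in enumerate(grid):
    for j, b in enumerate(grid):
        W_hat = np.vstack([[a, b], W_hat_rest])  # the two swept weights
        W = np.tanh(W_hat) * sigmoid(M_hat)      # NAC weight construction
        pred = w_out @ (W @ x)
        loss[i, j] = (pred - y_true) ** 2

plt.contourf(grid, grid, loss.T, levels=50)
plt.xlabel("W_hat[0, 0]")
plt.ylabel("W_hat[0, 1]")
plt.title("NAC loss surface on a single addition sample")
plt.colorbar(label="squared error")
plt.show()
```

Even in this toy slice, the loss decreases steadily as the two swept parameters move toward the values that encode "+1" weights, which is the monotone behavior described above.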

If I find more time, I want to compare an MLP with a NALU network on the MNIST counting task described in the original paper and see how the loss surfaces differ.
