
Science of Deep Learning

V. Vapnik said that "Nothing is more practical than a good theory." Here we focus on theoretical machine learning.

Deep learning is a transformative technology that has delivered impressive improvements in image classification and speech recognition. Many researchers are trying to better understand how to improve prediction performance and also how to improve training methods. Some researchers use experimental techniques; others use theoretical approaches.

There has been a lot of interest in algorithms that learn feature hierarchies from unlabeled data. Deep learning methods such as deep belief networks, sparse coding-based methods, convolutional networks, and deep Boltzmann machines have shown promise and have already been successfully applied to a variety of tasks in computer vision, audio processing, natural language processing, information retrieval, and robotics. In this workshop, we will bring together researchers who are interested in deep learning and unsupervised feature learning, review the recent technical progress, discuss the challenges, and identify promising future research directions.

The development of a "Science of Deep Learning" is now an active, interdisciplinary area of research combining insights from information theory, statistical physics, mathematical biology, and other fields. Deep learning is related to, at the least, kernel methods, projection pursuit, and neural networks.

Resources on Deep Learning Theory

Blogs and Papers


Courses on Deep Learning

Deep Learning Reading Group

Yanjun organized a wonderful reading group on deep learning.

Workshops

Labs

Interpretability in AI

Interpretability of Neural Networks

Although deep neural networks have exhibited superior performance in various tasks, interpretability is always the Achilles' heel of deep neural networks. At present, deep neural networks obtain high discrimination power at the cost of low interpretability of their black-box representations. We believe that high model interpretability may help people break several bottlenecks of deep learning, e.g., learning from a few annotations, learning via human-computer communications at the semantic level, and semantically debugging network representations. We focus on convolutional neural networks (CNNs), and revisit the visualization of CNN representations, methods of diagnosing representations of pre-trained CNNs, approaches for disentangling pre-trained CNN representations, learning of CNNs with disentangled representations, and middle-to-end learning based on model interpretability. Finally, we discuss prospective trends in explainable artificial intelligence.

Not everyone can understand relativity theory or quantum theory.

DeepLEVER

DeepLEVER aims at explaining and verifying machine learning systems via combinatorial optimization in general and SAT in particular. The main thesis of the DeepLEVER project is that a solution to address the challenges faced by ML models lies at the intersection of formal methods (FM) and AI. (A recent Summit on Machine Learning Meets Formal Methods offered supporting evidence of how strategic this topic is.) The DeepLEVER project envisions two main lines of research, concretely explanation and verification of deep ML models, supported by existing and novel constraint reasoning technologies.

DLphi

Together with the participants of the Oberwolfach Seminar: Mathematics of Deep Learning, I wrote a (not entirely serious) paper called "The Oracle of DLPhi" proving that Deep Learning techniques can perform accurate classifications on test data that is entirely uncorrelated to the training data. This, however, requires a couple of non-standard assumptions such as uncountably many data points and the axiom of choice. In a sense this shows that mathematical results on machine learning need to be approached with a bit of scepticism.

Scientific Machine Learning

Scientific machine learning is a burgeoning discipline which blends scientific computing and machine learning. Traditionally, scientific computing focuses on large-scale mechanistic models, usually differential equations, that are derived from scientific laws that simplify and explain phenomena. On the other hand, machine learning focuses on developing non-mechanistic data-driven models which require minimal knowledge and prior assumptions. The two sides have their pros and cons: differential equation models are great at extrapolating, their terms are explainable, and they can be fit with small data and few parameters. Machine learning models, on the other hand, require "big data" and lots of parameters, but are not biased by the scientist's ability to correctly identify valid laws and assumptions.

Physics and Deep Learning

Neuronal networks have enjoyed a resurgence both in the worlds of neuroscience, where they yield mathematical frameworks for thinking about complex neural datasets, and in machine learning, where they achieve state of the art results on a variety of tasks, including machine vision, speech recognition, and language translation.
Despite their empirical success, a mathematical theory of how deep neural circuits, with many layers of cascaded nonlinearities, learn and compute remains elusive.
We will discuss three recent vignettes in which ideas from statistical physics can shed light on this issue.
In particular, we show how dynamical criticality can help in neural learning, how the non-intuitive geometry of high dimensional error landscapes can be exploited to speed up learning, and how modern ideas from non-equilibrium statistical physics, like the Jarzynski equality, can be extended to yield powerful algorithms for modeling complex probability distributions.
Time permitting, we will also discuss the relationship between neural network learning dynamics and the developmental time course of semantic concepts in infants.

In recent years, artificial intelligence has made remarkable advancements, impacting many industrial sectors dependent on complex decision-making and optimization. Physics-leaning disciplines also face hard inference problems in complex systems: climate prediction, density matrix estimation for many-body quantum systems, material phase detection, protein-fold quality prediction, parametrization of effective models of high-dimensional neural activity, energy landscapes of transcription factor-binding, etc. Methods using artificial intelligence have in fact already advanced progress on such problems. So, the question is not whether, but how AI serves as a powerful tool for data analysis in academic research, and physics-leaning disciplines in particular.

Machine Learning for Physics

Deep Learning for Physics

Physics for Machine Learning

Physics Informed Machine Learning

Physics Informed Deep Learning

Statistical Mechanics and Deep Learning

The recent striking success of deep neural networks in machine learning raises profound questions about the theoretical principles underlying their success. For example, what can such deep networks compute? How can we train them? How does information propagate through them? Why can they generalize? And how can we teach them to imagine? We review recent work in which methods of physical analysis rooted in statistical mechanics have begun to shed conceptual insights into these questions. These insights yield connections between deep learning and diverse physical and mathematical topics, including random landscapes, spin glasses, jamming, dynamical phase transitions, chaos, Riemannian geometry, random matrix theory, free probability, and nonequilibrium statistical mechanics. Indeed, the fields of statistical mechanics and machine learning have long enjoyed a rich history of strongly coupled interactions, and recent advances at the intersection of statistical mechanics and deep learning suggest these interactions will only deepen going forward.

Born Machine

A Born machine is a probabilistic generative model named after Born's rule in quantum mechanics: probabilities are represented as the squared modulus of a wave-function-like amplitude.

Quantum Machine Learning

Quantum Machine Learning: What Quantum Computing Means to Data Mining explains the most relevant concepts of machine learning, quantum mechanics, and quantum information theory, and contrasts classical learning algorithms to their quantum counterparts.


Tensor network

Tensor network methods are taking a central role in modern quantum physics and beyond. They can provide an efficient approximation to certain classes of quantum states, and the associated graphical language makes it easy to describe and pictorially reason about quantum circuits, channels, protocols, open systems and more. Our goal is to explain tensor networks and some associated methods as quickly and as painlessly as possible. Beginning with the key definitions, the graphical tensor network language is presented through examples. We then provide an introduction to matrix product states. We conclude the tutorial with tensor contractions evaluating combinatorial counting problems. The first one counts the number of solutions for Boolean formulae, whereas the second is Penrose's tensor contraction algorithm, returning the number of 3-edge-colorings of 3-regular planar graphs.
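As a minimal illustration of the contraction idea, the sketch below builds a small matrix product state (MPS) in NumPy and contracts it back into a full tensor. The bond dimension, physical dimension, and random cores are illustrative assumptions, not from the tutorial above.

```python
import numpy as np

# A matrix product state (MPS) approximates an n-index tensor by a
# chain of 3-index cores; contracting the chain recovers the tensor.
# Bond dimension 2 and physical dimension 2 are illustrative choices.
rng = np.random.default_rng(1)
n, d, chi = 4, 2, 2
cores = [rng.standard_normal((1 if i == 0 else chi, d,
                              1 if i == n - 1 else chi))
         for i in range(n)]

def mps_to_tensor(cores):
    """Contract the cores left to right into the full tensor."""
    result = cores[0]
    for core in cores[1:]:
        # (left, phys..., bond) x (bond, phys, right): sum out the bond index
        result = np.tensordot(result, core, axes=([-1], [0]))
    return result.squeeze(axis=(0, -1))

full = mps_to_tensor(cores)
print(full.shape)  # (2, 2, 2, 2): 2^4 entries encoded by 4 small cores
```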

Deep Neural Network and Renormalization Group

Mathematics of Deep Learning

A mathematical theory of deep networks and of why they work as well as they do is now emerging. I will review some recent theoretical results on the approximation power of deep networks, including conditions under which they can be exponentially better than shallow learning. A class of deep convolutional networks represents an important special case of these conditions, though weight sharing is not the main reason for their exponential advantage. I will also discuss another puzzle around deep networks: what guarantees that they generalize and do not overfit, despite the number of weights being larger than the number of training data and despite the absence of explicit regularization in the optimization?

Deep Neural Networks and Partial Differential Equations: Approximation Theory and Structural Properties (Philipp Petersen, University of Oxford)

Discrete Mathematics and Neural Networks

MIP and Deep Learning

Numerical Analysis for Deep Learning

The dynamics view of deep learning considers a deep network as a dynamical system. For example, a feedforward network can be expressed in the recurrent form: $$x^{t+1} = f_t(x^{t}),\quad t\in \{0,1,\cdots, T\},$$ where $f_t$ is some nonlinear function and $t$ is discrete.

However, it is not easy to select a proper nonlinear function $f_t$ for all $t\in\{0,1,\cdots, T\}$ and the number of steps $T$. In other words, there is no unified scientific principle or guide for designing the structure of deep neural network models.

Many recursive formulas share the same feedback form or hidden structure, where the next input is the output of the previous step, a historical record, or a generated point.
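To make the recurrence concrete, here is a minimal Python sketch of a network viewed as an iterated map; the tanh nonlinearity, layer widths, and random weights are illustrative assumptions.

```python
import numpy as np

def forward(x, weights, biases):
    """Iterate x^{t+1} = f_t(x^t) for t = 0, ..., T-1.

    Each f_t is an affine map followed by a nonlinearity; the choice
    of tanh and the layer widths are illustrative only.
    """
    for W, b in zip(weights, biases):
        x = np.tanh(W @ x + b)  # f_t(x) = tanh(W_t x + b_t)
    return x

# Toy usage: T = 3 layers acting on a 4-dimensional input.
rng = np.random.default_rng(0)
weights = [rng.standard_normal((4, 4)) * 0.5 for _ in range(3)]
biases = [np.zeros(4) for _ in range(3)]
print(forward(rng.standard_normal(4), weights, biases))
```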

ResNets

Deep Residual Networks won first place in ImageNet classification, ImageNet detection, ImageNet localization, COCO detection, and COCO segmentation. They have inspired many subsequent efficient feedforward convolutional networks.

They take a standard feed-forward ConvNet and add skip connections that bypass (or shortcut) a few convolution layers at a time. Each bypass gives rise to a residual block in which the convolution layers predict a residual that is added to the block’s input tensor.
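A minimal sketch of such a residual block (simplified: real ResNet blocks also include batch normalization and downsampling variants, which are omitted here):

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """y = x + F(x): the convolutions predict a residual that is
    added back to the block's input tensor."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)
        self.relu = nn.ReLU()

    def forward(self, x):
        residual = self.relu(self.conv1(x))
        residual = self.conv2(residual)
        return self.relu(x + residual)  # the skip connection bypasses the convs

# Toy usage: a batch of two 8x8 feature maps with 16 channels.
block = ResidualBlock(16)
print(block(torch.randn(2, 16, 8, 8)).shape)  # torch.Size([2, 16, 8, 8])
```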

Reversible Residual Network

Differential Equations Motivated Deep Learning Methods

This section is on insights from numerical analysis that inspire more effective deep learning architectures.

Many effective networks can be interpreted as different numerical discretizations of differential equations. This finding brings us a brand new perspective on the design of effective deep architectures.

We show that residual neural networks can be interpreted as discretizations of a nonlinear time-dependent ordinary differential equation that depends on unknown parameters, i.e., the network weights. We show how this insight has been used, e.g., to study the stability of neural networks, design new architectures, or use established methods from optimal control methods for training ResNets. Finally, we discuss open questions and opportunities for mathematical advances in this area.

Residual networks as discretizations of dynamical systems: $$Y_1 = Y_0 + h \sigma(K_0 Y_0 + b_0),\\ \vdots \\ Y_N = Y_{N-1} + h \sigma(K_{N-1} Y_{N-1} + b_{N-1}).$$

This is nothing but a forward Euler discretization of the ordinary differential equation (ODE): $$\partial_t Y(t)=\sigma(K(t) Y(t) + b(t)),\quad Y(0)=Y_0,\quad t\in[0, T].$$

The goal is to plan a path (via $K$ and $b$) such that the initial data can be linearly separated.

Another idea is to ensure stability by design / constraints on $\sigma$ and $K(t), b(t)$.

ResNet with antisymmetric transformation matrix: $$\partial_t Y(t)=\sigma([K(t)-K(t)^T] Y(t) + b(t)),\quad Y(0)=Y_0,\quad t\in[0, T].$$

Hamiltonian-like ResNet $$\frac{\mathrm d}{\mathrm d t}(Y(t), Z(t))^T=\sigma[(K(t)Z(t), -K(t)^T Y(t))^T + b(t)], t\in[0, T].$$
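The following sketch implements the forward Euler discretization above, with an optional antisymmetric transformation of each $K_j$ as in the stability-by-design variant. The choice $\sigma = \tanh$ and the random weights are assumptions for illustration.

```python
import numpy as np

def odenet_forward(Y0, Ks, bs, h, antisymmetric=False):
    """Forward Euler steps: Y_{j+1} = Y_j + h * sigma(K_j Y_j + b_j).

    With antisymmetric=True, K_j is replaced by K_j - K_j^T, whose
    eigenvalues are purely imaginary, a design intended to promote
    stable (non-exploding, non-vanishing) forward propagation.
    """
    Y = Y0
    for K, b in zip(Ks, bs):
        if antisymmetric:
            K = K - K.T
        Y = Y + h * np.tanh(K @ Y + b)  # sigma = tanh is an assumption here
    return Y

# Toy usage: N = 10 layers, step size h = T/N with T = 1.
rng = np.random.default_rng(2)
Ks = [rng.standard_normal((3, 3)) for _ in range(10)]
bs = [np.zeros(3) for _ in range(10)]
print(odenet_forward(np.ones(3), Ks, bs, h=0.1, antisymmetric=True))
```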

Parabolic Residual Neural Networks

$$\partial_t Y(t)=\sigma(K(t) Y(t) + b(t)),\quad Y(0)=Y_0,\quad t\in[0, T].$$

Hyperbolic Residual Neural Networks

$$\partial_t Y(t)=\sigma(K(t) Y(t) + b(t)),\quad Y(0)=Y_0,\quad t\in[0, T].$$

Hamiltonian CNN

$$\partial_t Y(t)=\sigma(K(t) Y(t) + b(t)),\quad Y(0)=Y_0,\quad t\in[0, T].$$

Numerical differential equation inspired networks: $$Y_{t+1} = (1-k_t)Y_{t-1} + k_t Y_t + h \sigma(K_{t} Y_{t} + b_{t})\tag{Linear multi-step structure}.$$

MgNet

As the solution space is often the dual of the data space in PDEs, the analogous concepts of feature space and data space (which are dual to each other) are introduced in CNNs. With such connections and new concepts in the unified model, the function of the various convolution and pooling operations used in CNNs can be better understood.


Control Theory and Deep Learning

It arose out of the control theory literature, when people were trying to identify highly complex and nonlinear dynamical systems. Neural networks (artificial neural networks) were first used in a supervised learning scenario in control theory. Hornik, if I remember correctly, was the first to find that neural networks are universal approximators.

Supervised Deep Learning Problem: Given training data $Y_0$ and labels $C$, find network parameters $\theta$ and classification weights $W, \mu$ such that the DNN predicts the data-label relationship (and generalizes to new data), i.e., solve $$\min_{\theta,W,\mu} \text{loss}[g(W Y_N + \mu), C] + \text{regularizer}[\theta,W,\mu].$$

This can be rewritten in a compact form: $$\min_{\theta,W,\mu} \text{loss}[g(W Y(T)+\mu), C] + \text{regularizer}[\theta,W,\mu]\\ \text{subject to } \partial_t Y(t) = f(Y(t), \theta(t)),\quad Y(0) = Y_0.$$

Neural Ordinary Differential Equations

Neural ODE
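A minimal sketch of the neural ODE idea, assuming the torchdiffeq package (its odeint integrates dy/dt = f(t, y) and returns the solution at requested times); the vector field architecture and time grid are illustrative choices.

```python
import torch
import torch.nn as nn
from torchdiffeq import odeint  # pip install torchdiffeq

class ODEFunc(nn.Module):
    """Parameterizes the right-hand side f of dy/dt = f(t, y);
    the two-layer, width-16 net is an illustrative choice."""
    def __init__(self, dim=2):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, 16), nn.Tanh(),
                                 nn.Linear(16, dim))

    def forward(self, t, y):
        return self.net(y)

func = ODEFunc()
y0 = torch.tensor([[1.0, 0.0]])
t = torch.linspace(0.0, 1.0, 10)
# The "depth" of the network is now the integration interval [0, 1];
# gradients can flow through the solver (or via the adjoint method).
trajectory = odeint(func, y0, t)
print(trajectory.shape)  # torch.Size([10, 1, 2])
```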

Dynamics and Deep Learning

Stability For Neural Networks

Differential Equation and Deep Learning

This section is on how to use deep learning, or machine learning more generally, to solve differential equations numerically.

We derive upper bounds on the complexity of ReLU neural networks approximating the solution maps of parametric partial differential equations. In particular, without any knowledge of its concrete shape, we use the inherent low-dimensionality of the solution manifold to obtain approximation rates which are significantly superior to those provided by classical approximation results. We use this low dimensionality to guarantee the existence of a reduced basis. Then, for a large variety of parametric partial differential equations, we construct neural networks that yield approximations of the parametric maps not suffering from a curse of dimension and essentially only depending on the size of the reduced basis.

Deep Learning for PDEs

$\mathcal H$ matrix and deep learning

In this work we introduce a new multiscale artificial neural network based on the structure of $\mathcal{H}$-matrices. This network generalizes the latter to the nonlinear case by introducing a local deep neural network at each spatial scale. Numerical results indicate that the network is able to efficiently approximate discrete nonlinear maps obtained from discretized nonlinear partial differential equations, such as those arising from nonlinear Schrödinger equations and Kohn-Sham density functional theory.

We aim to build a theoretical foundation for the analysis of deep neural networks to answer questions such as "What are the correct approximation spaces for deep neural networks?", "What is the advantage of deep versus shallow networks?", or "To which extent are deep neural networks able to detect low dimensional structures in high dimensional data?".

Stochastic Differential Equations and Deep Learning

Finite Element Methods and Deep Learning

Approximation Theory for Deep Learning

Universal approximation theorems show the expressive power of wide but shallow neural networks. This section extends such approximation results to deep neural networks.

We derive fundamental lower bounds on the connectivity and the memory requirements of deep neural networks guaranteeing uniform approximation rates for arbitrary function classes in $L^2(\mathbb R^d)$. In other words, we establish a connection between the complexity of a function class and the complexity of deep neural networks approximating functions from this class to within a prescribed accuracy.
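For contrast with these depth-oriented lower bounds, the classical shallow universal approximation statement can be recalled informally (Cybenko, 1989, for sigmoidal activations; Leshno et al., 1993, for the general non-polynomial case):

```latex
% Informal statement of the classical (shallow) universal approximation
% theorem, for contrast with the depth-dependent bounds above.
\textbf{Theorem.} Let $\sigma$ be continuous and non-polynomial, and let
$K \subset \mathbb{R}^d$ be compact. For every continuous
$f\colon K \to \mathbb{R}$ and every $\varepsilon > 0$, there exist
$N \in \mathbb{N}$, $a_i, b_i \in \mathbb{R}$, and $w_i \in \mathbb{R}^d$
such that
\[
  \sup_{x \in K}\Big| f(x) - \sum_{i=1}^{N} a_i\,\sigma(w_i^{\top} x + b_i) \Big|
  < \varepsilon.
\]
% The theorem is silent on how N scales with d and \varepsilon; the
% connectivity and memory lower bounds above quantify precisely such
% complexity trade-offs.
```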

The F-Principle

Understanding the training process of deep neural networks (DNNs) is a fundamental problem in the area of deep learning. The study of the training process from the frequency perspective has made important progress in understanding the strengths and weaknesses of DNNs, such as generalization and convergence speed, and may contribute to "a reasonably complete picture about the main reasons behind the success of modern machine learning" (E et al., 2019).

The "Frequency Principle" was first named in the paper (Xu et al., 2018); then (Xu, 2018; Xu et al., 2019) used more convincing experiments and a simple theory to demonstrate the universality of the Frequency Principle. Bengio's paper (Rahaman et al., 2019) also uses the simple theory in (Xu, 2018; Xu et al., 2019) to understand the mechanism underlying the Frequency Principle for the ReLU activation function. Note that the second version of Rahaman et al. (2019) points out this citation clearly, but they moved this citation to "related works" in the final version. Later, Luo et al. (2019) studied the Frequency Principle in the general setting of deep neural networks and mathematically proved the Frequency Principle under the assumption of infinite samples. Zhang et al. (2019) studied the Frequency Principle in the NTK regime with finite sample points; they explicitly characterize the convergence speed for each frequency and can accurately predict the learning results.

We aim to develop a theoretical framework in the Fourier domain to analyze the deep neural network (DNN) training process and understand DNN generalization. We exemplify our theoretical results through DNNs fitting 1-d functions and the MNIST dataset.
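A minimal experiment in this spirit (an illustrative sketch, not the authors' code): fit a 1-d target with one low-frequency and one high-frequency component and track the Fourier spectrum of the residual during training; the low-frequency error typically decays first.

```python
import numpy as np
import torch
import torch.nn as nn

# Target with frequencies k = 1 and k = 10 over [-1, 1].
x = torch.linspace(-1, 1, 256).unsqueeze(1)
y = torch.sin(np.pi * x) + 0.5 * torch.sin(10 * np.pi * x)

# Small fully connected net; width and learning rate are assumptions.
net = nn.Sequential(nn.Linear(1, 200), nn.Tanh(), nn.Linear(200, 1))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

for step in range(2001):
    loss = ((net(x) - y) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
    if step % 500 == 0:
        err = (y - net(x)).detach().squeeze().numpy()
        spec = np.abs(np.fft.rfft(err))
        # The k=1 error typically shrinks long before the k=10 error,
        # illustrating the low-frequency-first behavior.
        print(f"step {step}: |err| at k=1: {spec[1]:.3f}, k=10: {spec[10]:.3f}")
```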

Spline Theory and Deep Network

Resource

Workshop

Labs and Groups

Inverse Problem and Deep Learning

There is a long history of algorithmic development for solving inverse problems arising in sensing and imaging systems and beyond. Examples include medical and computational imaging, compressive sensing, as well as community detection in networks. Until recently, most algorithms for solving inverse problems in the imaging and network sciences were based on static signal models derived from physics or intuition, such as wavelets or sparse representations.

Today, the best performing approaches for the aforementioned image reconstruction and sensing problems are based on deep learning, which learns various elements of the method, including i) signal representations, ii) stepsizes and parameters of iterative algorithms, iii) regularizers, and iv) entire inverse functions. For example, it has recently been shown that transforming an iterative, physics-based algorithm into a deep network whose parameters can be learned from training data offers faster convergence and/or a better quality solution for a variety of inverse problems. Moreover, even with very little or no learning, deep neural networks enable superior performance for classical linear inverse problems such as denoising and compressive sensing. Motivated by those success stories, researchers are redesigning traditional imaging and sensing systems.

Deep Learning for Inverse Problems

Learning-based methods, and in particular deep neural networks, have emerged as highly successful and universal tools for image and signal recovery and restoration. They achieve state-of-the-art results on tasks such as image denoising, image compression, and image reconstruction from few and noisy measurements. They are starting to be used in important imaging technologies, for example in GE's newest computed tomography scanners and in the newest generation of the iPhone.

The field has a range of theoretical and practical questions that remain unanswered. In particular, learning and neural network-based approaches often lack the guarantees of traditional physics-based methods. Further, while superior on average, learning-based methods can make drastic reconstruction errors, such as hallucinating a tumor in an MRI reconstruction or turning a pixelated picture of Obama into a white male.

Deep Inverse Optimization

Random Matrix Theory and Deep Learning

Random matrix theory focuses on matrices whose entries are sampled from specific probability distributions. Weight matrices in deep neural networks are initialized at random. However, the model is over-parameterized, and it is hard to verify the role of any individual parameter.

Nonlinear Random Matrix Theory

Deep learning and Optimal Transport

Optimal transport (OT) provides a powerful and flexible way to compare probability measures, of all shapes: absolutely continuous, degenerate, or discrete. This includes of course point clouds, histograms of features, and more generally datasets, parametric densities or generative models. Originally proposed by Monge in the eighteenth century, this theory later led to Nobel Prizes for Koopmans and Kantorovich as well as Villani’s Fields Medal in 2010.

Generative Models and Optimal Transport

Geometric Analysis Approach to AI

Why and how deep learning works well on different tasks remains a mystery from a theoretical perspective. In this paper we draw a geometric picture of the deep learning system by finding its analogies with two existing geometric structures, the geometry of quantum computations and the geometry of diffeomorphic template matching. In this framework, we give the geometric structures of different deep learning systems including convolutional neural networks, residual networks, recursive neural networks, recurrent neural networks and the equilibrium propagation framework. We can also analyze the relationship between the geometric structures and the performance of different networks at an algorithmic level, so that the geometric framework may guide the design of the structures and algorithms of deep learning systems.

Loss Surface Of Deep Networks

Tropical Geometry of Deep Neural Networks

The basic idea of tropical geometry is to study the same kinds of questions as in standard algebraic geometry, but change what we mean when we talk about ‘polynomial equations’.

Topology and Deep Learning

We perform topological data analysis on the internal states of convolutional deep neural networks to develop an understanding of the computations that they perform. We apply this understanding to modify the computations so as to (a) speed up computations and (b) improve generalization from one data set of digits to another. One byproduct of the analysis is the production of a geometry on new sets of features on data sets of images, and we use this observation to develop a methodology for constructing analogues of CNNs for many other geometries, including the graph structures constructed by topological data analysis.

Topological machine learning

Topology Optimization and Deep Learning

Deep Learning with Topological Data Analysis

Deep Learning with Topological Layer

A topological layer extracts features via topological data analysis.

Topological Graph Neural Networks

Topology-Based Graph Classification

Algebra and Deep Learning

Besides matrix and tensor decompositions for accelerating deep neural networks, tensor networks are closely related to deep learning models.

Group Equivariant Convolutional Networks

Complex Valued Neural Networks

Aizenberg, Ivaskiv, Pospelov and Hudiakov (1971) (former Soviet Union) proposed a complex-valued neuron model for the first time, and although it was only available in Russian literature, their work can now be read in English (Aizenberg, Aizenberg & Vandewalle, 2000). Prior to that time, most researchers other than Russians had assumed that the first persons to propose a complex-valued neuron were Widrow, McCool and Ball (1975). Interest in the field of neural networks started to grow around 1990, and various types of complex-valued neural network models were subsequently proposed. Since then, their characteristics have been researched, making it possible to solve some problems which could not be solved with the real-valued neuron, and to solve many complicated problems more simply and efficiently.

The complex-valued Neural Network is an extension of a (usual) real-valued neural network, whose input and output signals and parameters such as weights and thresholds are all complex numbers (the activation function is inevitably a complex-valued function).

Quaternion Neural Networks

It looks like deep (convolutional) neural networks are really powerful. However, there are situations where they don't deliver as expected. I assume that perhaps many are happy with pre-trained VGG, ResNet, YOLO, SqueezeNext, MobileNet, etc. models because they are "good enough", even though they break quite easily on really realistic problems and require tons of training data. IMHO there are much smarter approaches out there which are neglected/ignored. I don't want to argue why they are ignored, but I want to provide a list of other useful architectures.

Instead of staying with real numbers, we should have a look at complex numbers as well. Let's remember the single reason why we use complex numbers ($\mathbb{C}$) or quaternions ($\mathbb{H}$). The most important reason why we use complex numbers is not to solve $x^2=-1$. The reason why we use complex numbers for everything that involves waves etc. is that we are lazy, or efficient ;). Who wants to waste time writing down and solving a bunch of trigonometric identities? The same is true for quaternions in robotics. Speaking in terms of computer science, we are using a much more efficient data structure/representation. It seems that complex-valued neural networks, as well as quaternion networks (which are hypercomplex numbers, for the mathematically correct reader of this post), outperform real-valued neural networks while using fewer parameters. This makes sense because we are using a different data structure that itself helps to represent certain things in a much more useful way.
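To make the data-structure argument concrete, here is a minimal complex-valued dense layer in NumPy. The magnitude-gated activation is one common style (a modReLU-like variant) and is an assumption here, as are the shapes and random weights.

```python
import numpy as np

def complex_dense(x, W, b):
    """A fully connected layer with complex inputs, weights, and biases.

    One complex multiply per weight packs a 2x2 real linear structure
    (rotation plus scaling) into a single parameter, which is the
    efficiency argument made above.
    """
    z = W @ x + b
    return np.where(np.abs(z) > 1.0, z, 0)  # keep phase, gate on magnitude

rng = np.random.default_rng(3)
W = (rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))) / 2
b = np.zeros(4, dtype=complex)
x = rng.standard_normal(4) + 1j * rng.standard_normal(4)
print(complex_dense(x, W, b))
```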

Probabilistic Theory and Deep Learning

Probabilistic Deep Learning

Probabilistic Deep Learning with Python teaches the increasingly popular probabilistic approach to deep learning that allows you to tune and refine your results more quickly and accurately without as much trial-and-error testing. Emphasizing practical techniques that use the Python-based Tensorflow Probability Framework, you’ll learn to build highly-performant deep learning applications that can reliably handle the noise and uncertainty of real-world data.

Bayesian Deep Learning

The abstract of the Bayesian Deep Learning workshop puts it this way:

While deep learning has been revolutionary for machine learning, most modern deep learning models cannot represent their uncertainty nor take advantage of the well-studied tools of probability theory. This has started to change following recent developments of tools and techniques combining Bayesian approaches with deep learning. The intersection of the two fields has received great interest from the community over the past few years, with the introduction of new deep learning models that take advantage of Bayesian techniques, as well as Bayesian models that incorporate deep learning elements [1-11]. In fact, the use of Bayesian techniques in deep learning can be traced back to the 1990s, in seminal works by Radford Neal [12], David MacKay [13], and Dayan et al. [14]. These gave us tools to reason about deep models' confidence, and achieved state-of-the-art performance on many tasks. However, earlier tools did not adapt when new needs arose (such as scalability to big data), and were consequently forgotten. Such ideas are now being revisited in light of new advances in the field, yielding many exciting new results.

Extending on last year's workshop's success, this workshop will again study the advantages and disadvantages of such ideas, and will be a platform to host the recent flourish of ideas using Bayesian approaches in deep learning and using deep learning tools in Bayesian modelling. The program includes a mix of invited talks, contributed talks, and contributed posters. It will be composed of five themes: deep generative models, variational inference using neural network recognition models, practical approximate inference techniques in Bayesian neural networks, applications of Bayesian neural networks, and information theory in deep learning. Future directions for the field will be debated in a panel discussion. This year's main theme will focus on applications of Bayesian deep learning within machine learning and outside of it.

  1. Kingma, DP and Welling, M, "Auto-encoding variational Bayes", 2013.
  2. Rezende, D, Mohamed, S, and Wierstra, D, "Stochastic backpropagation and approximate inference in deep generative models", 2014.
  3. Blundell, C, Cornebise, J, Kavukcuoglu, K, and Wierstra, D, "Weight uncertainty in neural network", 2015.
  4. Hernandez-Lobato, JM and Adams, R, "Probabilistic backpropagation for scalable learning of Bayesian neural networks", 2015.
  5. Gal, Y and Ghahramani, Z, "Dropout as a Bayesian approximation: Representing model uncertainty in deep learning", 2015.
  6. Gal, Y and Ghahramani, G, "Bayesian convolutional neural networks with Bernoulli approximate variational inference", 2015.
  7. Kingma, D, Salimans, T, and Welling, M. "Variational dropout and the local reparameterization trick", 2015.
  8. Balan, AK, Rathod, V, Murphy, KP, and Welling, M, "Bayesian dark knowledge", 2015.
  9. Louizos, C and Welling, M, “Structured and Efficient Variational Deep Learning with Matrix Gaussian Posteriors”, 2016.
  10. Lawrence, ND and Quinonero-Candela, J, “Local distance preservation in the GP-LVM through back constraints”, 2006.
  11. Tran, D, Ranganath, R, and Blei, DM, “Variational Gaussian Process”, 2015.
  12. Neal, R, "Bayesian Learning for Neural Networks", 1996.
  13. MacKay, D, "A practical Bayesian framework for backpropagation networks", 1992.
  14. Dayan, P, Hinton, G, Neal, R, and Zemel, S, "The Helmholtz machine", 1995.
  15. Wilson, AG, Hu, Z, Salakhutdinov, R, and Xing, EP, “Deep Kernel Learning”, 2016.
  16. Saatchi, Y and Wilson, AG, “Bayesian GAN”, 2017.
  17. MacKay, D.J.C. “Bayesian Methods for Adaptive Models”, PhD thesis, 1992.
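As a concrete instance of these ideas, here is a minimal sketch of Monte Carlo dropout in the spirit of reference [5] (Gal & Ghahramani): dropout stays active at prediction time, and repeated stochastic forward passes approximate a Bayesian predictive distribution. The architecture, dropout rate, and sample count are illustrative assumptions, and the network is left untrained for brevity.

```python
import torch
import torch.nn as nn

# A small network with dropout; sizes and p=0.2 are assumptions.
net = nn.Sequential(nn.Linear(1, 64), nn.ReLU(), nn.Dropout(p=0.2),
                    nn.Linear(64, 1))

def mc_predict(net, x, samples=100):
    """Average `samples` stochastic forward passes with dropout on."""
    net.train()  # keep dropout stochastic even at "test" time
    with torch.no_grad():
        preds = torch.stack([net(x) for _ in range(samples)])
    return preds.mean(dim=0), preds.std(dim=0)  # predictive mean, uncertainty

mean, std = mc_predict(net, torch.linspace(-1, 1, 5).unsqueeze(1))
print(mean.squeeze(), std.squeeze())
```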

Statistics and Deep Learning

A History of Deep Learning

Mathematician Ivakhnenko and associates including Lapa arguably created the first working deep learning networks in 1965, applying what had been only theories and ideas up to that point.

Ivakhnenko developed the Group Method of Data Handling (GMDH) – defined as a “family of inductive algorithms for computer-based mathematical modeling of multi-parametric datasets that features fully automatic structural and parametric optimization of models” – and applied it to neural networks.

For that reason alone, many consider Ivakhnenko the father of modern deep learning.

His learning algorithms used deep feedforward multilayer perceptrons using statistical methods at each layer to find the best features and forward them through the system.

Using GMDH, Ivakhnenko was able to create an 8-layer deep network in 1971, and he successfully demonstrated the learning process in a computer identification system called Alpha.

Statistical Relational AI

Handling inherent uncertainty and exploiting compositional structure are fundamental to understanding and designing large-scale systems. Statistical relational learning builds on ideas from probability theory and statistics to address uncertainty while incorporating tools from logic, databases, and programming languages to represent structure. In Introduction to Statistical Relational Learning, leading researchers in this emerging area of machine learning describe current formalisms, models, and algorithms that enable effective and robust reasoning about richly structured systems and data.

Principal Component Neural Networks

Nonlinear principal component analysis (NLPCA) is commonly seen as a nonlinear generalization of standard principal component analysis (PCA). It generalizes the principal components from straight lines to curves (nonlinear). Thus, the subspace in the original data space which is described by all nonlinear components is also curved. Nonlinear PCA can be achieved by using a neural network with an autoassociative architecture also known as autoencoder, replicator network, bottleneck or sandglass type network. Such autoassociative neural network is a multi-layer perceptron that performs an identity mapping, meaning that the output of the network is required to be identical to the input. However, in the middle of the network is a layer that works as a bottleneck in which a reduction of the dimension of the data is enforced. This bottleneck-layer provides the desired component values (scores).
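A minimal sketch of such an autoassociative (bottleneck) network in PyTorch; the layer sizes and tanh activations are illustrative assumptions.

```python
import torch
import torch.nn as nn

class BottleneckAutoencoder(nn.Module):
    """Autoassociative network for nonlinear PCA: the identity mapping
    is forced through a narrow bottleneck whose activations serve as
    the nonlinear component scores."""
    def __init__(self, dim=10, k=2):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(dim, 16), nn.Tanh(),
                                     nn.Linear(16, k))       # bottleneck
        self.decoder = nn.Sequential(nn.Linear(k, 16), nn.Tanh(),
                                     nn.Linear(16, dim))

    def forward(self, x):
        scores = self.encoder(x)           # nonlinear component values
        return self.decoder(scores), scores

model = BottleneckAutoencoder()
x = torch.randn(32, 10)
recon, scores = model(x)
loss = ((recon - x) ** 2).mean()  # train the output to reproduce the input
print(scores.shape)  # torch.Size([32, 2])
```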

Least squares support vector machines

Information Theory and Deep Learning

In short, Neural Networks extract from the data the most relevant part of the information that describes the statistical dependence between the features and the labels. In other words, the size of a Neural Networks specifies a data structure that we can compute and store, and the result of training the network is the best approximation of the statistical relationship between the features and the labels that can be represented by this data structure.


Universal Feature Selection

In this talk, we formulate a new problem called the "universal feature selection" problem, where we need to select from the high dimensional data a low dimensional feature that can be used to solve, not one, but a family of inference problems. We solve this problem by developing a new information metric that can be used to quantify the semantics of data, and by using a geometric analysis approach. We then show that a number of concepts in information theory and statistics such as the HGR correlation and common information are closely connected to the universal feature selection problem. At the same time, a number of learning algorithms, PCA, Compressed Sensing, FM, deep neural networks, etc., can also be interpreted as implicitly or explicitly solving the same problem, with various forms of constraints.

Information Bottleneck Theory

InfoMax

Deep Learning and Coding Theory

https://ee.stanford.edu/event/seminar/isl-seminar-inventing-algorithms-deep-learning

The first is reliable communication over noisy media, where we successfully revisit classical open problems in information theory; we show that creatively trained and architected neural networks can beat the state of the art on the AWGN channel with noisy feedback, with a 100-fold improvement in bit error rate.

The second is optimization and classification problems on graphs, where the key algorithmic challenge is scalable performance to arbitrary sized graphs. Representing graphs as randomized nonlinear dynamical systems via recurrent neural networks, we show that creative adversarial training allows one to train on small size graphs and test on much larger sized graphs (100~1000x) with approximation ratios that rival state of the art on a variety of optimization problems across the complexity theoretic hardness spectrum.

Communication algorithms via deep learning

Learning-based coded computation

Neural Audio Coding

Neural audio coding is an area where we want to compress an audio signal down to a bitstring, which should be recovered as another audio signal that sounds as similar as possible to human ears, of course, using neural nets. This objective is not that straightforward when it comes to training a neural network that does this autoencoding job, because what I just said in the previous sentence is not well defined as a differentiable loss function.

Brain Science and AI

Artificial intelligence and brain science have had a swinging relationship of convergence and divergence. In the early days of pattern recognition, multi-layer neural networks based on the anatomy and physiology of the visual cortex played a key role, but subsequent sophistication of machine learning promoted methods that are little related to the brain. Recently, however, the remarkable success of deep neural networks in learning from big data has re-evoked the interests in brain-like artificial intelligence.

Neuromorphic Computing

The key challenges in neuromorphic research are matching a human's flexibility, and ability to learn from unstructured stimuli with the energy efficiency of the human brain. The computational building blocks within neuromorphic computing systems are logically analogous to neurons. Spiking neural networks (SNNs) are a novel model for arranging those elements to emulate natural neural networks that exist in biological brains.

Spiking neural networks
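A minimal simulation of the leaky integrate-and-fire (LIF) neuron, the simplest spiking building block used in SNN models; all parameter values are illustrative assumptions.

```python
import numpy as np

def lif_neuron(input_current, dt=1e-3, tau=0.02, v_thresh=1.0, v_reset=0.0):
    """Leaky integrate-and-fire neuron.

    The membrane potential v leaks toward rest, integrates the input
    current, and emits a binary spike whenever it crosses threshold,
    after which it is reset."""
    v, spikes = 0.0, []
    for I in input_current:
        v += dt / tau * (-v + I)      # leaky integration
        if v >= v_thresh:
            spikes.append(1)
            v = v_reset               # reset after the spike
        else:
            spikes.append(0)
    return np.array(spikes)

spikes = lif_neuron(np.full(100, 1.5))  # constant suprathreshold drive
print(spikes.sum(), "spikes in 100 steps")
```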

SpiNNaker

SpiNNaker is a novel massively-parallel computer architecture, inspired by the fundamental structure and function of the human brain, which itself is composed of billions of simple computing elements, communicating using unreliable spikes.

The project's objectives are two-fold:

  1. To provide a platform for high-performance massively parallel processing appropriate for the simulation of large-scale neural networks in real-time, as a research tool for neuroscientists, computer scientists and roboticists
  2. As an aid in the investigation of new computer architectures, which break the rules of conventional supercomputing, but which we hope will lead to fundamentally new and advantageous principles for energy-efficient massively-parallel computing

The SpiNNaker project has delivered the world's largest neuromorphic computing platform, incorporating over a million ARM mobile phone processors and capable of modelling spiking neural networks at the scale of a mouse brain in biological real time.

Intel Corporation Loihi and Nx SDK

The Thousand Brains Theory of Intelligence

Numenta has developed a major theory of intelligence and how the brain works called The Thousand Brains Theory of Intelligence, and we’re now exploring how to incorporate key principles of the theory to the field of machine intelligence.

Cognition Science and Deep Learning

Brain science is the physiological theory underlying cognitive science; it focuses on the physical principles of brain function. In my eyes, the core problem of cognitive science is how we learn.

Artificial deep neural networks (DNNs), initially inspired by the brain, enable computers to solve cognitive tasks at which humans excel. In the absence of explanations for such cognitive phenomena, cognitive scientists have in turn started using DNNs as models to investigate biological cognition and its neural basis, creating heated debate.

Predictive coding

Predictive coding is a leading theory of how the brain performs probabilistic inference.

Contrastive Predictive Coding

Hierarchical Predictive Coding

A hierarchical predictive coding model consists of layers of latent variables (tiers). Each tier attempts to predict the adjacent lower tier, resulting in a predicted state and a prediction error. By minimizing the prediction error, both the latent variables and the predictors of these variables are estimated.
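A two-tier sketch of this scheme (illustrative assumptions throughout: a linear top-down predictor, arbitrary shapes and learning rates): the latent variables are updated by gradient descent to reduce the prediction error, and the predictor is updated from the same error signal.

```python
import numpy as np

# Two tiers: the higher tier's latent z predicts the lower tier's
# state x through a generative map W. Inference updates z; learning
# updates W. Both steps descend the squared prediction error.
rng = np.random.default_rng(4)
W = rng.standard_normal((8, 3)) * 0.5   # top-down predictor
x = rng.standard_normal(8)              # observed lower-tier state
z = np.zeros(3)                         # higher-tier latent variables

for step in range(200):
    error = x - W @ z               # prediction error at the lower tier
    z += 0.1 * W.T @ error          # inference: adjust latents to reduce error
    W += 0.01 * np.outer(error, z)  # learning: adjust the predictor

print("remaining prediction error:", np.linalg.norm(x - W @ z))
```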


The lottery ticket hypothesis

The lottery ticket hypothesis proposes that over-parameterization of deep neural networks (DNNs) aids training by increasing the probability of a "lucky" sub-network initialization being present, rather than by helping the optimization process (Frankle & Carbin, 2019).

This project explores the Lottery Ticket Hypothesis: the conjecture that neural networks contain much smaller sparse subnetworks capable of training to full accuracy. In the course of this project, we have demonstrated that these subnetworks existed at initialization in small networks and early in training in larger networks. In addition, we have shown that these lottery ticket subnetworks are state-of-the-art pruned neural networks.

Double Descent

The model with optimal parameters is not necessarily the best model: $$\text{Learning}\neq \text{Training}, \quad \text{Generalization}\neq \text{Optimization}.$$ Back-propagation (BP), the current de facto training paradigm for deep learning models, is only useful for parameter learning but offers no role in finding an optimal network structure. We need to go beyond BP in order to derive an optimal network, both in structure and in parameters.

We show that the double descent phenomenon occurs in CNNs, ResNets, and transformers: performance first improves, then gets worse, and then improves again with increasing model size, data size, or training time. This effect is often avoided through careful regularization. While this behavior appears to be fairly universal, we don’t yet fully understand why it happens, and view further study of this phenomenon as an important research direction.
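A minimal experiment that typically reproduces the phenomenon (an illustrative sketch, not the paper's setup): least-squares regression on random ReLU features, sweeping the feature count p through the interpolation threshold p = n.

```python
import numpy as np

# Test error typically rises as the feature count p approaches the
# number of training samples (the interpolation threshold, here n=50)
# and falls again beyond it when using the minimum-norm solution.
rng = np.random.default_rng(5)
n, n_test = 50, 500
x = rng.uniform(-1, 1, (n, 1))
y = np.sin(3 * x) + 0.1 * rng.standard_normal((n, 1))
xt = rng.uniform(-1, 1, (n_test, 1))
yt = np.sin(3 * xt)

for p in [10, 40, 50, 60, 200, 1000]:
    Wf = rng.standard_normal((1, p))
    bf = rng.uniform(-1, 1, p)
    phi = np.maximum(x @ Wf + bf, 0)       # random ReLU features
    phit = np.maximum(xt @ Wf + bf, 0)
    w = np.linalg.pinv(phi) @ y            # minimum-norm least squares
    print(f"p={p:5d}  test MSE={np.mean((phit @ w - yt) ** 2):.3f}")
```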

Neural Tangents