Add paper
PiperOrigin-RevId: 429168484
romanngg committed Feb 17, 2022
1 parent 9d1d17e commit 7c77a22
Showing 5 changed files with 72 additions and 71 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/tests.yml
@@ -56,7 +56,7 @@ jobs:
pytest -n auto --cov=neural_tangents --cov-report=xml --cov-report=term
- name: Test with pytest and generate coverage report (macOS)
- if: ${{ (matrix.os == 'macos-latest') && (matrix.JAX_ENABLE_X64 == 0) }}
+ if: ${{ (matrix.os != 'macos-latest') && (matrix.JAX_ENABLE_X64 == 0) }}
run: |
pytest -n auto --cov=neural_tangents --cov-report=xml --cov-report=term
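For readers unfamiliar with workflow matrix conditionals, below is a minimal sketch of how an `if:` guard like the one changed above selects steps per matrix entry. The workflow name, triggers, job name, and setup/install steps are assumptions added for completeness, not copied from this repository's `tests.yml`; only the `if:` condition and the pytest command mirror the diff.

```yaml
# Illustrative sketch only: name, triggers, and setup/install steps are
# assumptions, not this repository's actual workflow.
name: tests-sketch
on: [push, pull_request]

jobs:
  tests:
    runs-on: ${{ matrix.os }}
    strategy:
      matrix:
        os: [ubuntu-latest, macos-latest]
        JAX_ENABLE_X64: [0, 1]
    steps:
      - uses: actions/checkout@v2
      - uses: actions/setup-python@v2
        with:
          python-version: '3.9'
      - name: Install test dependencies (schematic)
        run: pip install -e . pytest pytest-xdist pytest-cov
      - name: Test with pytest and generate coverage report
        # Runs only for non-macOS matrix entries with 64-bit precision
        # disabled, matching the condition introduced by this commit.
        if: ${{ (matrix.os != 'macos-latest') && (matrix.JAX_ENABLE_X64 == 0) }}
        env:
          JAX_ENABLE_X64: ${{ matrix.JAX_ENABLE_X64 }}
        run: |
          pytest -n auto --cov=neural_tangents --cov-report=xml --cov-report=term
```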
File renamed without changes.
File renamed without changes.
141 changes: 71 additions & 70 deletions README.md
@@ -429,76 +429,77 @@ as an example. With `NVIDIA V100` 64-bit precision, `nt` took 316/330/508 GPU-hours

Neural Tangents has been used in the following papers (newest first):

1. [Finding Dynamics Preserving Adversarial Winning Tickets](https://arxiv.org/abs/2202.06488)
2. [Learning Representation from Neural Fisher Kernel with Low-rank Approximation](https://arxiv.org/abs/2202.01944)
3. [MIT 6.S088 Modern Machine Learning: Simple Methods that Work](https://web.mit.edu/modernml/course/)
4. [A Neural Tangent Kernel Perspective on Function-Space Regularization in Neural Networks](https://hudsonchen.github.io/papers/A_Neural_Tangent_Kernel_Perspective_on_Function_Space_Regularization_in_Neural_Networks.pdf)
5. [Eigenspace Restructuring: a Principle of Space and Frequency in Neural Networks](https://arxiv.org/abs/2112.05611)
6. [Functional Regularization for Reinforcement Learning via Learned Fourier Features](https://arxiv.org/abs/2112.03257)
7. [A Structured Dictionary Perspective on Implicit Neural Representations](https://arxiv.org/abs/2112.01917)
8. [Critical initialization of wide and deep neural networks through partial Jacobians: general theory and applications to LayerNorm](https://arxiv.org/abs/2111.12143)
9. [Asymptotics of representation learning in finite Bayesian neural networks](https://arxiv.org/abs/2106.00651)
10. [On the Equivalence between Neural Network and Support Vector Machine](https://arxiv.org/abs/2111.06063)
11. [An Empirical Study of Neural Kernel Bandits](https://arxiv.org/abs/2111.03543)
12. [Neural Networks as Kernel Learners: The Silent Alignment Effect](https://arxiv.org/abs/2111.00034)
13. [Understanding Deep Learning via Analyzing Dynamics of Gradient Descent](https://dataspace.princeton.edu/handle/88435/dsp01xp68kk34b)
14. [Neural Scene Representations for View Synthesis](https://digitalassets.lib.berkeley.edu/techreports/ucb/incoming/EECS-2020-223.pdf)
15. [Neural Tangent Kernel Eigenvalues Accurately Predict Generalization](https://arxiv.org/abs/2110.03922)
16. [Uniform Generalization Bounds for Overparameterized Neural Networks](https://arxiv.org/abs/2109.06099)
17. [Data Summarization via Bilevel Optimization](https://arxiv.org/abs/2109.12534)
18. [Neural Tangent Generalization Attacks](http://proceedings.mlr.press/v139/yuan21b.html)
19. [Dataset Distillation with Infinitely Wide Convolutional Networks](https://arxiv.org/abs/2107.13034)
20. [Neural Contextual Bandits without Regret](https://arxiv.org/abs/2107.03144)
21. [Epistemic Neural Networks](https://arxiv.org/abs/2107.08924)
22. [Uncertainty-aware Cardinality Estimation by Neural Network Gaussian Process](https://arxiv.org/abs/2107.08706)
23. [Scale Mixtures of Neural Network Gaussian Processes](https://arxiv.org/abs/2107.01408)
24. [Provably efficient machine learning for quantum many-body problems](https://arxiv.org/abs/2106.12627)
25. [Wide Mean-Field Variational Bayesian Neural Networks Ignore the Data](https://arxiv.org/abs/2106.07052)
26. [Spectral bias and task-model alignment explain generalization in kernel regression and infinitely wide neural networks](https://www.nature.com/articles/s41467-021-23103-1)
27. [Bridging Multi-Task Learning and Meta-Learning: Towards Efficient Training and Effective Adaptation](https://arxiv.org/abs/2106.09017)
28. [Wide Mean-Field Variational Bayesian Neural Networks Ignore the Data](https://arxiv.org/abs/2106.07052)
29. [What can linearized neural networks actually say about generalization?](https://arxiv.org/abs/2106.06770)
30. [Measuring the sensitivity of Gaussian processes to kernel choice](https://arxiv.org/abs/2106.06510)
31. [A Neural Tangent Kernel Perspective of GANs](https://arxiv.org/abs/2106.05566)
32. [On the Power of Shallow Learning](https://arxiv.org/abs/2106.03186)
33. [Learning Curves for SGD on Structured Features](https://arxiv.org/abs/2106.02713)
34. [Out-of-Distribution Generalization in Kernel Regression](https://arxiv.org/abs/2106.02261)
35. [Rapid Feature Evolution Accelerates Learning in Neural Networks](https://arxiv.org/abs/2105.14301)
36. [Scalable and Flexible Deep Bayesian Optimization with Auxiliary Information for Scientific Problems](https://arxiv.org/abs/2104.11667)
37. [Random Features for the Neural Tangent Kernel](https://arxiv.org/abs/2104.01351)
38. [Multi-Level Fine-Tuning: Closing Generalization Gaps in Approximation of Solution Maps under a Limited Budget for Training Data](https://arxiv.org/abs/2102.07169)
39. [Explaining Neural Scaling Laws](https://arxiv.org/abs/2102.06701)
40. [Correlated Weights in Infinite Limits of Deep Convolutional Neural Networks](https://arxiv.org/abs/2101.04097)
41. [Dataset Meta-Learning from Kernel Ridge-Regression](https://arxiv.org/abs/2011.00050)
42. [Deep learning versus kernel learning: an empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel](https://arxiv.org/abs/2010.15110)
43. [Stable ResNet](https://arxiv.org/abs/2010.12859)
44. [Label-Aware Neural Tangent Kernel: Toward Better Generalization and Local Elasticity](https://arxiv.org/abs/2010.11775)
45. [Semi-supervised Batch Active Learning via Bilevel Optimization](https://arxiv.org/abs/2010.09654)
46. [Temperature check: theory and practice for training models with softmax-cross-entropy losses](https://arxiv.org/abs/2010.07344)
47. [Experimental Design for Overparameterized Learning with Application to Single Shot Deep Active Learning](https://arxiv.org/abs/2009.12820)
48. [How Neural Networks Extrapolate: From Feedforward to Graph Neural Networks](https://arxiv.org/abs/2009.11848)
49. [Exploring the Uncertainty Properties of Neural Networks’ Implicit Priors in the Infinite-Width Limit](http://www.gatsby.ucl.ac.uk/~balaji/udl2020/accepted-papers/UDL2020-paper-115.pdf)
50. [Cold Posteriors and Aleatoric Uncertainty](https://arxiv.org/abs/2008.00029)
51. [Asymptotics of Wide Convolutional Neural Networks](https://arxiv.org/abs/2008.08675)
52. [Finite Versus Infinite Neural Networks: an Empirical Study](https://arxiv.org/abs/2007.15801)
53. [Bayesian Deep Ensembles via the Neural Tangent Kernel](https://arxiv.org/abs/2007.05864)
54. [The Surprising Simplicity of the Early-Time Learning Dynamics of Neural Networks](https://arxiv.org/abs/2006.14599)
55. [When Do Neural Networks Outperform Kernel Methods?](https://arxiv.org/abs/2006.13409)
56. [Statistical Mechanics of Generalization in Kernel Regression](https://arxiv.org/abs/2006.13198)
57. [Exact posterior distributions of wide Bayesian neural networks](https://arxiv.org/abs/2006.10541)
58. [Infinite attention: NNGP and NTK for deep attention networks](https://arxiv.org/abs/2006.10540)
59. [Fourier Features Let Networks Learn High Frequency Functions in Low Dimensional Domains](https://arxiv.org/abs/2006.10739)
60. [Finding trainable sparse networks through Neural Tangent Transfer](https://arxiv.org/abs/2006.08228)
61. [Coresets via Bilevel Optimization for Continual Learning and Streaming](https://arxiv.org/abs/2006.03875)
62. [On the Neural Tangent Kernel of Deep Networks with Orthogonal Initialization](https://arxiv.org/abs/2004.05867)
63. [The large learning rate phase of deep learning: the catapult mechanism](https://arxiv.org/abs/2003.02218)
64. [Spectrum Dependent Learning Curves in Kernel Regression and Wide Neural Networks](https://arxiv.org/abs/2002.02561)
65. [Taylorized Training: Towards Better Approximation of Neural Network Training at Finite Width](https://arxiv.org/abs/2002.04010)
66. [On the Infinite Width Limit of Neural Networks with a Standard Parameterization](https://arxiv.org/abs/2001.07301)
67. [Disentangling Trainability and Generalization in Deep Learning](https://arxiv.org/abs/1912.13053)
68. [Information in Infinite Ensembles of Infinitely-Wide Neural Networks](https://arxiv.org/abs/1911.09189)
69. [Training Dynamics of Deep Networks using Stochastic Gradient Descent via Neural Tangent Kernel](https://arxiv.org/abs/1905.13654)
70. [Wide Neural Networks of Any Depth Evolve as Linear Models Under Gradient Descent](https://arxiv.org/abs/1902.06720)
71. [Bayesian Deep Convolutional Networks with Many Channels are Gaussian Processes](https://arxiv.org/abs/1810.05148)


Please let us know if you make use of the code in a publication, and we'll add it
