Add scale option to spectral normalization. #3904

JesseFarebro · 2024-05-07T16:41:05Z

What does this PR do?

Adds a scale parameter to spectral normalization. Multiplying the spectrally normalized weights by a single scale parameter increases expressivity while retaining some of the benefits of spectral normalization. The spectral norm of the resulting weight matrix is then set by the scale alone, so it is independent of the dimensionality of the weight matrix. This was originally proposed in Appendix E of [1] and recently applied to transformer models in [2].

[1] Takeru Miyato, Toshiki Kataoka, Masanori Koyama, and Yuichi Yoshida, 'Spectral Normalization for Generative Adversarial Networks', International Conference on Learning Representations (ICLR), 2018.
[2] Shuangfei Zhai, Tatiana Likhomanenko, Etai Littwin, Dan Busbridge, Jason Ramapuram, Yizhe Zhang, Jiatao Gu, and Joshua M. Susskind, 'Stabilizing Transformer Training by Preventing Attention Entropy Collapse', International Conference on Machine Learning (ICML), 2023.
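
For illustration, the idea can be sketched in plain NumPy. This is not the implementation in this PR: the function name `spectral_normalize`, its arguments, and the fixed number of power-iteration steps are assumptions made for the example, and the scale is a fixed scalar here, whereas in [2] it is a learned parameter.

```python
import numpy as np


def spectral_normalize(w, scale=1.0, n_steps=100, eps=1e-12, seed=0):
    """Rescale `w` so its largest singular value is approximately `scale`.

    Hypothetical helper for illustration only; not the Flax API.
    """
    w = np.asarray(w, dtype=np.float64)
    rng = np.random.default_rng(seed)

    # Random starting vector for the left singular direction.
    u = rng.normal(size=w.shape[0])
    u /= np.linalg.norm(u) + eps

    # Power iteration: alternate between right (v) and left (u) directions.
    for _ in range(n_steps):
        v = w.T @ u
        v /= np.linalg.norm(v) + eps
        u = w @ v
        u /= np.linalg.norm(u) + eps

    # Estimate of the largest singular value of w.
    sigma = float(u @ w @ v)

    # Plain spectral normalization is w / sigma; the scale sets the final norm.
    return scale * w / (sigma + eps)
```

With `scale=1.0` this reduces to plain spectral normalization. For any other value the largest singular value of the result is approximately `scale`, regardless of the shape of `w`.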

Checklist

This PR fixes a minor issue (e.g.: typo or small bug) or improves the docs (you can dismiss the other checks if that's the case).
This change is discussed in a GitHub issue/discussion (please add a link).
The documentation and docstrings adhere to the documentation guidelines.
This change includes necessary high-coverage tests. (No quality testing = no merge!)
