Add scale option to spectral normalization. #3904

JesseFarebro

What does this PR do?

Adds a scale parameter to spectral normalization. Parameterizing the spectral norm with a single scale parameter increases expressivity while retaining some of the benefits of spectral normalization. It also makes the spectral norm independent of the dimensionality of the weight matrix. This was originally proposed in Appendix E of [1] and recently applied to transformer models in [2].

[1] Takeru Miyato, Toshiki Kataoka, Masanori Koyama, and Yuichi Yoshida, 'Spectral Normalization for Generative Adversarial Networks', International Conference on Learning Representations (ICLR), 2018.
[2] Shuangfei Zhai, Tatiana Likhomanenko, Etai Littwin, Dan Busbridge, Jason Ramapuram, Yizhe Zhang, Jiatao Gu, and Joshua M. Susskind, 'Stabilizing Transformer Training by Preventing Attention Entropy Collapse', International Conference on Machine Learning (ICML), 2023.
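
For reviewers who want the idea in concrete terms, here is a minimal NumPy sketch of scaled spectral normalization. It is not the code in this PR: the function name `spectral_normalize`, its arguments, and the use of a fixed number of power-iteration steps are all assumptions made for illustration. The weight is divided by an estimate of its largest singular value and then multiplied by a single scalar `scale`, so the resulting spectral norm is approximately `scale` regardless of the matrix shape.

```python
import numpy as np


def spectral_normalize(w, scale=1.0, n_steps=200, eps=1e-12, seed=0):
    """Return scale * w / sigma(w), with sigma estimated by power iteration.

    Hypothetical helper for illustration only; names and defaults are
    assumptions, not the API added by this PR.
    """
    rng = np.random.default_rng(seed)
    # Treat the weight as a 2-D matrix by flattening leading dimensions.
    w_mat = w.reshape(-1, w.shape[-1])

    # Random unit vector to start the power iteration.
    u = rng.normal(size=w_mat.shape[0])
    u /= np.linalg.norm(u) + eps

    for _ in range(n_steps):
        v = w_mat.T @ u
        v /= np.linalg.norm(v) + eps
        u = w_mat @ v
        u /= np.linalg.norm(u) + eps

    # Estimate of the largest singular value of w_mat.
    sigma = u @ w_mat @ v

    # A single scalar controls the spectral norm of the result.
    return scale * w / (sigma + eps)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    for shape in [(8, 4), (64, 32)]:
        w_sn = spectral_normalize(rng.normal(size=shape), scale=2.5)
        # The spectral norm should be close to the chosen scale for both shapes.
        print(shape, np.linalg.norm(w_sn, 2))
```

In a training setup the scalar would typically be a learned parameter rather than a fixed constant, and the power-iteration vector would be carried across steps instead of being re-initialized; both are omitted here to keep the sketch short.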

Checklist

  • This PR fixes a minor issue (e.g. typo or small bug) or improves the docs (you can dismiss the other checks if that's the case).
  • This change is discussed in a GitHub issue/discussion (please add a link).
  • The documentation and docstrings adhere to the documentation guidelines.
  • This change includes necessary high-coverage tests. (No quality testing = no merge!)
