[FSDP2][2D] test_clip_grad_norm_2d is failing on main #126484
Labels: release notes: distributed (fsdp2)
awgu added a commit that referenced this issue on May 17, 2024:
This fixes #126484. We change from a transformer to an MLP stack since the transformer seems to introduce slight numeric differences when using TP. We include a sequence-parallel layer norm module in the MLP stack to exercise `(S(0), R)` placement. cc mrshenli pritamdamania87 zhaojuanmao satgera gqchen aazzolini osalpekar jiayisuse H-Huang kwen2501 penguinwu fegin XilunWu wanchaol fduwjj wz337 tianyu-l wconstab yf225 chauhang d4l3k [ghstack-poisoned]
ZelboK pushed a commit to ZelboK/pytorch that referenced this issue on May 19, 2024:
This fixes pytorch#126484. We change from a transformer to an MLP stack since the transformer seems to introduce slight numeric differences when using TP. We include a sequence-parallel layer norm module in the MLP stack to exercise `(S(0), R)` placement. Pull Request resolved: pytorch#126497 Approved by: https://github.com/weifengpy, https://github.com/wz337
awgu added a commit that referenced this issue on May 21, 2024
awgu added a commit that referenced this issue on May 21, 2024
pytorchmergebot pushed a commit that referenced this issue on May 22, 2024:
This fixes #126484. We change from a transformer to an MLP stack since the transformer seems to introduce slight numeric differences when using TP. We include a sequence-parallel layer norm module in the MLP stack to exercise `(S(0), R)` placement. Pull Request resolved: #126497 Approved by: https://github.com/weifengpy, https://github.com/wz337
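To make the `(S(0), R)` placement mentioned in the fix concrete, here is a minimal, hypothetical sketch (not the test's actual code) of how a LayerNorm weight ends up sharded on dim 0 by FSDP2 and replicated on the tensor-parallel mesh dimension. The 2x2 mesh shape, the parameter size, and the `torchrun --nproc_per_node=4` launch are assumptions for illustration.

```python
# Hypothetical illustration only: construct the (Shard(0), Replicate()) placement
# that FSDP2 plus a sequence-parallel LayerNorm produce for the norm's parameters.
# Assumes a 4-GPU launch, e.g. `torchrun --nproc_per_node=4 placement_demo.py`.
import torch
from torch.distributed._tensor import Replicate, Shard, distribute_tensor
from torch.distributed.device_mesh import init_device_mesh

# 2-D mesh: dim 0 for FSDP (data parallel), dim 1 for tensor parallel.
mesh = init_device_mesh("cuda", (2, 2), mesh_dim_names=("dp", "tp"))

# A stand-in for a LayerNorm weight; size 16 is arbitrary.
layer_norm_weight = torch.ones(16)

# FSDP2 shards each parameter on dim 0 across the "dp" mesh dim, while a
# sequence-parallel LayerNorm keeps its parameters replicated across "tp",
# giving the (S(0), R) placement the test exercises.
dtensor = distribute_tensor(layer_norm_weight, mesh, [Shard(0), Replicate()])
print(dtensor.placements)  # (Shard(dim=0), Replicate())
```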
🐛 Describe the bug
Repro:
For norm_type 1, 2, and 3, we observe a numeric discrepancy between ref_total_norm and total_norm.
Some prints:
We need to investigate the numerics to confirm whether this is a bug.
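For context, the comparison in question is between a manually computed reference norm and the total norm returned by `clip_grad_norm_`. Below is a minimal single-device sketch of that pattern; the actual test performs the comparison on DTensor parameters under FSDP2 + TP, which is where the discrepancy appears. The model shape, the `reference_total_norm` helper, and the large `max_norm` are assumptions for illustration.

```python
# Hypothetical single-device sketch of the ref_total_norm vs. total_norm check;
# the real test runs this comparison on DTensor parameters in a 2-D mesh.
import torch

def reference_total_norm(parameters, norm_type: float) -> torch.Tensor:
    # Norm of the per-parameter gradient norms, using the same norm type throughout.
    grads = [p.grad for p in parameters if p.grad is not None]
    per_param_norms = torch.stack(
        [torch.linalg.vector_norm(g, norm_type) for g in grads]
    )
    return torch.linalg.vector_norm(per_param_norms, norm_type)

model = torch.nn.Sequential(
    torch.nn.Linear(8, 8), torch.nn.LayerNorm(8), torch.nn.Linear(8, 8)
)
model(torch.randn(4, 8)).sum().backward()

for norm_type in (1.0, 2.0, 3.0):
    ref_total_norm = reference_total_norm(model.parameters(), norm_type)
    # clip_grad_norm_ returns the total norm it computed; a huge max_norm keeps
    # the gradients unclipped so every iteration sees the same gradients.
    total_norm = torch.nn.utils.clip_grad_norm_(
        model.parameters(), max_norm=1e9, norm_type=norm_type
    )
    torch.testing.assert_close(ref_total_norm, total_norm)
```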
cc. @awgu
Versions
N/A.