
[FSDP2][2D] test_clip_grad_norm_2d is failing on main #126484

Closed
wz337 opened this issue May 17, 2024 · 0 comments
wz337 commented May 17, 2024

🐛 Describe the bug

repro:

python test/distributed/_composable/fsdp/test_fully_shard_clip_grad_norm_.py -k test_clip_grad_norm_2d

For norm_type 1, 2, and 3, we observe a numeric discrepancy between ref_total_norm and total_norm.
Some prints:

0, rank: 0, norm_type=2, ref_total_norm=tensor(1200.0919, device='cuda:0'), total_norm.full_tensor()=tensor(1200.5303, device='cuda:0')
0, rank: 0, norm_type=1, ref_total_norm=tensor(48862.6328, device='cuda:0'), total_norm.full_tensor()=tensor(48963.7656, device='cuda:0')
0, rank: 0, norm_type=3, ref_total_norm=tensor(463.1410, device='cuda:0'), total_norm.full_tensor()=tensor(463.1594, device='cuda:0')

We need to investigate the numerics to confirm whether this is a bug.
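
For context, the comparison boils down to a reference total norm computed from the unsharded gradients (the p-norm of the per-parameter gradient p-norms, which is how `clip_grad_norm_` aggregates the total norm it returns) versus the DTensor total norm returned by `clip_grad_norm_` on the 2D-sharded model. Below is a minimal sketch of that comparison, not the test itself: `ref_model`, `model`, and `max_norm` are placeholders, and it assumes both models already hold gradients from the same backward pass.

```python
import torch


def compare_total_norms(ref_model: torch.nn.Module,
                        model: torch.nn.Module,
                        max_norm: float,
                        norm_type: float) -> None:
    """Compare the reference total gradient norm (plain tensors) against the
    DTensor total norm returned by clip_grad_norm_ on the 2D-sharded model."""
    # Reference: p-norm of the per-parameter gradient p-norms. This matches
    # how clip_grad_norm_ aggregates the total norm it returns.
    per_param_norms = [
        torch.linalg.vector_norm(p.grad, norm_type)
        for p in ref_model.parameters()
        if p.grad is not None
    ]
    ref_total_norm = torch.linalg.vector_norm(
        torch.stack(per_param_norms), norm_type
    )

    # Sharded model: with DTensor parameters, clip_grad_norm_ returns the
    # total norm as a DTensor; materialize it with full_tensor() to compare.
    total_norm = torch.nn.utils.clip_grad_norm_(
        model.parameters(), max_norm, norm_type=norm_type
    )
    print(f"norm_type={norm_type}, ref_total_norm={ref_total_norm}, "
          f"total_norm.full_tensor()={total_norm.full_tensor()}")
```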

cc @awgu

Versions

N/A

wz337 added the release notes: distributed (fsdp2) label on May 17, 2024
awgu added a commit that referenced this issue May 17, 2024
This fixes #126484.

We change from a transformer to an MLP stack, since the transformer seems to introduce slight numeric differences when using TP. We include a sequence-parallel layer norm module in the MLP stack to exercise the `(S(0), R)` placement.

cc mrshenli pritamdamania87 zhaojuanmao satgera gqchen aazzolini osalpekar jiayisuse H-Huang kwen2501 penguinwu fegin XilunWu wanchaol fduwjj wz337 tianyu-l wconstab yf225 chauhang d4l3k

[ghstack-poisoned]
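
For readers unfamiliar with the `(S(0), R)` placement mentioned above: under `SequenceParallel`, the layer norm's parameters stay replicated across the TP mesh dimension, and FSDP2 then shards them on dim 0 across the data-parallel mesh dimension, yielding `(Shard(0), Replicate)` on the 2D mesh. Below is a rough sketch of wiring a small MLP block with a sequence-parallel `LayerNorm` under TP + FSDP2; `MLPBlock`, the mesh sizes, and the parallelize plan are illustrative (not the test's actual model or plan), and it assumes a launch via `torchrun` with 4 GPUs.

```python
import os

import torch
from torch.distributed._composable.fsdp import fully_shard  # FSDP2
from torch.distributed._tensor import Shard  # torch.distributed.tensor in newer releases
from torch.distributed.device_mesh import init_device_mesh
from torch.distributed.tensor.parallel import (
    ColwiseParallel,
    RowwiseParallel,
    SequenceParallel,
    parallelize_module,
)


class MLPBlock(torch.nn.Module):
    """Illustrative MLP block ending in a LayerNorm that runs sequence-parallel."""

    def __init__(self, dim: int):
        super().__init__()
        self.in_proj = torch.nn.Linear(dim, 4 * dim)
        self.out_proj = torch.nn.Linear(4 * dim, dim)
        self.norm = torch.nn.LayerNorm(dim)

    def forward(self, x):
        return self.norm(self.out_proj(torch.relu(self.in_proj(x))))


# One process per GPU (e.g. torchrun --nproc-per-node=4); pick this rank's device.
torch.cuda.set_device(int(os.environ["LOCAL_RANK"]))

# Illustrative 2D mesh: outer dim for FSDP2 (data parallel), inner dim for TP.
mesh = init_device_mesh("cuda", (2, 2), mesh_dim_names=("dp", "tp"))
model = MLPBlock(dim=16).cuda()

# Tensor parallelism on the "tp" mesh dim: shard the linears column-/row-wise,
# emit the row-wise output sharded on the sequence dim (dim 1) so it feeds the
# sequence-parallel LayerNorm, whose parameters stay replicated across TP.
parallelize_module(
    model,
    mesh["tp"],
    {
        "in_proj": ColwiseParallel(),
        "out_proj": RowwiseParallel(output_layouts=Shard(1)),
        "norm": SequenceParallel(),
    },
)

# FSDP2 on the "dp" mesh dim shards every parameter on dim 0, so the LayerNorm
# weight/bias and their gradients get the (Shard(0), Replicate) 2D placement.
fully_shard(model, mesh=mesh["dp"])
```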
pytorchmergebot pushed a commit that referenced this issue May 22, 2024
This fixes #126484.

We change from a transformer to an MLP stack, since the transformer seems to introduce slight numeric differences when using TP. We include a sequence-parallel layer norm module in the MLP stack to exercise the `(S(0), R)` placement.

Pull Request resolved: #126497
Approved by: https://github.com/weifengpy, https://github.com/wz337