
Gradients are clipped before the unscaling #468

Open
marcovisentin opened this issue Nov 4, 2023 · 1 comment

Comments

@marcovisentin

At lines 114-115 in train.py, I believe 'scaler.unscale_(optimizer)' should be called before gradient clipping.
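For reference, a minimal sketch of the suggested ordering, following the pattern in the PyTorch AMP examples (https://pytorch.org/docs/stable/notes/amp_examples.html). Here 'model', 'loader', 'criterion', and 'grad_clip' are placeholders, not the actual names used in this repo's train.py:

```python
import torch

scaler = torch.cuda.amp.GradScaler()
optimizer = torch.optim.AdamW(model.parameters())

for X, Y in loader:
    optimizer.zero_grad(set_to_none=True)
    with torch.cuda.amp.autocast():
        loss = criterion(model(X), Y)
    scaler.scale(loss).backward()
    # Unscale first so clipping sees the true gradient magnitudes.
    scaler.unscale_(optimizer)
    torch.nn.utils.clip_grad_norm_(model.parameters(), grad_clip)
    # step() will not unscale again; it still skips the update
    # if any gradient contains NaN/Inf.
    scaler.step(optimizer)
    scaler.update()
```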

@tensorctn

In my opinion, scaler.step(optimizer) already includes the unscaling. It does two things: first, it unscales the gradients if you didn't unscale manually beforehand; second, it checks for overflows, and if there are no NaN/Inf values it executes the optimizer's step, otherwise it skips this iteration's parameter update. So if the gradients are clipped after scaler.step, I think it makes no sense. Gradient clipping only aims to avoid gradient explosion, but if the gradients do explode, scaler.step will skip this iteration's parameter update anyway, so there is absolutely no need for clipping.
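For context, a tiny sketch (hypothetical values, not taken from train.py) of what clip_grad_norm_ does when it runs on still-scaled gradients: the threshold is compared against the scaled norm, so the effective limit after unscaling becomes max_norm / scale rather than max_norm.

```python
import torch

scale = 2.0 ** 16                       # a typical GradScaler scale factor
p = torch.nn.Parameter(torch.ones(4))
p.grad = torch.full((4,), 3.0) * scale  # scaled grads; true (unscaled) norm is 6.0

torch.nn.utils.clip_grad_norm_(p, max_norm=1.0)  # clips the *scaled* norm to 1.0
p.grad /= scale                         # what unscaling would later do
print(p.grad.norm())                    # ~1.5e-5, far below the intended 1.0
```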
