
Gradient clipping not working for llama2_70b_lora benchmark #723

Open
michal2409 opened this issue Mar 27, 2024 · 1 comment

Comments

@michal2409
Contributor

I’ve found that setting max_grad_norm has no effect; we are not clipping gradients.

To verify, I ran convergence with max_grad_norm set to 1e-9 and saw no difference in eval loss. I also checked unscale_and_clip_grads and found that self.clip_grad is 0 when I printed it here.
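
A minimal, self-contained sketch of the check described above, assuming plain PyTorch with a toy model standing in for the benchmark's trainer; the real path goes through DeepSpeed's unscale_and_clip_grads, so this only illustrates what an effective max_grad_norm should do to the global gradient norm:

```python
import torch
import torch.nn as nn

# Toy stand-ins for the benchmark's model/optimizer (assumption, not the real setup).
model = nn.Linear(16, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
max_grad_norm = 1e-9  # the value used in the verification run above

loss = model(torch.randn(8, 16)).pow(2).mean()
loss.backward()

def global_grad_norm(params):
    # L2 norm over all parameter gradients -- the quantity clipping is supposed to bound.
    return torch.norm(torch.stack([p.grad.norm(2) for p in params if p.grad is not None]), 2)

print("grad norm before clip:", global_grad_norm(model.parameters()).item())
torch.nn.utils.clip_grad_norm_(model.parameters(), max_grad_norm)
print("grad norm after clip: ", global_grad_norm(model.parameters()).item())
optimizer.step()
```

If max_grad_norm were being honored, a bound of 1e-9 would collapse the post-clip norm (and hence the updates) to near zero, which is why an unchanged eval-loss curve points at clipping never being applied.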

@nv-rborkar
Contributor

nv-rborkar commented Mar 28, 2024

Discussed in Training WG (3/28): @itayhubara is verifying whether setting this value correctly affects convergence, and whether it can improve convergence or reduce the coefficient of variation in the RCPs.
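
For reference, a quick way to read "coefficient of variation" on RCP-style convergence data (the values below are placeholders, not real RCP numbers):

```python
import statistics

# Hypothetical per-seed samples-to-target values standing in for RCP entries.
samples_to_target = [3_000_000, 3_150_000, 2_900_000, 3_050_000]
mean = statistics.mean(samples_to_target)
cv = statistics.stdev(samples_to_target) / mean  # coefficient of variation = std / mean
print(f"mean={mean:.0f}, coefficient of variation={cv:.3%}")
```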
