
Gradient clipping not working for llama2_70b_lora benchmark #723

Open
michal2409 opened this issue Mar 27, 2024 · 1 comment

Comments

@michal2409
Contributor

I’ve found that setting max_grad_norm has no effect; we are not clipping gradients.

To verify, I ran convergence with max_grad_norm set to 1e-9 and saw no difference in eval loss. I also checked unscale_and_clip_grads and found that self.clip_grad is 0 when I printed it here.
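
A minimal, self-contained sketch of the check described above, assuming plain PyTorch with a toy model standing in for the benchmark's trainer; the real path goes through DeepSpeed's unscale_and_clip_grads, so this only illustrates what an effective max_grad_norm should do to the global gradient norm:

```python
import torch
import torch.nn as nn

# Toy stand-ins for the benchmark's model/optimizer (assumption, not the real setup).
model = nn.Linear(16, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
max_grad_norm = 1e-9  # the value used in the verification run above

loss = model(torch.randn(8, 16)).pow(2).mean()
loss.backward()

def global_grad_norm(params):
    # L2 norm over all parameter gradients -- the quantity clipping is supposed to bound.
    return torch.norm(torch.stack([p.grad.norm(2) for p in params if p.grad is not None]), 2)

print("grad norm before clip:", global_grad_norm(model.parameters()).item())
torch.nn.utils.clip_grad_norm_(model.parameters(), max_grad_norm)
print("grad norm after clip: ", global_grad_norm(model.parameters()).item())
optimizer.step()
```

If max_grad_norm were being honored, a bound of 1e-9 would collapse the post-clip norm (and hence the updates) to near zero, which is why an unchanged eval-loss curve points at clipping never being applied.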

@nv-rborkar
Contributor

nv-rborkar commented Mar 28, 2024

Discussed in Training WG (3/28): @itayhubara is verifying whether setting this value correctly affects convergence, and whether it can improve convergence or reduce the coefficient of variation in the RCPs.
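
For reference, a quick way to read "coefficient of variation" on RCP-style convergence data (the values below are placeholders, not real RCP numbers):

```python
import statistics

# Hypothetical per-seed samples-to-target values standing in for RCP entries.
samples_to_target = [3_000_000, 3_150_000, 2_900_000, 3_050_000]
mean = statistics.mean(samples_to_target)
cv = statistics.stdev(samples_to_target) / mean  # coefficient of variation = std / mean
print(f"mean={mean:.0f}, coefficient of variation={cv:.3%}")
```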
