
Gradient Clipping #902

Draft · wants to merge 8 commits into main

Conversation

@swfsql (Contributor) commented Dec 14, 2023

  • Draft state.

  • Closes Gradient Clipping #596.

  • Adds Storage and Gradient view/mutating methods.

    • Added the dfdx::nn_traits::WithGrads trait and the dfdx_derives::WithGrads proc macro, based on ZeroGrads.
      • The overall design follows the suggestion in Gradient Clipping #596 (comment), allowing custom cpu operations on the elements (see the sketch after this list).
      • The ZeroGrads trait could be merged into WithGrads by mostly just merging their methods.
    • Added the dfdx_core::tensor::WithStorage trait.
    • Changed the interface so Cuda can do more with Cuda kernels, and added the necessary kernels.
      • This could be a separate improvement in a future PR. Since grad updates don't happen that often, I think leaving things on the cpu isn't too bad.
  • Changed some methods on Gradients:

    • Exposed get_mut as pub.
    • Exposed get_ref as pub, and lowered its requirement from &mut self to &self.
  • Added gradient clamping and clipping methods.

    • Added examples for all methods (view/mutate grads, clamp, and clip).
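
To make the element-visiting design concrete, here is a minimal sketch of what such a trait could look like. The names, the Grads placeholder type, and the method signatures are illustrative assumptions for this sketch only, not the actual dfdx traits:

// Hypothetical sketch: Grads and the method names are placeholders,
// not the real dfdx types.

/// Placeholder standing in for dfdx's gradient store.
pub struct Grads(pub Vec<f32>);

/// A model that can expose its gradient elements to custom cpu closures.
pub trait WithGradsSketch {
    /// Run a closure over every gradient element immutably,
    /// e.g. to accumulate a squared norm across all parameters.
    fn grads_for_each(&self, grads: &Grads, f: &mut dyn FnMut(&f32));

    /// Run a closure over every gradient element mutably,
    /// e.g. to scale, clamp, or clip values in place.
    fn grads_for_each_mut(&self, grads: &mut Grads, f: &mut dyn FnMut(&mut f32));
}

In a design like this, helpers such as the grads_norm_squared and grads_clip_norm used below would be thin wrappers over the two visitors.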

Example using clip_norm:

// (...)
// let loss = dfdx::losses::cross_entropy_with_logits_loss(prediction_y, y);
grads = loss.backward();

// accumulates into norm_squared, then applies clip_norm
let mut norm_squared = 0.;
model.grads_norm_squared(&grads, &mut norm_squared);
model.grads_clip_norm(&mut grads, norm_squared.sqrt(), 1e-2);

opt.update(&mut model, &grads).unwrap();

Note that clip_norm doesn't change the grads' "direction", because all grad values are scaled by the same factor, while clip_value does change the direction (because some values are changed while others are left intact). So for gradient descent, where the grads' direction is supposed to be roughly followed, my guess is that clip_norm is better.
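
For intuition, here is a standalone sketch using plain slices rather than the dfdx API, showing why scaling by the global norm preserves the direction while per-element clamping does not. The clip_norm_sketch and clip_value_sketch names are made up for this illustration:

// Standalone illustration; no dfdx types involved.

fn clip_norm_sketch(grads: &mut [f32], max_norm: f32) {
    // Scale every element by the same factor, so the direction is unchanged.
    let norm = grads.iter().map(|g| g * g).sum::<f32>().sqrt();
    if norm > max_norm {
        let scale = max_norm / norm;
        for g in grads.iter_mut() {
            *g *= scale;
        }
    }
}

fn clip_value_sketch(grads: &mut [f32], max_abs: f32) {
    // Clamp each element independently; large elements shrink relative to
    // small ones, so the direction can change.
    for g in grads.iter_mut() {
        *g = g.clamp(-max_abs, max_abs);
    }
}

fn main() {
    let mut a = vec![3.0_f32, 4.0]; // norm 5, direction (0.6, 0.8)
    let mut b = a.clone();

    clip_norm_sketch(&mut a, 1.0); // -> [0.6, 0.8]: same direction, norm 1
    clip_value_sketch(&mut b, 1.0); // -> [1.0, 1.0]: different direction
    println!("clip_norm: {a:?}, clip_value: {b:?}");
}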
