
[Feature Request] Normalized gradient descent #594

Open
smorad opened this issue Oct 9, 2023 · 6 comments
Labels
enhancement New feature or request

Comments

@smorad

smorad commented Oct 9, 2023

Optax has various clipping operators, but as far as I can tell, it cannot scale updates by their gradient norm. Adding this capability as a chainable transformation would allow us to use normalized gradient descent methods (e.g. normalized Adam).

A simple implementation might look like:

import jax
import jax.numpy as jnp
import optax

def scale_by_norm(scale: float = 1.0, eps: float = 1e-6) -> optax.GradientTransformation:
  def init_fn(params):
    del params
    return optax.EmptyState()

  def update_fn(updates, state, params=None):
    del params
    # Bound the divisor below by `scale` so tiny gradients are not blown up.
    g_norm = jnp.maximum(optax.global_norm(updates) + eps, scale)

    def scale_fn(t):
      return t / g_norm

    updates = jax.tree_util.tree_map(scale_fn, updates)
    return updates, state

  return optax.GradientTransformation(init_fn, update_fn)
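
Chained with existing optax transforms, this would give one possible normalized-Adam-style optimizer. A minimal usage sketch, assuming the scale_by_norm above (not an existing optax API):

# Normalize raw gradients before feeding them to Adam, then apply a learning rate.
optimizer = optax.chain(
    scale_by_norm(scale=1.0),
    optax.scale_by_adam(),
    optax.scale(-1e-3),
)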
@mtthss
Collaborator

mtthss commented Oct 10, 2023

Do you have a reference to this specific way of normalising?

@smorad
Author

smorad commented Oct 10, 2023

This textbook describes it fairly well. My example is a little fancier than needed; you could replace the maximum with

g_norm = (optax.global_norm(updates) + eps) / scale

In this case, scale would refer to alpha in Eq 6.
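
As a small worked example of that variant (the numbers are illustrative):

import jax
import jax.numpy as jnp
import optax

# With scale (alpha in Eq 6) = 0.1, the update is rescaled to global norm ~0.1,
# regardless of the raw gradient magnitude.
grads = {'w': jnp.array([3.0, 4.0])}               # global norm = 5
g_norm = (optax.global_norm(grads) + 1e-6) / 0.1
scaled = jax.tree_util.tree_map(lambda t: t / g_norm, grads)
# scaled['w'] is approximately [0.06, 0.08], i.e. global norm ~ 0.1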

@mtthss
Collaborator

mtthss commented Oct 10, 2023

Sounds like it could be a good addition. Do you want to put together a PR?

@SauravMaheshkar
Contributor

Seems like a simple extension of

def normalize() -> base.GradientTransformation:
  """Normalizes the gradient.

  Returns:
    An (init_fn, update_fn) tuple.
  """

  def init_fn(params):
    del params
    return NormalizeState()

  def update_fn(updates, state, params=None):
    del params
    g_norm = utils.global_norm(updates)
    updates = jax.tree_map(lambda g: g / g_norm, updates)
    return updates, state

  return base.GradientTransformation(init_fn, update_fn)

@mtthss can I take this up?

@smorad
Author

smorad commented Nov 25, 2023

I think this might actually be implemented in clip_by_global_norm. IIRC the code there actually scales the gradient rather than clips it. Might be worth double checking before starting.

@fabianp added the enhancement (New feature or request) label Dec 3, 2023
@vroulet
Collaborator

vroulet commented Feb 5, 2024

clip_by_global_norm clips but does not necessarily normalize (if the updates' global norm is below the clip norm, they are returned as is). In other words, clipping projects onto a ball, whereas @smorad wants to project onto a sphere.
I think @SauravMaheshkar pointed out a good starting point.
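
A small illustration of the difference (the values are arbitrary, and jax.tree_util.tree_map stands in for a proper transform):

import jax
import jax.numpy as jnp
import optax

grads = {'w': jnp.array([0.3, 0.4])}   # global norm = 0.5, below the clip threshold

# Projection onto the ball: updates already inside the ball are left untouched.
clip = optax.clip_by_global_norm(max_norm=1.0)
clipped, _ = clip.update(grads, clip.init(grads))
# clipped['w'] == [0.3, 0.4]

# Projection onto the sphere: updates are always rescaled, here to unit norm.
g_norm = optax.global_norm(grads)
normalized = jax.tree_util.tree_map(lambda g: g / g_norm, grads)
# normalized['w'] == [0.6, 0.8]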
