Copying @ahwillia comments from flatironinstitute/nemos-workshop-feb-2024#8:
You can come up with convex problems for which gradient descent takes essentially forever to converge but second-order methods (e.g. Newton) perform well. Thus, in theory it really can matter which algorithm you choose! 🙂
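For concreteness, here is a minimal NumPy sketch of that point (illustrative only, not nemos code): on a badly conditioned convex quadratic, gradient descent barely moves along the flat direction even after thousands of iterations, while a single Newton step lands exactly on the minimizer.

```python
import numpy as np

# Ill-conditioned convex quadratic: f(x) = 0.5 * x @ A @ x, condition number 1e6.
# Gradient descent shrinks the error along the flat direction by only a factor
# of (1 - 1e-6) per step; a single Newton step solves the problem exactly.
A = np.diag([1.0, 1e-6])

def grad(x):
    return A @ x

x_gd = np.array([1.0, 1.0])
for _ in range(10_000):
    x_gd = x_gd - 1.0 * grad(x_gd)  # step size = 1 / (largest eigenvalue of A)

x_newton = np.array([1.0, 1.0])
x_newton = x_newton - np.linalg.solve(A, grad(x_newton))  # one Newton step

print("gradient descent error after 10,000 steps:", np.linalg.norm(x_gd))      # ~0.99
print("Newton error after a single step:         ", np.linalg.norm(x_newton))  # ~0.0
```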
Ideally, I would like to delete this part of the tutorial and engineer around the problem. At the end of `model.fit` we should check that the gradient is zero to within some tolerance (specified at model initialization). If the norm of the gradient is above this tolerance, we throw a really massive warning explaining that the user should try a different optimization method / solver, et cetera.
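Something along these lines is what I have in mind for the post-fit check. This is only a rough sketch; `check_converged`, `loss_fn`, `tol`, and the warning text are illustrative placeholders, not existing nemos API, and `params` is assumed to be a single array.

```python
import warnings

import jax
import jax.numpy as jnp

def check_converged(loss_fn, params, tol=1e-6):
    """Warn if the fitted params are not a stationary point of the loss.

    Assumes `loss_fn(params)` is the (penalized) objective minimized by
    model.fit and `params` is a single array; both are placeholders here.
    """
    grad_norm = float(jnp.linalg.norm(jax.grad(loss_fn)(params)))
    if grad_norm > tol:
        warnings.warn(
            f"Solver may not have converged: ||grad|| = {grad_norm:.2e} exceeds "
            f"tol = {tol:.2e}. Try a different solver, more iterations, stronger "
            "regularization, or float64 precision (see the docs page on debugging "
            "optimization failures).",
            UserWarning,
        )
```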
We may want a separate docs page focused on "debugging optimization failures" -- the warning message could link to it. My guess is that optimization failures result from (a) using float32 rather than float64, (b) not having enough regularization, so the problem is only weakly convex (adding regularization should make it strictly convex), or (c) problems with jaxopt that should be fixed (e.g. it seems like the line search does a bad job if the initial learning rate is not tuned well?).
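To illustrate fixes for (a) and (b): the float64 switch is standard JAX configuration, while the loss functions below are made-up stand-ins (a plain least-squares loss plus a ridge penalty), not nemos code.

```python
import jax
import jax.numpy as jnp

# (a) JAX defaults to float32; enabling float64 often resolves convergence
#     stalls caused by limited numerical precision.
jax.config.update("jax_enable_x64", True)

# (b) Adding an L2 (ridge) penalty turns a weakly convex objective into a
#     strictly convex one. The unpenalized loss here is just an illustrative
#     least-squares objective.
def unpenalized_loss(params, X, y):
    return jnp.mean((X @ params - y) ** 2)

def penalized_loss(params, X, y, strength=1e-3):
    return unpenalized_loss(params, X, y) + strength * jnp.sum(params**2)
```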