Copying @ahwillia comments from flatironinstitute/nemos-workshop-feb-2024#8:
You can come up with convex problems for which gradient descent takes essentially forever to converge but second-order methods (e.g. Newton) perform well. Thus, in theory it really can matter which algorithm you choose! 🙂
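For concreteness, here is a minimal NumPy sketch of that point (illustrative only, not nemos code): on a badly conditioned convex quadratic, gradient descent barely moves along the flat direction even after thousands of iterations, while a single Newton step lands exactly on the minimizer.

```python
import numpy as np

# Ill-conditioned convex quadratic: f(x) = 0.5 * x @ A @ x, condition number 1e6.
# Gradient descent shrinks the error along the flat direction by only a factor
# of (1 - 1e-6) per step; a single Newton step solves the problem exactly.
A = np.diag([1.0, 1e-6])

def grad(x):
    return A @ x

x_gd = np.array([1.0, 1.0])
for _ in range(10_000):
    x_gd = x_gd - 1.0 * grad(x_gd)  # step size = 1 / (largest eigenvalue of A)

x_newton = np.array([1.0, 1.0])
x_newton = x_newton - np.linalg.solve(A, grad(x_newton))  # one Newton step

print("gradient descent error after 10,000 steps:", np.linalg.norm(x_gd))      # ~0.99
print("Newton error after a single step:         ", np.linalg.norm(x_newton))  # ~0.0
```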
Ideally, I would like to delete this part of the tutorial and engineer around the problem. At the end of `model.fit` we should check that the gradient is zero to within some tolerance (specified at model initialization). If the norm of the gradient is above this tolerance, we throw a really massive warning explaining that the user should try a different optimization method / solver, et cetera.
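Something along these lines is what I have in mind for the post-fit check. This is only a rough sketch; `check_converged`, `loss_fn`, `tol`, and the warning text are illustrative placeholders, not existing nemos API, and `params` is assumed to be a single array.

```python
import warnings

import jax
import jax.numpy as jnp

def check_converged(loss_fn, params, tol=1e-6):
    """Warn if the fitted params are not a stationary point of the loss.

    Assumes `loss_fn(params)` is the (penalized) objective minimized by
    model.fit and `params` is a single array; both are placeholders here.
    """
    grad_norm = float(jnp.linalg.norm(jax.grad(loss_fn)(params)))
    if grad_norm > tol:
        warnings.warn(
            f"Solver may not have converged: ||grad|| = {grad_norm:.2e} exceeds "
            f"tol = {tol:.2e}. Try a different solver, more iterations, stronger "
            "regularization, or float64 precision (see the docs page on debugging "
            "optimization failures).",
            UserWarning,
        )
```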
We may want a separate docs page focused on "debugging optimization failures" -- the warning message could link to it. My guess is that optimization failures result from (a) using float32 rather than float64, (b) not having enough regularization, so the problem is only weakly convex (adding regularization should make it strictly convex), or (c) problems with jaxopt that should be fixed (e.g. it seems like the line search does a bad job if the initial learning rate is not tuned well?).
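To illustrate fixes for (a) and (b): the float64 switch is standard JAX configuration, while the loss functions below are made-up stand-ins (a plain least-squares loss plus a ridge penalty), not nemos code.

```python
import jax
import jax.numpy as jnp

# (a) JAX defaults to float32; enabling float64 often resolves convergence
#     stalls caused by limited numerical precision.
jax.config.update("jax_enable_x64", True)

# (b) Adding an L2 (ridge) penalty turns a weakly convex objective into a
#     strictly convex one. The unpenalized loss here is just an illustrative
#     least-squares objective.
def unpenalized_loss(params, X, y):
    return jnp.mean((X @ params - y) ** 2)

def penalized_loss(params, X, y, strength=1e-3):
    return unpenalized_loss(params, X, y) + strength * jnp.sum(params**2)
```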