In sec 20.1, the comment about scipy.minimize where you say "we don't even need to compute the gradient" may be misleading. As you know, by default it uses numerical differentiation to compute the gradient if the user does not supply a grad function, so this is likely to be slow. You may want to mention automatic differentiation libraries like JAX and PyTorch, which solve this problem for you. (Also, scipy.minimize defaults to BFGS, not GD, and chooses the step size automagically :) Since this book is trying to demonstrate "best practice" for DS (e.g. the nice way you use dataframe.pipe for reproducible wrangling), maybe you should show how to use scipy.minimize on your example problem?
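Something along these lines, perhaps (a minimal sketch: the toy least-squares objective and data here are made up, standing in for the book's actual example problem):

```python
# Minimal sketch: scipy.minimize with an exact gradient from JAX,
# instead of relying on slow finite-difference approximations.
import jax
import jax.numpy as jnp
import numpy as np
from scipy.optimize import minimize

# Hypothetical toy data standing in for the book's example problem.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=100)

def loss(w):
    # Mean squared error of a linear model.
    return jnp.mean((jnp.dot(X, w) - y) ** 2)

grad = jax.grad(loss)  # exact gradient via automatic differentiation

# BFGS is scipy.minimize's default for smooth unconstrained problems;
# passing jac= avoids numerical differentiation entirely.
res = minimize(lambda w: float(loss(w)),
               x0=np.zeros(3),
               jac=lambda w: np.asarray(grad(w)),
               method="BFGS")
print(res.x)
```

This would also let you show the finite-difference vs autodiff contrast directly: drop the jac= argument and the same call silently falls back to numerical gradients.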
In sec 20.2, the first two paragraphs need rewriting to avoid repetition/redundancy.
In sec 20.3, maybe mention that for twice-differentiable functions, convexity is equivalent to the second derivative being non-negative (Hessian positive semidefinite in the multivariate case), so the function has a bowl shape. This condition is easier to check in practice than the definition of convexity. It's probably also worth mentioning some examples of convex and non-convex loss functions encountered in the book.
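For concreteness, the second-order condition could be displayed as something like this (a sketch of the statement for twice-differentiable functions):

```latex
% Second-order characterization of convexity for twice-differentiable f:
%   1-D:      f convex  <=>  f''(x) >= 0 for all x
%   general:  f convex  <=>  the Hessian is positive semidefinite everywhere
f \text{ is convex} \;\iff\; f''(x) \ge 0 \;\; \forall x
\qquad \text{(in general, } \nabla^2 f(x) \succeq 0 \text{)}.
```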
Maybe mention SAGA and other variance-reduced SGD methods, since SAGA is used in sec 21.4.1?
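For reference, a minimal sketch of the SAGA update (assuming a hypothetical per-example gradient function grad_i; in 21.4.1 the method is presumably invoked through a library solver rather than hand-rolled):

```python
# Sketch of SAGA (variance-reduced SGD). grad_i(w, i) is an assumed
# user-supplied function returning the gradient of example i's loss at w.
import numpy as np

def saga(grad_i, w0, n, lr=0.01, n_steps=10_000, seed=0):
    rng = np.random.default_rng(seed)
    w = w0.copy()
    # Table of the most recently evaluated gradient for each example,
    # plus their running average.
    table = np.array([grad_i(w, i) for i in range(n)])
    avg = table.mean(axis=0)
    for _ in range(n_steps):
        j = rng.integers(n)
        g_new = grad_i(w, j)
        # SAGA direction: unbiased estimate of the full gradient, with
        # variance that shrinks as the iterates approach the optimum.
        w -= lr * (g_new - table[j] + avg)
        avg += (g_new - table[j]) / n  # keep the running average in sync
        table[j] = g_new
    return w
```

If you add this, it may be worth a sentence on the design trade-off: the gradient table costs O(n·d) memory, which is the usual objection to SAGA versus plain SGD.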
murphyk changed the title from "sec 20.2 scipy.minimize comment" to "ch 20 (optimization): a few small issues" on Jul 7, 2023.