include hyperparameter tuning #9

Open
sp8rks opened this issue Jan 20, 2022 · 1 comment
Labels
enhancement New feature or request good first issue Good for newcomers

Comments

@sp8rks
Collaborator

sp8rks commented Jan 20, 2022

Add a section on hyperparameter tuning, since the classical models were used with their default hyperparameters.

@anthony-wang anthony-wang added enhancement New feature or request good first issue Good for newcomers labels Jan 24, 2022
@sgbaird

sgbaird commented Jan 27, 2022

Suggestion for some of the commentary in the markdown cell about hyperparameter optimization. Feel free to edit as needed.

  • If evaluations are very inexpensive (i.e. you can afford millions of evaluations), go with grid-based, random, or Sobol points via e.g. sklearn.model_selection.GridSearchCV, sklearn.model_selection.RandomizedSearchCV, or skopt.sampler.Sobol, respectively. Grid-based may be good enough, but random is generally better than grid-based, and Sobol is generally better than random. To combine Sobol sampling with a CV search, score each sampled point with e.g. sklearn.model_selection.cross_validate.
  • If evaluations are moderately expensive (i.e. you can afford tens of thousands of evaluations), go with a genetic algorithm via e.g. sklearn-genetic-opt or TPOT.
  • If evaluations are very expensive (i.e. you can afford only hundreds of evaluations), go with Bayesian optimization via e.g. skopt.BayesSearchCV or Ax. BayesSearchCV is the more lightweight option and requires that the model being tuned follow the scikit-learn estimator API. Ax has much more sophisticated Bayesian models, including automatic relevance determination (ARD) and corresponding feature importances, advanced handling of noise, and capabilities to handle high-dimensional datasets. It also has several interfaces, ranging from easy-to-use to heavily customizable, and is a tool that we recommend.
  • Factors other than the expense of model evaluation, such as interpretability and ease of use, can also guide the choice of hyperparameter optimization scheme.
  • In our case, due to [inexpensive/moderately expensive/expensive] model evaluations for sklearn models and to maintain a lightweight environment, we choose to use [GridSearchCV/sklearn-genetic-opt/skopt.BayesSearchCV]; however, other options could have been used instead.
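
For the inexpensive case, here is a minimal sketch of what the grid-based and random options could look like with scikit-learn. The synthetic dataset, the RandomForestRegressor, the search ranges, and the scoring metric are all assumptions chosen for illustration; they are not taken from the notebook and would need to be swapped for the actual models and data.

```python
from scipy.stats import randint
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV

# Synthetic stand-in data (assumption); the notebook's real features and
# targets would go here.
X, y = make_regression(n_samples=200, n_features=8, noise=10.0, random_state=0)

# Illustrative model (assumption); any scikit-learn estimator works.
model = RandomForestRegressor(random_state=0)

# Grid search: evaluates every combination (2 x 2 = 4 candidates).
grid = GridSearchCV(
    model,
    param_grid={"n_estimators": [20, 50], "max_depth": [3, None]},
    cv=3,
    scoring="neg_mean_absolute_error",
)
grid.fit(X, y)

# Random search: draws n_iter candidates from the given distributions,
# so the budget is set directly instead of by the grid size.
rand = RandomizedSearchCV(
    model,
    param_distributions={
        "n_estimators": randint(20, 60),
        "max_depth": [3, 5, None],
    },
    n_iter=4,
    cv=3,
    scoring="neg_mean_absolute_error",
    random_state=0,
)
rand.fit(X, y)

print("grid best:", grid.best_params_, grid.best_score_)
print("random best:", rand.best_params_, rand.best_score_)
```

Both searches expose the same `best_params_`, `best_score_`, and `cv_results_` attributes, so switching between them mostly means changing how the candidate set is specified.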
