
Improved code for "Example: Hyperparameters Tuning", which can use "GASearchCV" for hyperparameter optimization with a "RandomForestClassifier" #149

Open
MAhmadUzair opened this issue May 6, 2024 · 1 comment
Labels
new feature Describe the request of new features

Comments

@MAhmadUzair
Contributor

What the problem is:
In the current implementation of GASearchCV, I find it cumbersome to manually define a wide range of parameters and potential values for optimization without guidance on which parameters might have the greatest impact on model performance. This can be particularly frustrating for users who are not deeply familiar with the intricacies of each model parameter.
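
For context, this is roughly what the manual setup looks like today (a minimal sketch based on the documented `GASearchCV` and `sklearn_genetic.space` API; the dataset, hyperparameters, and ranges are only illustrative, and picking them is exactly the part I find cumbersome):

```python
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, train_test_split
from sklearn_genetic import GASearchCV
from sklearn_genetic.space import Categorical, Continuous, Integer

# Toy data just to keep the example self-contained
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# The user has to choose every hyperparameter, its type, and its range by hand
param_grid = {
    "n_estimators": Integer(100, 300),
    "max_depth": Integer(2, 30),
    "max_leaf_nodes": Integer(2, 35),
    "min_weight_fraction_leaf": Continuous(0.01, 0.5, distribution="log-uniform"),
    "bootstrap": Categorical([True, False]),
}

evolved_estimator = GASearchCV(
    estimator=RandomForestClassifier(random_state=42),
    cv=StratifiedKFold(n_splits=3, shuffle=True, random_state=42),
    scoring="accuracy",
    param_grid=param_grid,
    population_size=10,
    generations=20,
    n_jobs=-1,
)

evolved_estimator.fit(X_train, y_train)
print(evolved_estimator.best_params_)
print(evolved_estimator.score(X_test, y_test))
```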

Use case for this feature:
This feature would be beneficial in educational settings or among novice machine learning practitioners, where understanding the impact of different parameters on model performance is crucial. It would also aid in more efficiently navigating the model tuning process, thus reducing the time and computational resources needed.

What I want to happen:
I would like an integrated feature within GASearchCV that suggests the most impactful parameters to optimize based on preliminary quick scans of the model’s performance with default settings. This feature could use a heuristic or data-driven approach to prioritize parameters that are likely to influence performance significantly.

Workflow you want to enable: The workflow would start with the user running a preliminary analysis using default model settings. Based on this analysis, GASearchCV would recommend a set of parameters to optimize, potentially with suggested ranges or distributions. The user could then either accept these recommendations directly into the optimization process or adjust them based on their specific needs and insights.
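
To make the workflow concrete, here is a rough, purely hypothetical sketch of what such a preliminary scan could do: vary one hyperparameter at a time around the defaults and rank parameters by how much the cross-validation score moves. Nothing below exists in the package; the candidate values and the ranking heuristic are only illustrative.

```python
from sklearn.base import clone
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Hypothetical candidate values a future helper could ship per model type
CANDIDATES = {
    "n_estimators": [50, 100, 300],
    "max_depth": [5, 15, None],
    "max_features": ["sqrt", "log2"],
    "min_samples_leaf": [1, 5, 20],
}

def suggest_params(estimator, X, y, candidates, cv=3, top_k=2):
    """Hypothetical preliminary scan: vary one hyperparameter at a time and
    rank parameters by the spread of the resulting CV scores."""
    impact = {}
    for name, values in candidates.items():
        scores = []
        for value in values:
            model = clone(estimator).set_params(**{name: value})
            scores.append(cross_val_score(model, X, y, cv=cv).mean())
        impact[name] = max(scores) - min(scores)  # score spread = rough importance
    ranked = sorted(impact, key=impact.get, reverse=True)
    return {name: candidates[name] for name in ranked[:top_k]}

X, y = load_digits(return_X_y=True)
print(suggest_params(RandomForestClassifier(random_state=0), X, y, CANDIDATES))
```

The returned subset (or space objects built from it) could then be fed into GASearchCV as the recommended `param_grid`, or adjusted by the user first.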

Additional context
Additional enhancements could include visualization tools integrated with GASearchCV to plot the evolution of model performance across generations, showing how different parameters impact the accuracy or other performance metrics. This would not only aid in selecting the best model but also in understanding the optimization process. Screenshots or visualizations of parameter impact and performance trends over generations could be particularly instructive for educational purposes and in-depth analysis.
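
For reference, the package already exposes plotting helpers that cover part of this, assuming the documented `sklearn_genetic.plots` module; the per-parameter impact view described above would be the new piece. A minimal usage sketch, reusing the fitted `evolved_estimator` from the snippet earlier:

```python
import matplotlib.pyplot as plt
from sklearn_genetic.plots import plot_fitness_evolution, plot_search_space

plot_fitness_evolution(evolved_estimator)  # fitness across generations
plt.show()

plot_search_space(evolved_estimator)       # pairwise view of sampled hyperparameters
plt.show()
```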

This setup would create a more user-friendly, informative, and efficient optimization process, making advanced machine learning techniques more accessible and understandable to a broader range of users.

@MAhmadUzair added the new feature label on May 6, 2024
@rodrigo-arenas
Owner

rodrigo-arenas commented May 23, 2024

Hi @MAhmadUzair, I think this is an interesting idea in the sense of letting the user better understand which parameters matter most during the optimization process by running the algorithm for a few generations.

However, I don't see a direct way the package could suggest a range of values for each hyperparameter. The optimizer accepts any classifier/regressor (it could even be a custom one you created), and there is no way inside scikit-learn to know beforehand the accepted data type of each parameter (integer, float, categorical) or its expected range (so that the algorithm doesn't fail and the values make sense) without reading the algorithm's docs directly.
For example, with the Random Forest, even though n_estimators can in principle grow without bound, you usually don't need huge values to get a good result, and max_features only makes sense up to the number of features in your dataset.

So the only way I see for the package to suggest this for any model would be to ship a predefined list of models with their most common hyperparameters and ranges, and use that as the default setting. If you see another way, let me know so I can think it through.
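
To make that idea concrete, such a predefined default could look roughly like this. This is purely a sketch: the registry below does not exist in the package, and the ranges are only illustrative.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn_genetic.space import Categorical, Continuous, Integer

# Hypothetical registry of "sensible default" search spaces per supported model.
# GASearchCV could fall back to DEFAULT_PARAM_GRIDS[type(estimator)] when the
# user does not pass param_grid, and raise or warn for unknown estimators.
DEFAULT_PARAM_GRIDS = {
    RandomForestClassifier: {
        "n_estimators": Integer(50, 500),
        "max_depth": Integer(2, 30),
        "max_features": Categorical(["sqrt", "log2"]),
        "min_samples_leaf": Integer(1, 20),
    },
    LogisticRegression: {
        "C": Continuous(1e-3, 1e2, distribution="log-uniform"),
        "penalty": Categorical(["l1", "l2"]),
        "solver": Categorical(["liblinear", "saga"]),
    },
}
```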
