
Improved code for "Example: Hyperparameters Tuning", which can use "GASearchCV" for hyperparameter optimization with a "RandomForestClassifier" #149

Open
MAhmadUzair opened this issue May 6, 2024 · 1 comment
Labels
new feature Describe the request of new features

Comments

@MAhmadUzair
Contributor

What the problem is:
In the current implementation of GASearchCV, I find it cumbersome to manually define a wide range of parameters and potential values for optimization without guidance on which parameters might have the greatest impact on model performance. This can be particularly frustrating for users who are not deeply familiar with the intricacies of each model parameter.
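
For context, this is roughly what the manual setup looks like today (a minimal sketch based on the documented `GASearchCV` and `sklearn_genetic.space` API; the dataset, hyperparameters, and ranges are only illustrative, and picking them is exactly the part I find cumbersome):

```python
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, train_test_split
from sklearn_genetic import GASearchCV
from sklearn_genetic.space import Categorical, Continuous, Integer

# Toy data just to keep the example self-contained
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# The user has to choose every hyperparameter, its type, and its range by hand
param_grid = {
    "n_estimators": Integer(100, 300),
    "max_depth": Integer(2, 30),
    "max_leaf_nodes": Integer(2, 35),
    "min_weight_fraction_leaf": Continuous(0.01, 0.5, distribution="log-uniform"),
    "bootstrap": Categorical([True, False]),
}

evolved_estimator = GASearchCV(
    estimator=RandomForestClassifier(random_state=42),
    cv=StratifiedKFold(n_splits=3, shuffle=True, random_state=42),
    scoring="accuracy",
    param_grid=param_grid,
    population_size=10,
    generations=20,
    n_jobs=-1,
)

evolved_estimator.fit(X_train, y_train)
print(evolved_estimator.best_params_)
print(evolved_estimator.score(X_test, y_test))
```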

Use case for this feature:
This feature would be beneficial in educational settings or among novice machine learning practitioners, where understanding the impact of different parameters on model performance is crucial. It would also aid in more efficiently navigating the model tuning process, thus reducing the time and computational resources needed.

What I want to happen:
I would like an integrated feature within GASearchCV that suggests the most impactful parameters to optimize based on preliminary quick scans of the model’s performance with default settings. This feature could use a heuristic or data-driven approach to prioritize parameters that are likely to influence performance significantly.

Workflow you want to enable: The workflow would start with the user running a preliminary analysis using default model settings. Based on this analysis, GASearchCV would recommend a set of parameters to optimize, potentially with suggested ranges or distributions. The user could then either accept these recommendations directly into the optimization process or adjust them based on their specific needs and insights.
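
To make the workflow concrete, here is a rough, purely hypothetical sketch of what such a preliminary scan could do: vary one hyperparameter at a time around the defaults and rank parameters by how much the cross-validation score moves. Nothing below exists in the package; the candidate values and the ranking heuristic are only illustrative.

```python
from sklearn.base import clone
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Hypothetical candidate values a future helper could ship per model type
CANDIDATES = {
    "n_estimators": [50, 100, 300],
    "max_depth": [5, 15, None],
    "max_features": ["sqrt", "log2"],
    "min_samples_leaf": [1, 5, 20],
}

def suggest_params(estimator, X, y, candidates, cv=3, top_k=2):
    """Hypothetical preliminary scan: vary one hyperparameter at a time and
    rank parameters by the spread of the resulting CV scores."""
    impact = {}
    for name, values in candidates.items():
        scores = []
        for value in values:
            model = clone(estimator).set_params(**{name: value})
            scores.append(cross_val_score(model, X, y, cv=cv).mean())
        impact[name] = max(scores) - min(scores)  # score spread = rough importance
    ranked = sorted(impact, key=impact.get, reverse=True)
    return {name: candidates[name] for name in ranked[:top_k]}

X, y = load_digits(return_X_y=True)
print(suggest_params(RandomForestClassifier(random_state=0), X, y, CANDIDATES))
```

The returned subset (or space objects built from it) could then be fed into GASearchCV as the recommended `param_grid`, or adjusted by the user first.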

Additional context
Additional enhancements could include visualization tools integrated with GASearchCV to plot the evolution of model performance across generations, showing how different parameters impact the accuracy or other performance metrics. This would not only aid in selecting the best model but also in understanding the optimization process. Screenshots or visualizations of parameter impact and performance trends over generations could be particularly instructive for educational purposes and in-depth analysis.
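
For reference, the package already exposes plotting helpers that cover part of this, assuming the documented `sklearn_genetic.plots` module; the per-parameter impact view described above would be the new piece. A minimal usage sketch, reusing the fitted `evolved_estimator` from the snippet earlier:

```python
import matplotlib.pyplot as plt
from sklearn_genetic.plots import plot_fitness_evolution, plot_search_space

plot_fitness_evolution(evolved_estimator)  # fitness across generations
plt.show()

plot_search_space(evolved_estimator)       # pairwise view of sampled hyperparameters
plt.show()
```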

This setup would create a more user-friendly, informative, and efficient optimization process, making advanced machine learning techniques more accessible and understandable to a broader range of users.

@MAhmadUzair added the new feature label on May 6, 2024
@rodrigo-arenas
Owner

rodrigo-arenas commented May 23, 2024

Hi @MAhmadUzair, I think this is an interesting idea in the sense of letting the user better understand which parameters matter most during the optimization process by running the algorithm for a few generations.

However, I don't see a direct way the package could suggest a range of values for each hyperparameter. The optimizer accepts any classifier/regressor (it could even be a custom one you created), and there is no way inside scikit-learn to know beforehand the accepted data type of each parameter (integer, float, categorical) or its expected range (so that the algorithm doesn't fail and the values make sense) without reading the algorithm's docs directly.
For example, with the Random Forest, even though n_estimators can in principle grow without bound, you usually don't need huge values to get a good result, and max_features only makes sense up to the number of features in your dataset.

So the only way I see for the package to suggest this for any model would be to ship a predefined list of models with their most common hyperparameters and ranges, and use that as the default setting. If you see another way, let me know so I can think it through.
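
To make that idea concrete, such a predefined default could look roughly like this. This is purely a sketch: the registry below does not exist in the package, and the ranges are only illustrative.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn_genetic.space import Categorical, Continuous, Integer

# Hypothetical registry of "sensible default" search spaces per supported model.
# GASearchCV could fall back to DEFAULT_PARAM_GRIDS[type(estimator)] when the
# user does not pass param_grid, and raise or warn for unknown estimators.
DEFAULT_PARAM_GRIDS = {
    RandomForestClassifier: {
        "n_estimators": Integer(50, 500),
        "max_depth": Integer(2, 30),
        "max_features": Categorical(["sqrt", "log2"]),
        "min_samples_leaf": Integer(1, 20),
    },
    LogisticRegression: {
        "C": Continuous(1e-3, 1e2, distribution="log-uniform"),
        "penalty": Categorical(["l1", "l2"]),
        "solver": Categorical(["liblinear", "saga"]),
    },
}
```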
