
Seeking advice on adapting a normal sklearn pipeline/search space to Hyperopt #197

seand412 opened this issue Mar 28, 2023 · 0 comments
Hello,

I am working on some automated model building for spectroscopy data. I have a pipeline and search space defined and can search through them with the usual sklearn GridSearchCV/RandomizedSearchCV/etc. to find the best combination, but I can't figure out how to adapt them to Hyperopt to get the improved search speed from its Bayesian search algorithm. I'm fairly new to this, so apologies if I'm missing something obvious.

My starting/default pipeline:

```python
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import Normalizer
from sklearn.cross_decomposition import PLSRegression
# MSCTransformer, SNVTransformer, SavGolTransformer, DummyTransformer are my custom classes

pipe = Pipeline([
    ('scaler', Normalizer()),        # Normalizer, MinMaxScaler, MaxAbsScaler, or StandardScaler
    ('corrector', MSCTransformer()), # MSCTransformer, SNVTransformer, or DummyTransformer
    ('savgol', SavGolTransformer(window_length=15, polyorder=4, deriv=1)),  # or DummyTransformer (no savgol)
    ('pls', PLSRegression(n_components=3)),
])
```

I want to try different algorithms for the scaler and corrector steps, different values of window_length/polyorder/deriv for the savgol step, and different values of n_components for the pls step. The transformers are all custom classes that inherit from sklearn.base.BaseEstimator and sklearn.base.TransformerMixin to get fit_transform. DummyTransformer simply passes the input data through in its transform method, SavGolTransformer returns the scipy.signal.savgol_filter output of the input data in its transform method, and so on.
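For context, simplified versions of two of the custom transformers look roughly like this (the real ones follow the same pattern; MSCTransformer and SNVTransformer are built the same way):

```python
import numpy as np
from scipy.signal import savgol_filter
from sklearn.base import BaseEstimator, TransformerMixin


class DummyTransformer(BaseEstimator, TransformerMixin):
    """Pass-through step, used to 'switch off' a pipeline stage."""

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return X


class SavGolTransformer(BaseEstimator, TransformerMixin):
    """Applies a Savitzky-Golay filter along each spectrum (row)."""

    def __init__(self, window_length=15, polyorder=4, deriv=1):
        self.window_length = window_length
        self.polyorder = polyorder
        self.deriv = deriv

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return savgol_filter(np.asarray(X), self.window_length,
                             self.polyorder, deriv=self.deriv, axis=1)
```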

With normal sklearn, my search space looks like this:

```python
param_grid = [{
    'scaler': [MinMaxScaler(), Normalizer(), MaxAbsScaler(), StandardScaler()],
    'corrector': [MSCTransformer(), SNVTransformer(), DummyTransformer()],
    'savgol__window_length': np.arange(5, 20),
    'savgol__polyorder': np.arange(1, 5),
    'savgol__deriv': np.arange(0, 2),
    'savgol': [SavGolTransformer(15, 4, 1), DummyTransformer()],
    'pls__n_components': np.arange(1, 4),
}]
```

Then I can search through this parameter space with, e.g.,

```python
from sklearn.model_selection import GridSearchCV

search = GridSearchCV(pipe, param_grid, scoring='neg_root_mean_squared_error')
search.fit(X_train, y_train)
```

to get the best results.

I have found a lot of examples of creating a search space for Hyperopt/hyperopt-sklearn, but none of them do quite what I'm trying to do here. Below is the rough shape I have in mind for the Hyperopt version, though I'm not sure it's right or idiomatic. Any help with constructing the proper Hyperopt search space and getting the best pipeline back out would be much appreciated.
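For reference, here is my rough, untested attempt, assuming hp.choice for the categorical steps (with the Sav-Gol parameters nested inside their branch so they are only sampled when that step is active), hp.quniform for the integer ranges, and a cross-validated RMSE objective; the names and ranges just mirror my grid above:

```python
import numpy as np
from hyperopt import hp, fmin, tpe, Trials, STATUS_OK, space_eval
from hyperopt.pyll import scope
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import Normalizer, MinMaxScaler, MaxAbsScaler, StandardScaler
from sklearn.cross_decomposition import PLSRegression
# MSCTransformer, SNVTransformer, SavGolTransformer, DummyTransformer are my custom classes

space = {
    'scaler': hp.choice('scaler', [MinMaxScaler(), Normalizer(), MaxAbsScaler(), StandardScaler()]),
    'corrector': hp.choice('corrector', [MSCTransformer(), SNVTransformer(), DummyTransformer()]),
    # nested choice: either a Sav-Gol step with its own hyperparameters, or no Sav-Gol at all
    'savgol': hp.choice('savgol', [
        {
            'window_length': scope.int(hp.quniform('window_length', 5, 19, 1)),
            'polyorder': scope.int(hp.quniform('polyorder', 1, 4, 1)),
            'deriv': scope.int(hp.quniform('deriv', 0, 1, 1)),
        },
        None,  # stands in for DummyTransformer (skip the Sav-Gol step)
    ]),
    'n_components': scope.int(hp.quniform('n_components', 1, 3, 1)),
}


def objective(params):
    savgol = (SavGolTransformer(**params['savgol'])
              if params['savgol'] is not None else DummyTransformer())
    pipe = Pipeline([
        ('scaler', params['scaler']),
        ('corrector', params['corrector']),
        ('savgol', savgol),
        ('pls', PLSRegression(n_components=params['n_components'])),
    ])
    # cross_val_score returns neg-RMSE, so negate it to get a loss to minimise
    score = cross_val_score(pipe, X_train, y_train,
                            scoring='neg_root_mean_squared_error').mean()
    return {'loss': -score, 'status': STATUS_OK}


trials = Trials()
best = fmin(fn=objective, space=space, algo=tpe.suggest,
            max_evals=100, trials=trials)

# fmin returns indices for hp.choice entries; space_eval maps them back to the actual objects/values
best_params = space_eval(space, best)
```

Does that look like a reasonable way to structure it, or is there a more idiomatic Hyperopt/hyperopt-sklearn way to swap whole pipeline steps in and out?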

Thanks!
