
Seeking advice on adapting a normal sklearn pipeline/search space to Hyperopt #197

seand412 opened this issue Mar 28, 2023 · 0 comments
Hello,

I am working on some automated model building for spectroscopy data. I have a pipeline and search space defined and can search through them with the usual sklearn GridSearchCV/RandomizedSearchCV/etc. to find the best combination, but I can't figure out how to adapt them to Hyperopt to get the improved search speed from its Bayesian search algorithm. I'm fairly new to this, so apologies if I'm missing something obvious.

My starting/default pipeline:

```python
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import Normalizer
from sklearn.cross_decomposition import PLSRegression
# MSCTransformer, SNVTransformer, SavGolTransformer, DummyTransformer are my custom classes

pipe = Pipeline([
    ('scaler', Normalizer()),        # Normalizer, MinMaxScaler, MaxAbsScaler, or StandardScaler
    ('corrector', MSCTransformer()), # MSCTransformer, SNVTransformer, or DummyTransformer
    ('savgol', SavGolTransformer(window_length=15, polyorder=4, deriv=1)),  # or DummyTransformer (no savgol)
    ('pls', PLSRegression(n_components=3)),
])
```

I want to try different algorithms for the scaler and corrector steps, different values of window_length/polyorder/deriv for the savgol step, and different values of n_components for the pls step. The transformers are all custom classes that inherit from sklearn.base.BaseEstimator and sklearn.base.TransformerMixin to get fit_transform. DummyTransformer simply passes the input data through in its transform method, SavGolTransformer returns the scipy.signal.savgol_filter output of the input data in its transform method, and so on.
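For context, simplified versions of two of the custom transformers look roughly like this (the real ones follow the same pattern; MSCTransformer and SNVTransformer are built the same way):

```python
import numpy as np
from scipy.signal import savgol_filter
from sklearn.base import BaseEstimator, TransformerMixin


class DummyTransformer(BaseEstimator, TransformerMixin):
    """Pass-through step, used to 'switch off' a pipeline stage."""

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return X


class SavGolTransformer(BaseEstimator, TransformerMixin):
    """Applies a Savitzky-Golay filter along each spectrum (row)."""

    def __init__(self, window_length=15, polyorder=4, deriv=1):
        self.window_length = window_length
        self.polyorder = polyorder
        self.deriv = deriv

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return savgol_filter(np.asarray(X), self.window_length,
                             self.polyorder, deriv=self.deriv, axis=1)
```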

With normal sklearn, my search space looks like this:

```python
param_grid = [{
    'scaler': [MinMaxScaler(), Normalizer(), MaxAbsScaler(), StandardScaler()],
    'corrector': [MSCTransformer(), SNVTransformer(), DummyTransformer()],
    'savgol__window_length': np.arange(5, 20),
    'savgol__polyorder': np.arange(1, 5),
    'savgol__deriv': np.arange(0, 2),
    'savgol': [SavGolTransformer(15, 4, 1), DummyTransformer()],
    'pls__n_components': np.arange(1, 4),
}]
```

Then I can search through this parameter space with, e.g.,

```python
from sklearn.model_selection import GridSearchCV

search = GridSearchCV(pipe, param_grid, scoring='neg_root_mean_squared_error')
search.fit(X_train, y_train)
```

to get the best results.

I have found a lot of examples of creating a search space for Hyperopt/hyperopt-sklearn, but none of them do quite what I'm trying to do here. Below is the rough shape I have in mind for the Hyperopt version, though I'm not sure it's right or idiomatic. Any help with constructing the proper Hyperopt search space and getting the best pipeline back out would be much appreciated.
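For reference, here is my rough, untested attempt, assuming hp.choice for the categorical steps (with the Sav-Gol parameters nested inside their branch so they are only sampled when that step is active), hp.quniform for the integer ranges, and a cross-validated RMSE objective; the names and ranges just mirror my grid above:

```python
import numpy as np
from hyperopt import hp, fmin, tpe, Trials, STATUS_OK, space_eval
from hyperopt.pyll import scope
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import Normalizer, MinMaxScaler, MaxAbsScaler, StandardScaler
from sklearn.cross_decomposition import PLSRegression
# MSCTransformer, SNVTransformer, SavGolTransformer, DummyTransformer are my custom classes

space = {
    'scaler': hp.choice('scaler', [MinMaxScaler(), Normalizer(), MaxAbsScaler(), StandardScaler()]),
    'corrector': hp.choice('corrector', [MSCTransformer(), SNVTransformer(), DummyTransformer()]),
    # nested choice: either a Sav-Gol step with its own hyperparameters, or no Sav-Gol at all
    'savgol': hp.choice('savgol', [
        {
            'window_length': scope.int(hp.quniform('window_length', 5, 19, 1)),
            'polyorder': scope.int(hp.quniform('polyorder', 1, 4, 1)),
            'deriv': scope.int(hp.quniform('deriv', 0, 1, 1)),
        },
        None,  # stands in for DummyTransformer (skip the Sav-Gol step)
    ]),
    'n_components': scope.int(hp.quniform('n_components', 1, 3, 1)),
}


def objective(params):
    savgol = (SavGolTransformer(**params['savgol'])
              if params['savgol'] is not None else DummyTransformer())
    pipe = Pipeline([
        ('scaler', params['scaler']),
        ('corrector', params['corrector']),
        ('savgol', savgol),
        ('pls', PLSRegression(n_components=params['n_components'])),
    ])
    # cross_val_score returns neg-RMSE, so negate it to get a loss to minimise
    score = cross_val_score(pipe, X_train, y_train,
                            scoring='neg_root_mean_squared_error').mean()
    return {'loss': -score, 'status': STATUS_OK}


trials = Trials()
best = fmin(fn=objective, space=space, algo=tpe.suggest,
            max_evals=100, trials=trials)

# fmin returns indices for hp.choice entries; space_eval maps them back to the actual objects/values
best_params = space_eval(space, best)
```

Does that look like a reasonable way to structure it, or is there a more idiomatic Hyperopt/hyperopt-sklearn way to swap whole pipeline steps in and out?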

Thanks!
