Skip to content
This repository has been archived by the owner on Nov 14, 2023. It is now read-only.

Support sklearn.compose.TransformedTargetRegressor #240

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

aidiss
Copy link

@aidiss aidiss commented Apr 18, 2022

Currently TuneSearchCV fails when provided with LGBMRegressor provided wrapped inside TransformedTargerRegressor.

For example, this block would fail

regressor = LGBMRegressor(**config)

regressor = TransformedTargetRegressorTwo(
    regressor=regressor, func=np.log1p, inverse_func=np.expm1
)

Failure happens partly because of failure of tune_search.utils.get_early_stop_type to read the type of the estimator.

Note sure if this is the correct implementation. But it should not break anything.

@Yard1
Copy link
Member

Yard1 commented Apr 18, 2022

Hey, this is a good catch. I think we would want to make this more generic if possible. We would eventually want to support other meta-estimators (including user-defined ones). Perhaps we can allow the user to just directly specify the early stop type. Also, I am not sure if we wouldn't need to change the logic related to early stopping in other places, too.

Can you add a unit test for this? Thanks!

@Yard1 Yard1 self-assigned this Apr 18, 2022
@aidiss
Copy link
Author

aidiss commented Apr 18, 2022

Added a test.
Let me know if its in the right place and done in a right way.

@aidiss
Copy link
Author

aidiss commented Apr 20, 2022

Secondly, I wonder, what could be implementation for handling estimators that are inside pipelines.
I guess we could traverse the estimators and look for the one that is supported? Or vice versa, skip the ones that are meta?
Or, maybe it could be possible to tell specifically what kind of early stopping is used by the estimator?

@@ -101,6 +103,14 @@ def get_early_stop_type(estimator, early_stopping):

if not early_stopping:
return EarlyStopping.NO_EARLY_STOP

if check_is_pipeline(estimator):
for step_name, step in estimator.steps:
Copy link
Member

@Yard1 Yard1 Apr 20, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Three things:

  • we shouldn't be only checking for regressors
  • this is not foolproof & goes against duck typing principle of sklearn
  • we can just assume that the last step of the pipeline is an estimator. You can't really make a pipeline with multiple estimators.

@Yard1
Copy link
Member

Yard1 commented Apr 22, 2022

Hey @aidiss I took a look at the logic and I unfortunately don't think this will work. We may detect the early stop type correctly, but we are unable to apply it during actual training if the target transformer is present. It would require a more through refactoring of the code. I'll put this on the backlog, unless you would be willing to work on this. We would need to support both the case with a pipeline and without it.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants