[BUG] cannot correctly clone CascadeForestRegressor with sklearn.base.clone when using customized estimators #92

Open
IncubatorShokuhou opened this issue Aug 19, 2021 · 1 comment
Labels
needtriage Further information is requested

IncubatorShokuhou commented Aug 19, 2021

Describe the bug
Cannot correctly clone a CascadeForestClassifier/CascadeForestRegressor object with sklearn.base.clone when using customized estimators.

To Reproduce

from sklearn.datasets import load_boston
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
from sklearn.base import clone
from deepforest import CascadeForestRegressor
import xgboost as xgb
import lightgbm as lgb

X, y = load_boston(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)
model = CascadeForestRegressor(random_state=1)

# set estimator
n_estimators = 4  # the number of base estimators per cascade layer
estimators = [lgb.LGBMRegressor(random_state=i) for i in range(n_estimators)]
model.set_estimator(estimators)

# set predictor 
predictor = xgb.XGBRegressor()
model.set_predictor(predictor)

# clone model
model_new = clone(model)

# try to fit the cloned model (this is the call that raises)
model_new.fit(X_train, y_train)

Expected behavior
The cloned model fits without error.

Additional context

~/miniconda3/envs/pycaret/lib/python3.8/site-packages/deep_forest-0.1.5-py3.8-linux-x86_64.egg/deepforest/cascade.py in fit(self, X, y, sample_weight)
   1004                 if not hasattr(self, "predictor_"):
   1005                     msg = "Missing predictor after calling `set_predictor`"
-> 1006                     raise RuntimeError(msg)
   1007 
   1008             binner_ = Binner(

RuntimeError: Missing predictor after calling `set_predictor`

This bug occurs because, when a model with a customized predictor or customized estimators is cloned, only the constructor parameter (e.g. predictor='custom') is copied. The objects stored in self.predictor_ / self.dummy_estimators are not cloned, so fitting the clone raises the error shown above.
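The mechanism can be reproduced without deepforest. The following is a minimal sketch, assuming a hypothetical `ToyCascade` class that only imitates the pattern described here (a string flag as constructor parameter, the actual object stored as a separate attribute); it is not the deepforest implementation. `sklearn.base.clone` rebuilds the object from `get_params()`, so anything that is not a constructor parameter is dropped:

```python
from sklearn.base import BaseEstimator, clone
from sklearn.linear_model import LinearRegression


class ToyCascade(BaseEstimator):
    """Hypothetical stand-in for the pattern in this issue, not deepforest code."""

    def __init__(self, predictor="forest"):
        self.predictor = predictor

    def set_predictor(self, predictor):
        # The flag is a constructor parameter, so clone() copies it ...
        self.predictor = "custom"
        # ... but the object itself lives in a plain attribute, which clone() ignores.
        self.predictor_ = predictor


model = ToyCascade()
model.set_predictor(LinearRegression())

model_new = clone(model)

print(model_new.predictor)               # the 'custom' flag survives
print(hasattr(model_new, "predictor_"))  # the predictor object does not
```

In this sketch the clone ends up in the same inconsistent state as in the traceback: the flag says a custom predictor was set, but no predictor object is attached.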

I think this can be fixed easily by making the predictor and the list of estimators constructor parameters of CascadeForestClassifier/CascadeForestRegressor, as other meta-estimators do (e.g. ngboost), although the corresponding APIs may have to change.

For example, the API parameters could be:

model = CascadeForestRegressor(
    estimators=[lgb.LGBMRegressor(random_state=i) for i in range(n_estimators)],
    predictor=xgb.XGBRegressor(),
)
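To illustrate why this would work, here is a minimal sketch with a hypothetical `ToyCascadeWithParams` class (again not deepforest code; the scikit-learn regressors stand in for the LightGBM/XGBoost models to keep the example dependency-free). Because the estimators and the predictor are stored unchanged as constructor parameters, `sklearn.base.clone` copies them, including the elements of the list:

```python
from sklearn.base import BaseEstimator, clone
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor


class ToyCascadeWithParams(BaseEstimator):
    """Hypothetical sketch of the proposed API, not deepforest code."""

    def __init__(self, estimators=None, predictor=None):
        # Store the arguments unmodified, as scikit-learn's clone() expects.
        self.estimators = estimators
        self.predictor = predictor


model = ToyCascadeWithParams(
    estimators=[LinearRegression() for _ in range(2)],
    predictor=DecisionTreeRegressor(max_depth=3),
)

model_new = clone(model)

# The clone holds fresh, unfitted copies with the same hyperparameters.
print(len(model_new.estimators))
print(model_new.predictor is not model.predictor)
print(model_new.predictor.max_depth)
```

A side effect of this design is that `get_params` / `set_params` would also see these objects, which is what tools built on cloning (e.g. cross-validation helpers) rely on.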

xuyxu commented Aug 19, 2021

Thanks for reporting, will take a look during the weekend.

xuyxu added the needtriage (Further information is requested) label Aug 19, 2021