Does BorutaPy work with cuML RandomForestClassifier? #99

Open
curtisraymond opened this issue Jul 15, 2021 · 8 comments

Comments

@curtisraymond

I tried running BorutaPy using cuML's RF classifier but I receive the following error: "ValueError: Bad param 'random_state' passed to set_params". Does BorutaPy work with cuML RandomForestClassifier?

Ideally, I'd like to speed things up by using a classifier that runs well on a GPU.

@Wuuzzaa

Wuuzzaa commented Jul 15, 2021

At the moment, Boruta tries to set the random state on every estimator. cuML's RF classifier does not have this parameter.

You can try a fix like the one for LightGBM. Something like this before the else branch could help:

if isinstance(self.estimator, cuml_type_here): pass

# make sure we start with a new tree in each iteration
if self._is_lightgbm:
    self.estimator.set_params(random_state=self.random_state.randint(0, 10000))
else:
    self.estimator.set_params(random_state=self.random_state)
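
A more general variant of this idea is to check whether the estimator accepts random_state at all, rather than testing for a specific cuML type. The following is a minimal sketch, not BorutaPy's actual code: reseed_estimator, SeededStub, and UnseededStub are hypothetical names, the stubs stand in for real estimators, and Python's random.Random stands in for the NumPy RandomState that BorutaPy holds. It assumes the estimator exposes scikit-learn style get_params and set_params.

```python
import random


def reseed_estimator(estimator, rng):
    """Give the estimator a fresh integer seed if it accepts random_state.

    Returns True when a seed was set and False when the step was skipped.
    """
    if "random_state" not in estimator.get_params():
        # No such parameter: skip instead of letting set_params raise.
        return False
    # Draw a plain integer seed, as the LightGBM branch does.
    estimator.set_params(random_state=rng.randint(0, 10000))
    return True


class SeededStub:
    """Hypothetical stand-in for an estimator that accepts random_state."""

    def __init__(self):
        self.params = {"random_state": None}

    def get_params(self, deep=True):
        return dict(self.params)

    def set_params(self, **params):
        for name in params:
            if name not in self.params:
                raise ValueError(f"Bad param '{name}' passed to set_params")
        self.params.update(params)
        return self


class UnseededStub(SeededStub):
    """Hypothetical stand-in for an estimator without random_state."""

    def __init__(self):
        self.params = {"n_estimators": 100}


rng = random.Random(0)
print(reseed_estimator(SeededStub(), rng))    # seed was set
print(reseed_estimator(UnseededStub(), rng))  # step was skipped
```

Skipping the reseed only avoids the set_params error; it does not make an otherwise unsupported estimator work with Boruta.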

@curtisraymond

Thanks @Wuuzzaa.

I made the adjustment you recommended but now I'm receiving this error: "ValueError: Only methods with feature_importance_ attribute are currently supported in BorutaPy."

Any recommendations on this issue?

@Wuuzzaa

Wuuzzaa commented Jul 15, 2021

It seems cuML's random forest implementation differs quite a lot from sklearn's. I just took a look at the docs and did not find anything similar to feature importance.
cuML Random Forest

Some kind of feature importance is necessary for boruta to determine which features are useful. I think there is no easy way to work around this issue.
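
To see up front whether a given estimator can work with Boruta, one option is to fit it on a small sample and look for the feature_importances_ attribute. This is a rough sketch under assumptions: exposes_importances and both stub classes are hypothetical names introduced for illustration, and the stubs only mimic the relevant behaviour of real estimators.

```python
def exposes_importances(estimator, X, y):
    """Fit on a small sample and report whether feature_importances_ exists."""
    estimator.fit(X, y)
    try:
        importances = estimator.feature_importances_
    except AttributeError:
        # This is the case in which Boruta cannot rank features.
        return False
    # Expect one importance value per feature column.
    return len(importances) == len(X[0])


class WithImportances:
    """Hypothetical estimator that sets feature_importances_ in fit."""

    def fit(self, X, y):
        n_features = len(X[0])
        self.feature_importances_ = [1.0 / n_features] * n_features
        return self


class WithoutImportances:
    """Hypothetical estimator that never sets feature_importances_."""

    def fit(self, X, y):
        return self


X_small = [[0.0, 1.0], [1.0, 0.0]]
y_small = [0, 1]
print(exposes_importances(WithImportances(), X_small, y_small))
print(exposes_importances(WithoutImportances(), X_small, y_small))
```

With a real estimator, pass a small slice of your own X and y instead of the toy data.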

@lindeberg25

lindeberg25 commented Oct 21, 2023

@curtisraymond and @Wuuzzaa Hi ... any solution for this?

I'm running into the same problem, but with a different error: "an integer is required".

Error

TypeError Traceback (most recent call last)
/usr/local/lib/python3.10/dist-packages/boruta/boruta_py.py in _get_imp(self, X, y)
383 try:
--> 384
385 self.estimator.fit(X, y)
randomforestclassifier.pyx in cuml.ensemble.randomforestclassifier.RandomForestClassifier.fit()

TypeError: an integer is required

ValueError: Please check your X and y variable. The providedestimator cannot be fitted to your data.
an integer is required

@Wuuzzaa

Wuuzzaa commented Oct 21, 2023

My blind guess would be a problem with your y data: y must be integers. Did you check that X and y have compatible data types?
For the supported types, see the docs.

@lindeberg25

Hi @Wuuzzaa ..

Thank you for the quick reply.

The y values are integers. It works fine when I use sklearn's RF classifier, but I get this error when I use cuML's RF classifier.

My guess is that there might be an incompatibility between cuML and BorutaPy.

@Wuuzzaa

Wuuzzaa commented Oct 25, 2023

BorutaPy was never designed to be used with cuML, and it seems it still does not work. As beckernick mentioned, there is still an open issue on cuML for implementing feature importance, which Boruta needs in order to work.

@beckernick

Thanks for linking that issue @Wuuzzaa !

@lindeberg25, we'd love to learn more about your use case and the performance impact of using cuML's Random Forest vs. scikit-learn's RF. Let's continue the discussion on the linked issue.
