BorutaPy selects different features in different iterations #121

Open
VEZcoding opened this issue Sep 27, 2023 · 1 comment

Comments

@VEZcoding
First of all thanks for the package, I've been using it a lot in my work.

I ran into something strange. As I understand it, Boruta selects every feature that is important.

I have a dataframe of 200 observations and 2000 features. If I shuffle the order of the features in the dataframe, Boruta (with a random forest classifier) returns different important features.

Also, if I run it on the 200 observations with only the first 1000 features, Boruta selects a list of n important features. But if I add another 1000 to the mix, Boruta selects a different set of features, and the features from the first group of 1000 are not among them.

So why does Boruta keep selecting different features if it should always select the best ones? How can the best features change when you change the order of the columns?

@matteobolner
Try setting a random seed (random_state); the results should then be consistent across runs, even if you change the order of the features. As for the second question, I think it's to be expected that adding data could lead to different results.
