Need content on feature importance for tree-based methods #47

Open
davidrosenberg opened this issue Dec 26, 2017 · 2 comments

@davidrosenberg (Owner)

Software routinely reports various measures of feature importance, and even marginal dependence on features. Explaining what these quantities are, and pointing out how they can fail, seems like a potentially very useful Lab topic.
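
For concreteness, here's a minimal sketch of the kinds of output meant here, using scikit-learn (the dataset and feature names are purely illustrative, not tied to the Lab): impurity-based and permutation feature importances, plus partial dependence as the "marginal dependence" measure.

```python
# Sketch: the kinds of software output the Lab would explain.
# Assumes scikit-learn; dataset and feature names are only illustrative.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import partial_dependence, permutation_importance
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)

# Impurity-based importances: computed on training data and known to be
# biased toward high-cardinality/continuous features -- one way they "fail".
print(sorted(zip(model.feature_importances_, X.columns), reverse=True)[:5])

# Permutation importances on held-out data: a common sanity check on the above.
perm = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=0)
print(sorted(zip(perm.importances_mean, X.columns), reverse=True)[:5])

# Marginal dependence of the prediction on a single feature (partial dependence).
pd_result = partial_dependence(model, X_test, features=["mean radius"])
print(pd_result["average"][0][:5])
```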

@buj201 commented Jan 3, 2018

Related subtopic: tree-based models that require one-hot encoding of categoricals vs. those with out-of-the-box support for categoricals.

  1. Example dataset: consider https://archive.ics.uci.edu/ml/datasets/Adult (categoricals for occupation, education, etc.). Ideally we'd find a dataset where the categoricals rank differently across the two runs described in step 2.
  2. Fit CatBoost or LightGBM with and without a one-hot encoding preprocessor.
  3. Get feature_importances_ from the two versions (see the sketch after this list).
  4. Compare the rankings.
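
Rough sketch of steps 2–4 with LightGBM (one of the two libraries named in step 2). The raw-file URL and column names are assumptions based on the standard UCI distribution of Adult, and re-aggregating dummy-column importances back to their source categorical is just one reasonable choice:

```python
# Sketch of steps 2-4 using LightGBM. Download URL and column names are
# assumptions from the standard UCI "Adult" distribution.
import pandas as pd
import lightgbm as lgb

cols = ["age", "workclass", "fnlwgt", "education", "education-num",
        "marital-status", "occupation", "relationship", "race", "sex",
        "capital-gain", "capital-loss", "hours-per-week", "native-country",
        "income"]
df = pd.read_csv(
    "https://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.data",
    header=None, names=cols, skipinitialspace=True)
y = (df.pop("income") == ">50K").astype(int)
cat_cols = df.select_dtypes("object").columns

# Run 1: LightGBM's native categorical handling (splits on category subsets).
X_native = df.copy()
X_native[cat_cols] = X_native[cat_cols].astype("category")
m_native = lgb.LGBMClassifier(random_state=0).fit(X_native, y)

# Run 2: one-hot encoding, so one categorical's importance is split across
# many sparse binary columns.
X_onehot = pd.get_dummies(df, columns=list(cat_cols))
m_onehot = lgb.LGBMClassifier(random_state=0).fit(X_onehot, y)

# Steps 3-4: pull feature_importances_ from both and compare rankings,
# re-aggregating dummy columns back to their source categorical.
def base_column(col):
    for cc in cat_cols:
        if col.startswith(cc + "_"):
            return cc
    return col

native_imp = pd.Series(m_native.feature_importances_, index=X_native.columns)
onehot_imp = pd.Series(m_onehot.feature_importances_, index=X_onehot.columns)
print(native_imp.sort_values(ascending=False).head(10))
print(onehot_imp.groupby(base_column).sum().sort_values(ascending=False).head(10))
```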

buj201 mentioned this issue Jan 3, 2018
@davidrosenberg (Owner)

Somewhat related to feature importance is feature selection. Here's a paper on non-linear feature selection with gradient boosting; I haven't read it yet, but it could be interesting: http://alicezheng.org/papers/gbfs.pdf
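
To be clear about scope: the snippet below is not the GBFS algorithm from the linked paper, just the common importance-thresholding baseline such methods compare against, via scikit-learn's SelectFromModel (all names and parameters are illustrative).

```python
# Baseline feature selection via gradient-boosting importances -- NOT the
# GBFS algorithm from the paper, just a simple thresholding approach.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_selection import SelectFromModel

X, y = make_classification(n_samples=500, n_features=30, n_informative=5,
                           random_state=0)

selector = SelectFromModel(
    GradientBoostingClassifier(random_state=0),
    threshold="median",  # keep features above the median importance
).fit(X, y)

print("selected feature indices:", selector.get_support(indices=True))
print("reduced shape:", selector.transform(X).shape)
```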
