Why does sklearn tree model "split" feature importance within identical features? #19569
Replies: 2 comments 2 replies
-
It probably comes from the fact that when 2 splits have the same gain, the one that gets selected is arbitrary.
It'd be good to have a deterministic rule here, like selecting the feature with the lowest index.
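This arbitrariness can be seen with a small sketch (the data and variable names here are illustrative, not from the thread): a depth-1 tree trained on two identical copies of a feature gives all of its importance to whichever copy the splitter happens to inspect first, and that order depends on `random_state`.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.RandomState(0)
x = rng.rand(200, 1)
X = np.hstack([x, x])            # two identical copies of one feature
y = (x[:, 0] > 0.5).astype(int)

for seed in range(5):
    stump = DecisionTreeClassifier(max_depth=1, random_state=seed).fit(X, y)
    # A single split gives ALL the importance to one copy; which copy
    # wins the tie can vary with random_state.
    print(seed, stump.feature_importances_)
```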
-
Does XGBoost have specific logic to deal with this? Presumably the same should apply to features with identical order (as in a rank transform), since those too will produce equal-gain splits in trees.
-
When the dataset has identical features, it makes more sense to me for the tree model to pick just one of them, so that the rest get a feature importance of 0 (as XGBoost does).
But sklearn's tree models seem to "split" feature importance among those identical features. Could anyone help explain why sklearn does it this way? Thank you!
example code:
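The original snippet was not captured in this export; the sketch below (with illustrative data and my own variable names) reproduces the reported behavior: because each node resolves the equal-gain tie between identical columns independently, the importance ends up shared between the copies rather than concentrated on one.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.RandomState(0)
x = rng.rand(500, 1)
X = np.hstack([x, x])                                  # column 1 duplicates column 0
y = (x[:, 0] + 0.1 * rng.randn(500) > 0.5).astype(int) # noisy target on that feature

forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
# Both identical columns receive nonzero importance, split between them,
# instead of one column getting everything and the other getting 0.
print(forest.feature_importances_)
```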