Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prune redundant Decision Tree leaf nodes #251

Open
andrewdalpino opened this issue Sep 30, 2022 · 0 comments
Open

Prune redundant Decision Tree leaf nodes #251

andrewdalpino opened this issue Sep 30, 2022 · 0 comments
Assignees
Labels
help wanted Extra attention is needed optimization Make something perform faster

Comments

@andrewdalpino
Copy link
Member

andrewdalpino commented Sep 30, 2022

The current Decision Tree grow() implementation does not prune pure leaf nodes that have the same class outcome and both stem from a common ancestor node. Pruning would involve replacing the Split node with a pure leaf node. See image.

test

The problematic logic can be found here https://github.com/RubixML/ML/blob/master/src/Graph/Trees/DecisionTree.php#L188

Instead of terminating after the Split node was added to the stack, we could detect the condition of a pure split containing only one class and then right away replace the Split node with a leaf node.

Here is the test that generated this Graphviz visual (except the number of bins was set to 3 instead of 5) https://github.com/RubixML/ML/blob/master/tests/Classifiers/ClassificationTreeTest.php#L194.

This should speed up inference by reducing the number of splits that need to be evaluated as well as reduce the memory and storage cost of trained Decision Tree models. Effects Classification/Regression Trees, Extras Trees, Gradient Boost, Logit Boost, Random Forest, and AdaBoost.

@andrewdalpino andrewdalpino self-assigned this Sep 30, 2022
@andrewdalpino andrewdalpino added optimization Make something perform faster help wanted Extra attention is needed labels Sep 30, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
help wanted Extra attention is needed optimization Make something perform faster
Projects
None yet
Development

No branches or pull requests

1 participant