Replies: 1 comment
-
We use Cython and compiled code for this search. If you do it in pure Python, it will be extremely slow. Another trick (one that is not implemented in scikit-learn) is to bin the features so that only a subset of the possible splits is evaluated (as done in |
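The binning idea can be sketched roughly as follows. This is only an illustration under my own assumptions, not code from scikit-learn or any other library: the function name `best_split_binned`, the quantile-based bin edges, and the default of 32 bins are all made up for the example, and `y` is assumed to hold integer class labels in `[0, n_classes)`.

```python
import numpy as np


def best_split_binned(x, y, n_classes, n_bins=32):
    """Evaluate only bin edges as split candidates for one feature.

    Hypothetical helper for illustration. Returns (threshold, weighted_gini)
    for the rule "x <= threshold goes left", or (None, inf) if no valid split.
    """
    # Assumed binning rule: interior quantiles of the feature become the edges.
    quantiles = np.linspace(0.0, 1.0, n_bins + 1)[1:-1]
    edges = np.unique(np.quantile(x, quantiles))

    # Bin b holds the samples with edges[b-1] < x <= edges[b].
    bins = np.searchsorted(edges, x, side="left")

    # hist[b, c] = number of samples of class c that fall into bin b.
    hist = np.zeros((len(edges) + 1, n_classes))
    np.add.at(hist, (bins, y), 1)

    # Row b of the cumulative sum is the left child for "x <= edges[b]".
    left = np.cumsum(hist, axis=0)[:-1]
    right = hist.sum(axis=0) - left
    n_left = left.sum(axis=1)
    n_right = right.sum(axis=1)

    valid = (n_left > 0) & (n_right > 0)
    if not valid.any():
        return None, np.inf

    def gini(counts, totals):
        # Guard against division by zero for empty children (masked out below).
        p = counts / np.maximum(totals, 1)[:, None]
        return 1.0 - np.sum(p * p, axis=1)

    weighted = (n_left * gini(left, n_left) + n_right * gini(right, n_right)) / len(x)
    weighted[~valid] = np.inf

    b = int(np.argmin(weighted))
    return edges[b], weighted[b]
```

With this approach the number of candidate thresholds per feature is bounded by the number of bins instead of the number of distinct feature values, at the price of possibly missing the exact best threshold.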
-
Hi there,
I am currently trying to reimplement a random forest, and my runtime is considerably higher than scikit-learn's. I therefore had a look at the code, but unfortunately I do not understand an important detail: how the Gini impurity is used to find the best split.
To find the best split, the Gini impurity has to be calculated for virtually every possible split, which makes the search time consuming and thus costly. How exactly does scikit-learn do this? Finding the best split is expensive, and it is done at every node of every tree, so how does it run so fast, and where is the trick? Looking forward to an answer.
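For reference, one common way to avoid recomputing the Gini impurity from scratch for every candidate is to sort the feature once and then sweep over the sorted values, moving one sample at a time from the right child to the left child and updating the class counts incrementally. The sketch below is a plain NumPy illustration of that idea, not scikit-learn's actual Cython code; the names `gini_from_counts` and `best_split_sorted` are hypothetical, and `y` is assumed to hold integer class labels in `[0, n_classes)`.

```python
import numpy as np


def gini_from_counts(counts, n):
    """Gini impurity of a node given its per-class counts and sample total."""
    if n == 0:
        return 0.0
    p = counts / n
    return 1.0 - np.sum(p * p)


def best_split_sorted(x, y, n_classes):
    """Best threshold on one feature using a single sorted sweep.

    Hypothetical helper for illustration. Returns (threshold, weighted_gini),
    or (None, inf) if all feature values are identical.
    """
    order = np.argsort(x, kind="stable")
    x_sorted, y_sorted = x[order], y[order]
    n = len(x)

    # Start with every sample in the right child.
    left = np.zeros(n_classes)
    right = np.bincount(y_sorted, minlength=n_classes).astype(float)

    best_threshold, best_impurity = None, np.inf
    for i in range(n - 1):
        # Move sample i from the right child to the left child.
        c = y_sorted[i]
        left[c] += 1
        right[c] -= 1

        # A threshold can only be placed between two distinct values.
        if x_sorted[i] == x_sorted[i + 1]:
            continue

        n_left = i + 1
        n_right = n - n_left
        weighted = (n_left * gini_from_counts(left, n_left)
                    + n_right * gini_from_counts(right, n_right)) / n
        if weighted < best_impurity:
            best_impurity = weighted
            best_threshold = (x_sorted[i] + x_sorted[i + 1]) / 2.0

    return best_threshold, best_impurity
```

After the one-time sort, each candidate costs only a count update plus a pass over the classes, rather than a pass over all samples in the node. In pure Python the loop itself is still slow, which is why a compiled implementation makes such a difference.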