-
-
Notifications
You must be signed in to change notification settings - Fork 25k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
MAINT Remove statistics from neighbors.BinaryTree
#19884
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Quick comment.
Assuming there is a significant performance benefit in removing those, we would first need to go through a deprecation cycle for get_tree_stats
as it's technically part of the documented public API:
|
||
#------------------------------------------------------------ | ||
# Case 2: this is a leaf node. Update set of nearby points | ||
elif node_data[i_node].is_leaf: | ||
self.n_leaves += 1 | ||
pass |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pass |
I'm not familiar with that part of the code but I'm surprised at first sight that these would account for such a difference in performance. They're all super simple operations and even Is there some python interaction going on (those can usually be detected with |
kd-trees and ball-trees are most useful when Still, I would like to see benchmarks. And maybe there are other ways to make that code run faster first. |
I re-read the original discussion and the main performance problem appears when using several threads on the same tree which can cause detrimental CPU cache invalidations as explained here: #13330 (comment) (just an hypothesis to be confirmed by CPU cache profiling but that looks like a good candidate). |
I ran some benchmarks on my machine. It's rather hard to conclude with the results (see bellow). I would rather redo them on a machine with more cores, as reported in #13330, to access any significance before pursuing with it. What do you think?
|
Reference Issues/PRs
Fixes #13330
Follow-up of #13331
What does this implement/fix? Explain your changes.
neighbors.BinaryTree
internally stores some statistics. Their computations has shown the slow the runtime down and albeit they are publicly exposed, they aren't popularly used (see #13330 and #13331).This PR removes statistics to improve
neighbors.BinaryTree
s' runtime.Any other comments?
This PR is experimental and created to assess the benefits of removing statistics.