Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MAINT Remove statistics from neighbors.BinaryTree #19884

Closed
wants to merge 2 commits into from

Conversation

jjerphan
Copy link
Member

Reference Issues/PRs

Fixes #13330
Follow-up of #13331

What does this implement/fix? Explain your changes.

neighbors.BinaryTree internally stores some statistics. Their computations has shown the slow the runtime down and albeit they are publicly exposed, they aren't popularly used (see #13330 and #13331).

This PR removes statistics to improve neighbors.BinaryTrees' runtime.

Any other comments?

This PR is experimental and created to assess the benefits of removing statistics.

Copy link
Member

@ogrisel ogrisel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Quick comment.

Assuming there is a significant performance benefit in removing those, we would first need to go through a deprecation cycle for get_tree_stats as it's technically part of the documented public API:

https://scikit-learn.org/stable/modules/generated/sklearn.neighbors.BallTree.html#sklearn.neighbors.BallTree.get_tree_stats


#------------------------------------------------------------
# Case 2: this is a leaf node. Update set of nearby points
elif node_data[i_node].is_leaf:
self.n_leaves += 1
pass
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
pass

@NicolasHug
Copy link
Member

I'm not familiar with that part of the code but I'm surprised at first sight that these would account for such a difference in performance. They're all super simple operations and even heap.largest(i_pt) should just be an array lookup.

Is there some python interaction going on (those can usually be detected with cython -a)? Or possibly something similar to #17299 is happening?

@ogrisel
Copy link
Member

ogrisel commented Apr 13, 2021

kd-trees and ball-trees are most useful when n_features is very small so adding one arithmetic operation per feature can have an impact in some cases.

Still, I would like to see benchmarks. And maybe there are other ways to make that code run faster first.

@ogrisel
Copy link
Member

ogrisel commented Apr 13, 2021

I re-read the original discussion and the main performance problem appears when using several threads on the same tree which can cause detrimental CPU cache invalidations as explained here: #13330 (comment) (just an hypothesis to be confirmed by CPU cache profiling but that looks like a good candidate).

@jjerphan
Copy link
Member Author

I ran some benchmarks on my machine. It's rather hard to conclude with the results (see bellow).

I would rather redo them on a machine with more cores, as reported in #13330, to access any significance before pursuing with it.

What do you think?

 → asv continuous -b Binary main remove_stats_binary_tree
· Creating environments
· Discovering benchmarks
·· Uninstalling from conda-py3.9-cython-joblib-numpy-scipy-threadpoolctl
·· Installing 1e0d5985 <remove_stats_binary_tree> into conda-py3.9-cython-joblib-numpy-scipy-threadpoolctl.
· Running 4 total benchmarks (2 commits * 1 environments * 2 benchmarks)
[  0.00%] · For scikit-learn commit 1e0d5985 <remove_stats_binary_tree> (round 1/1):
[  0.00%] ·· Benchmarking conda-py3.9-cython-joblib-numpy-scipy-threadpoolctl
[ 25.00%] ··· binary_tree.BinaryTreeStatsCreation.time_creation                                                                                                                                                                                                                                                              ok
[ 25.00%] ··· ======================================= ======= ============ ============ ============ ============ ============= ============
              --                                                                              d / leaf_size
              ----------------------------------------------- ------------------------------------------------------------------------------
                             BinaryTree                  n      10 / 10      10 / 100     100 / 10    100 / 100     1000 / 10    1000 / 100
              ======================================= ======= ============ ============ ============ ============ ============= ============
                 sklearn.neighbors._kd_tree.KDTree      100     106±6μs      71.7±2μs     496±10μs     132±4μs     4.16±0.08ms    615±10μs
                 sklearn.neighbors._kd_tree.KDTree      1000    959±50μs    570±100μs    7.61±0.2ms   3.96±0.1ms     75.9±1ms    41.9±0.8ms
                 sklearn.neighbors._kd_tree.KDTree     10000   12.4±0.4ms   8.09±0.2ms    115±6ms      76.4±2ms     1.34±0.08s   1.02±0.06s
               sklearn.neighbors._ball_tree.BallTree    100     111±4μs      79.4±4μs     333±7μs      96.6±8μs    2.64±0.08ms    302±8μs
               sklearn.neighbors._ball_tree.BallTree    1000    840±60μs     447±40μs    5.06±0.2ms   2.99±0.3ms     57.5±5ms     31.6±3ms
               sklearn.neighbors._ball_tree.BallTree   10000    11.4±2ms    7.13±0.3ms    85.6±7ms     63.9±7ms     1.08±0.09s    770±70ms
              ======================================= ======= ============ ============ ============ ============ ============= ============

[ 50.00%] ··· binary_tree.BinaryTreeStatsQuery.time_query                                                                                                                                                                                                                                                           4/36 failed
[ 50.00%] ··· ======================================= ======= ============ ============ ============= ============= ============ ============
              --                                                                               d / leaf_size
              ----------------------------------------------- -------------------------------------------------------------------------------
                             BinaryTree                  n      10 / 10      10 / 100      100 / 10     100 / 100    1000 / 10    1000 / 100
              ======================================= ======= ============ ============ ============= ============= ============ ============
                 sklearn.neighbors._kd_tree.KDTree      100     543±70μs     224±20μs     3.59±0.1ms   1.40±0.08ms    34.5±1ms    12.5±0.3ms
                 sklearn.neighbors._kd_tree.KDTree      1000   27.3±0.9ms   14.7±0.6ms     331±20ms      145±10ms    3.17±0.2s    1.43±0.06s
                 sklearn.neighbors._kd_tree.KDTree     10000    770±6ms      696±9ms      41.4±0.5s     23.3±0.4s      failed       failed
               sklearn.neighbors._ball_tree.BallTree    100     274±10μs     203±8μs     1.39±0.05ms   1.15±0.03ms   13.1±0.3ms   11.8±0.3ms
               sklearn.neighbors._ball_tree.BallTree    1000   19.0±0.8ms   15.2±0.6ms     136±5ms      116±0.9ms    1.41±0.04s   1.26±0.02s
               sklearn.neighbors._ball_tree.BallTree   10000   1.13±0.02s   1.75±0.05s    24.4±0.2s     23.6±0.7s      failed       failed
              ======================================= ======= ============ ============ ============= ============= ============ ============

[ 50.00%] · For scikit-learn commit 9b7ff272 <main> (round 1/1):
[ 50.00%] ·· Building for conda-py3.9-cython-joblib-numpy-scipy-threadpoolctl..........
[ 50.00%] ·· Benchmarking conda-py3.9-cython-joblib-numpy-scipy-threadpoolctl
[ 75.00%] ··· binary_tree.BinaryTreeStatsCreation.time_creation                                                                                                                                                                                                                                                              ok
[ 75.00%] ··· ======================================= ======= ============ ============ ============ ============= ============= ============
              --                                                                               d / leaf_size
              ----------------------------------------------- -------------------------------------------------------------------------------
                             BinaryTree                  n      10 / 10      10 / 100     100 / 10     100 / 100     1000 / 10    1000 / 100
              ======================================= ======= ============ ============ ============ ============= ============= ============
                 sklearn.neighbors._kd_tree.KDTree      100     123±3μs      78.7±2μs     537±10μs      145±3μs     4.51±0.08ms    696±20μs
                 sklearn.neighbors._kd_tree.KDTree      1000    989±30μs     550±20μs    7.99±0.1ms   4.17±0.09ms    79.3±0.5ms   44.2±0.7ms
                 sklearn.neighbors._kd_tree.KDTree     10000   14.0±0.4ms   9.28±0.2ms    122±1ms       82.7±1ms     1.34±0.04s    936±90ms
               sklearn.neighbors._ball_tree.BallTree    100     102±2μs      71.5±2μs     319±7μs       94.7±3μs    2.57±0.06ms    286±5μs
               sklearn.neighbors._ball_tree.BallTree    1000    676±10μs     384±8μs     4.83±0.1ms   2.57±0.05ms    48.8±0.7ms   28.3±0.5ms
               sklearn.neighbors._ball_tree.BallTree   10000   9.27±0.2ms   6.30±0.2ms   76.2±0.6ms    53.1±0.5ms     880±70ms     636±50ms
              ======================================= ======= ============ ============ ============ ============= ============= ============

[100.00%] ··· binary_tree.BinaryTreeStatsQuery.time_query                                                                                                                                                                                                                                                           4/36 failed
[100.00%] ··· ======================================= ======= ============ ============ ============= ============= ============ ============
              --                                                                               d / leaf_size
              ----------------------------------------------- -------------------------------------------------------------------------------
                             BinaryTree                  n      10 / 10      10 / 100      100 / 10     100 / 100    1000 / 10    1000 / 100
              ======================================= ======= ============ ============ ============= ============= ============ ============
                 sklearn.neighbors._kd_tree.KDTree      100     488±10μs     212±4μs     3.39±0.08ms   1.17±0.03ms   32.4±0.5ms   11.9±0.2ms
                 sklearn.neighbors._kd_tree.KDTree      1000   27.2±0.6ms   13.9±0.4ms     289±1ms      127±0.7ms    2.82±0.01s   1.29±0.01s
                 sklearn.neighbors._kd_tree.KDTree     10000    779±7ms      700±9ms      36.4±0.3s     20.8±0.06s     failed       failed
               sklearn.neighbors._ball_tree.BallTree    100     290±20μs     228±10μs    1.50±0.07ms   1.29±0.04ms   13.9±0.4ms   11.9±0.4ms
               sklearn.neighbors._ball_tree.BallTree    1000    20.1±1ms    17.4±0.9ms     146±1ms       129±1ms     1.54±0.01s   1.40±0.01s
               sklearn.neighbors._ball_tree.BallTree   10000   1.56±0.06s   2.42±0.05s    33.3±0.5s     23.7±0.2s      failed       failed
              ======================================= ======= ============ ============ ============= ============= ============ ============

       before           after         ratio
     [9b7ff272]       [1e0d5985]
     <main>           <remove_stats_binary_tree>
+        676±10μs         840±60μs     1.24  binary_tree.BinaryTreeStatsCreation.time_creation(<class 'sklearn.neighbors._ball_tree.BallTree'>, 1000, 10, 10)
+        880±70ms       1.08±0.09s     1.23  binary_tree.BinaryTreeStatsCreation.time_creation(<class 'sklearn.neighbors._ball_tree.BallTree'>, 10000, 1000, 10)
+      9.27±0.2ms         11.4±2ms     1.23  binary_tree.BinaryTreeStatsCreation.time_creation(<class 'sklearn.neighbors._ball_tree.BallTree'>, 10000, 10, 10)
+        636±50ms         770±70ms     1.21  binary_tree.BinaryTreeStatsCreation.time_creation(<class 'sklearn.neighbors._ball_tree.BallTree'>, 10000, 1000, 100)
+      53.1±0.5ms         63.9±7ms     1.20  binary_tree.BinaryTreeStatsCreation.time_creation(<class 'sklearn.neighbors._ball_tree.BallTree'>, 10000, 100, 100)
+     1.17±0.03ms      1.40±0.08ms     1.19  binary_tree.BinaryTreeStatsQuery.time_query(<class 'sklearn.neighbors._kd_tree.KDTree'>, 100, 100, 100)
+      48.8±0.7ms         57.5±5ms     1.18  binary_tree.BinaryTreeStatsCreation.time_creation(<class 'sklearn.neighbors._ball_tree.BallTree'>, 1000, 1000, 10)
+         384±8μs         447±40μs     1.16  binary_tree.BinaryTreeStatsCreation.time_creation(<class 'sklearn.neighbors._ball_tree.BallTree'>, 1000, 10, 100)
+     2.57±0.05ms       2.99±0.3ms     1.16  binary_tree.BinaryTreeStatsCreation.time_creation(<class 'sklearn.neighbors._ball_tree.BallTree'>, 1000, 100, 100)
+         289±1ms         331±20ms     1.15  binary_tree.BinaryTreeStatsQuery.time_query(<class 'sklearn.neighbors._kd_tree.KDTree'>, 1000, 100, 10)
+       127±0.7ms         145±10ms     1.14  binary_tree.BinaryTreeStatsQuery.time_query(<class 'sklearn.neighbors._kd_tree.KDTree'>, 1000, 100, 100)
+      6.30±0.2ms       7.13±0.3ms     1.13  binary_tree.BinaryTreeStatsCreation.time_creation(<class 'sklearn.neighbors._ball_tree.BallTree'>, 10000, 10, 100)
+      2.82±0.01s        3.17±0.2s     1.12  binary_tree.BinaryTreeStatsQuery.time_query(<class 'sklearn.neighbors._kd_tree.KDTree'>, 1000, 1000, 10)
+      76.2±0.6ms         85.6±7ms     1.12  binary_tree.BinaryTreeStatsCreation.time_creation(<class 'sklearn.neighbors._ball_tree.BallTree'>, 10000, 100, 10)
+      28.3±0.5ms         31.6±3ms     1.12  binary_tree.BinaryTreeStatsCreation.time_creation(<class 'sklearn.neighbors._ball_tree.BallTree'>, 1000, 1000, 100)
+        488±10μs         543±70μs     1.11  binary_tree.BinaryTreeStatsQuery.time_query(<class 'sklearn.neighbors._kd_tree.KDTree'>, 100, 10, 10)
+      1.29±0.01s       1.43±0.06s     1.11  binary_tree.BinaryTreeStatsQuery.time_query(<class 'sklearn.neighbors._kd_tree.KDTree'>, 1000, 1000, 100)
+        71.5±2μs         79.4±4μs     1.11  binary_tree.BinaryTreeStatsCreation.time_creation(<class 'sklearn.neighbors._ball_tree.BallTree'>, 100, 10, 100)
-         145±3μs          132±4μs     0.91  binary_tree.BinaryTreeStatsCreation.time_creation(<class 'sklearn.neighbors._kd_tree.KDTree'>, 100, 100, 100)
-         129±1ms        116±0.9ms     0.90  binary_tree.BinaryTreeStatsQuery.time_query(<class 'sklearn.neighbors._ball_tree.BallTree'>, 1000, 100, 100)
-      1.40±0.01s       1.26±0.02s     0.90  binary_tree.BinaryTreeStatsQuery.time_query(<class 'sklearn.neighbors._ball_tree.BallTree'>, 1000, 1000, 100)
-     1.29±0.04ms      1.15±0.03ms     0.90  binary_tree.BinaryTreeStatsQuery.time_query(<class 'sklearn.neighbors._ball_tree.BallTree'>, 100, 100, 100)
-        228±10μs          203±8μs     0.89  binary_tree.BinaryTreeStatsQuery.time_query(<class 'sklearn.neighbors._ball_tree.BallTree'>, 100, 10, 100)
-        696±20μs         615±10μs     0.88  binary_tree.BinaryTreeStatsCreation.time_creation(<class 'sklearn.neighbors._kd_tree.KDTree'>, 100, 1000, 100)
-      14.0±0.4ms       12.4±0.4ms     0.88  binary_tree.BinaryTreeStatsCreation.time_creation(<class 'sklearn.neighbors._kd_tree.KDTree'>, 10000, 10, 10)
-      17.4±0.9ms       15.2±0.6ms     0.87  binary_tree.BinaryTreeStatsQuery.time_query(<class 'sklearn.neighbors._ball_tree.BallTree'>, 1000, 10, 100)
-      9.28±0.2ms       8.09±0.2ms     0.87  binary_tree.BinaryTreeStatsCreation.time_creation(<class 'sklearn.neighbors._kd_tree.KDTree'>, 10000, 10, 100)
-         123±3μs          106±6μs     0.86  binary_tree.BinaryTreeStatsCreation.time_creation(<class 'sklearn.neighbors._kd_tree.KDTree'>, 100, 10, 10)
-       33.3±0.5s        24.4±0.2s     0.73  binary_tree.BinaryTreeStatsQuery.time_query(<class 'sklearn.neighbors._ball_tree.BallTree'>, 10000, 100, 10)
-      2.42±0.05s       1.75±0.05s     0.72  binary_tree.BinaryTreeStatsQuery.time_query(<class 'sklearn.neighbors._ball_tree.BallTree'>, 10000, 10, 100)
-      1.56±0.06s       1.13±0.02s     0.72  binary_tree.BinaryTreeStatsQuery.time_query(<class 'sklearn.neighbors._ball_tree.BallTree'>, 10000, 10, 10)

SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY.
PERFORMANCE DECREASED.

@jjerphan jjerphan closed this Apr 22, 2021
@jjerphan jjerphan deleted the remove_stats_binary_tree branch April 22, 2021 15:33
@jjerphan jjerphan restored the remove_stats_binary_tree branch April 22, 2021 15:33
@jjerphan jjerphan deleted the remove_stats_binary_tree branch July 10, 2022 19:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Nearest neighbors with trees perf decreased by debugging stats
3 participants