MAINT PairwiseDistancesReduction: Do not slice memoryviews in _compute_dist_middle_terms #24715

Merged on Nov 22, 2022 (5 commits)

jjerphan
Member

Reference Issues/PRs

Relates to #22587.
Originally part of #24542.

What does this implement/fix? Explain your changes.

See the reasons here: #17299

Any other comments?

Member

@thomasjpfan thomasjpfan left a comment

Did you see a performance improvement with this change?

Codewise, this LGTM. In general, I am okay with avoiding slicing memoryviews in Cython.

jjerphan added a commit to jjerphan/scikit-learn that referenced this pull request Oct 26, 2022
@jjerphan
Member Author

I am currently rerunning benchmarks on relevant configurations on a machine with 128 cores.

Using Amdahl's law, I know that about 2.5% of kneighbors is not parallelized, since the speed-ups plateau at about ×40.

If performance improves with this PR, it means that the slicing is likely part of the sequential portion. In that case, removing the slicing would bring the sequential portion below 2.5% of the whole implementation (I would like to reassess the maximum speed-up afterwards, but this can be done after this PR is merged if significant improvements are observed).
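For reference, the link between the ×40 plateau and the 2.5% figure can be checked with a small sketch of Amdahl's law. The function names `amdahl_speedup` and `amdahl_limit` are made up for this illustration and are not part of scikit-learn; the 2.5% value is the estimate quoted above, not a new measurement.

```python
def amdahl_speedup(serial_fraction: float, n_workers: int) -> float:
    """Theoretical speed-up when a fraction of the work stays sequential."""
    parallel_fraction = 1.0 - serial_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_workers)


def amdahl_limit(serial_fraction: float) -> float:
    """Upper bound of the speed-up as the number of workers grows without bound."""
    return 1.0 / serial_fraction


# A 2.5% sequential portion caps the speed-up at 1 / 0.025 = 40.
limit = amdahl_limit(0.025)

# With 128 workers the same model predicts a speed-up of roughly 30.7.
speedup_128 = amdahl_speedup(0.025, 128)
```

Under this model, any reduction of the sequential portion raises the ceiling: a 2% sequential portion would give a limit of ×50 instead of ×40.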

@jjerphan
Member Author

Most configurations aren't impacted, but the ones with small datasets are sometimes slower and sometimes faster. 🤔

       before           after         ratio
     [b7f3fd68]       [7066a5aa]
     <main>           <benchmarks/maint/pdr-do-not-slice~1>
+        16.4±2ms         29.8±5ms     1.81  pairwise_distances_reductions.PairwiseDistancesReductionsBenchmark.time_ArgKmin(1000, 10000, 100, 'euclidean', 'auto', <class 'numpy.float64'>, 'dense', 'dense')
+        22.7±2ms         32.5±6ms     1.43  pairwise_distances_reductions.PairwiseDistancesReductionsBenchmark.time_ArgKmin(1000, 10000, 100, 'euclidean', 'auto', <class 'numpy.float32'>, 'dense', 'dense')
+        113±20ms         145±20ms     1.28  pairwise_distances_reductions.PairwiseDistancesReductionsBenchmark.time_ArgKmin(1000, 1000, 100, 'euclidean', 'auto', <class 'numpy.float32'>, 'dense', 'dense')
+        119±30ms         140±20ms     1.17  pairwise_distances_reductions.PairwiseDistancesReductionsBenchmark.time_ArgKmin(1000, 1000, 100, 'euclidean', 'parallel_on_Y', <class 'numpy.float32'>, 'dense', 'dense')
-        620±50ms         550±40ms     0.89  pairwise_distances_reductions.PairwiseDistancesReductionsBenchmark.time_ArgKmin(10000, 10000, 100, 'euclidean', 'auto', <class 'numpy.float64'>, 'dense', 'dense')
-        108±10ms        95.0±10ms     0.88  pairwise_distances_reductions.PairwiseDistancesReductionsBenchmark.time_ArgKmin(10000, 1000, 100, 'euclidean', 'parallel_on_Y', <class 'numpy.float32'>, 'dense', 'dense')
-        18.3±6ms         14.3±1ms     0.78  pairwise_distances_reductions.PairwiseDistancesReductionsBenchmark.time_ArgKmin(1000, 10000, 100, 'euclidean', 'parallel_on_X', <class 'numpy.float64'>, 'dense', 'dense')

SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY.
PERFORMANCE DECREASED.
Full ASV results
· Creating environments
· Discovering benchmarks
·· Uninstalling from conda-py3.9-cython-joblib-numpy-pandas-scipy-threadpoolctl
·· Installing 7066a5aa <benchmarks/maint/pdr-do-not-slice> into conda-py3.9-cython-joblib-numpy-pandas-scipy-threadpoolctl
· Running 2 total benchmarks (2 commits * 1 environments * 1 benchmarks)
[  0.00%] · For scikit-learn commit 7066a5aa <benchmarks/maint/pdr-do-not-slice> (round 1/1):
[  0.00%] ·· Benchmarking conda-py3.9-cython-joblib-numpy-pandas-scipy-threadpoolctl
[ 50.00%] ··· ...iseDistancesReductionsBenchmark.time_ArgKmin        6/54 failed
[ 50.00%] ··· ========== ======== ============ =========== =============== =============== ========= ======== ============
               n_train    n_test   n_features     metric       strategy         dtype       X_train   X_test              
              ---------- -------- ------------ ----------- --------------- --------------- --------- -------- ------------
                 1000      1000       100       euclidean        auto       numpy.float32    dense    dense     145±20ms  
                 1000      1000       100       euclidean        auto       numpy.float64    dense    dense     136±30ms  
                 1000      1000       100       euclidean   parallel_on_X   numpy.float32    dense    dense     43.4±6ms  
                 1000      1000       100       euclidean   parallel_on_X   numpy.float64    dense    dense    72.9±20ms  
                 1000      1000       100       euclidean   parallel_on_Y   numpy.float32    dense    dense     140±20ms  
                 1000      1000       100       euclidean   parallel_on_Y   numpy.float64    dense    dense     150±30ms  
                 1000     10000       100       euclidean        auto       numpy.float32    dense    dense     32.5±6ms  
                 1000     10000       100       euclidean        auto       numpy.float64    dense    dense     29.8±5ms  
                 1000     10000       100       euclidean   parallel_on_X   numpy.float32    dense    dense     30.8±6ms  
                 1000     10000       100       euclidean   parallel_on_X   numpy.float64    dense    dense     14.3±1ms  
                 1000     10000       100       euclidean   parallel_on_Y   numpy.float32    dense    dense    828±200ms  
                 1000     10000       100       euclidean   parallel_on_Y   numpy.float64    dense    dense     809±50ms  
                 1000     100000      100       euclidean        auto       numpy.float32    dense    dense     69.3±1ms  
                 1000     100000      100       euclidean        auto       numpy.float64    dense    dense    36.5±0.4ms 
                 1000     100000      100       euclidean   parallel_on_X   numpy.float32    dense    dense    65.2±0.7ms 
                 1000     100000      100       euclidean   parallel_on_X   numpy.float64    dense    dense    36.5±0.4ms 
                 1000     100000      100       euclidean   parallel_on_Y   numpy.float32    dense    dense    10.0±0.8s  
                 1000     100000      100       euclidean   parallel_on_Y   numpy.float64    dense    dense    9.50±0.6s  
                10000      1000       100       euclidean        auto       numpy.float32    dense    dense     110±10ms  
                10000      1000       100       euclidean        auto       numpy.float64    dense    dense    96.8±10ms  
                10000      1000       100       euclidean   parallel_on_X   numpy.float32    dense    dense     56.5±3ms  
                10000      1000       100       euclidean   parallel_on_X   numpy.float64    dense    dense     51.6±5ms  
                10000      1000       100       euclidean   parallel_on_Y   numpy.float32    dense    dense    95.0±10ms  
                10000      1000       100       euclidean   parallel_on_Y   numpy.float64    dense    dense     100±10ms  
                10000     10000       100       euclidean        auto       numpy.float32    dense    dense     614±50ms  
                10000     10000       100       euclidean        auto       numpy.float64    dense    dense     550±40ms  
                10000     10000       100       euclidean   parallel_on_X   numpy.float32    dense    dense     50.3±3ms  
                10000     10000       100       euclidean   parallel_on_X   numpy.float64    dense    dense     42.3±7ms  
                10000     10000       100       euclidean   parallel_on_Y   numpy.float32    dense    dense     617±80ms  
                10000     10000       100       euclidean   parallel_on_Y   numpy.float64    dense    dense     512±40ms  
                10000     100000      100       euclidean        auto       numpy.float32    dense    dense     276±8ms   
                10000     100000      100       euclidean        auto       numpy.float64    dense    dense     263±10ms  
                10000     100000      100       euclidean   parallel_on_X   numpy.float32    dense    dense     285±10ms  
                10000     100000      100       euclidean   parallel_on_X   numpy.float64    dense    dense     262±10ms  
                10000     100000      100       euclidean   parallel_on_Y   numpy.float32    dense    dense    5.20±0.1s  
                10000     100000      100       euclidean   parallel_on_Y   numpy.float64    dense    dense    4.10±0.3s  
               10000000    1000       100       euclidean        auto       numpy.float32    dense    dense     2.23±0s   
               10000000    1000       100       euclidean        auto       numpy.float64    dense    dense     2.21±0s   
               10000000    1000       100       euclidean   parallel_on_X   numpy.float32    dense    dense    21.3±0.03s 
               10000000    1000       100       euclidean   parallel_on_X   numpy.float64    dense    dense    21.5±0.01s 
               10000000    1000       100       euclidean   parallel_on_Y   numpy.float32    dense    dense    2.26±0.03s 
               10000000    1000       100       euclidean   parallel_on_Y   numpy.float64    dense    dense    2.21±0.03s 
               10000000   10000       100       euclidean        auto       numpy.float32    dense    dense    22.7±0.04s 
               10000000   10000       100       euclidean        auto       numpy.float64    dense    dense    22.6±0.07s 
               10000000   10000       100       euclidean   parallel_on_X   numpy.float32    dense    dense    26.1±0.02s 
               10000000   10000       100       euclidean   parallel_on_X   numpy.float64    dense    dense    25.8±0.06s 
               10000000   10000       100       euclidean   parallel_on_Y   numpy.float32    dense    dense    22.7±0.07s 
               10000000   10000       100       euclidean   parallel_on_Y   numpy.float64    dense    dense    22.5±0.07s 
               10000000   100000      100       euclidean        auto       numpy.float32    dense    dense      failed   
               10000000   100000      100       euclidean        auto       numpy.float64    dense    dense      failed   
               10000000   100000      100       euclidean   parallel_on_X   numpy.float32    dense    dense      failed   
               10000000   100000      100       euclidean   parallel_on_X   numpy.float64    dense    dense      failed   
               10000000   100000      100       euclidean   parallel_on_Y   numpy.float32    dense    dense      failed   
               10000000   100000      100       euclidean   parallel_on_Y   numpy.float64    dense    dense      failed   
              ========== ======== ============ =========== =============== =============== ========= ======== ============

[ 50.00%] ···· For parameters: 10000000, 100000, 100, 'euclidean', 'auto', <class 'numpy.float32'>, 'dense', 'dense'
               
               
               asv: benchmark timed out (timeout 500s)
               
               For parameters: 10000000, 100000, 100, 'euclidean', 'auto', <class 'numpy.float64'>, 'dense', 'dense'
               
               
               asv: benchmark timed out (timeout 500s)
               
               For parameters: 10000000, 100000, 100, 'euclidean', 'parallel_on_X', <class 'numpy.float32'>, 'dense', 'dense'
               
               
               asv: benchmark timed out (timeout 500s)
               
               For parameters: 10000000, 100000, 100, 'euclidean', 'parallel_on_X', <class 'numpy.float64'>, 'dense', 'dense'
               
               
               asv: benchmark timed out (timeout 500s)
               
               For parameters: 10000000, 100000, 100, 'euclidean', 'parallel_on_Y', <class 'numpy.float32'>, 'dense', 'dense'
               
               
               asv: benchmark timed out (timeout 500s)
               
               For parameters: 10000000, 100000, 100, 'euclidean', 'parallel_on_Y', <class 'numpy.float64'>, 'dense', 'dense'
               
               
               asv: benchmark timed out (timeout 500s)

[ 50.00%] · For scikit-learn commit b7f3fd68 <main> (round 1/1):
[ 50.00%] ·· Building for conda-py3.9-cython-joblib-numpy-pandas-scipy-threadpoolctl
[ 50.00%] ·· Benchmarking conda-py3.9-cython-joblib-numpy-pandas-scipy-threadpoolctl
[100.00%] ··· ...iseDistancesReductionsBenchmark.time_ArgKmin        6/54 failed
[100.00%] ··· ========== ======== ============ =========== =============== =============== ========= ======== ============
               n_train    n_test   n_features     metric       strategy         dtype       X_train   X_test              
              ---------- -------- ------------ ----------- --------------- --------------- --------- -------- ------------
                 1000      1000       100       euclidean        auto       numpy.float32    dense    dense     113±20ms  
                 1000      1000       100       euclidean        auto       numpy.float64    dense    dense     148±30ms  
                 1000      1000       100       euclidean   parallel_on_X   numpy.float32    dense    dense     47.4±3ms  
                 1000      1000       100       euclidean   parallel_on_X   numpy.float64    dense    dense    60.1±20ms  
                 1000      1000       100       euclidean   parallel_on_Y   numpy.float32    dense    dense     119±30ms  
                 1000      1000       100       euclidean   parallel_on_Y   numpy.float64    dense    dense     158±30ms  
                 1000     10000       100       euclidean        auto       numpy.float32    dense    dense     22.7±2ms  
                 1000     10000       100       euclidean        auto       numpy.float64    dense    dense     16.4±2ms  
                 1000     10000       100       euclidean   parallel_on_X   numpy.float32    dense    dense     28.7±6ms  
                 1000     10000       100       euclidean   parallel_on_X   numpy.float64    dense    dense     18.3±6ms  
                 1000     10000       100       euclidean   parallel_on_Y   numpy.float32    dense    dense    1.01±0.2s  
                 1000     10000       100       euclidean   parallel_on_Y   numpy.float64    dense    dense     739±40ms  
                 1000     100000      100       euclidean        auto       numpy.float32    dense    dense     65.3±1ms  
                 1000     100000      100       euclidean        auto       numpy.float64    dense    dense    36.8±0.3ms 
                 1000     100000      100       euclidean   parallel_on_X   numpy.float32    dense    dense    64.6±0.5ms 
                 1000     100000      100       euclidean   parallel_on_X   numpy.float64    dense    dense    36.6±0.3ms 
                 1000     100000      100       euclidean   parallel_on_Y   numpy.float32    dense    dense    10.4±0.05s 
                 1000     100000      100       euclidean   parallel_on_Y   numpy.float64    dense    dense    7.80±0.8s  
                10000      1000       100       euclidean        auto       numpy.float32    dense    dense     109±20ms  
                10000      1000       100       euclidean        auto       numpy.float64    dense    dense     108±20ms  
                10000      1000       100       euclidean   parallel_on_X   numpy.float32    dense    dense     51.8±3ms  
                10000      1000       100       euclidean   parallel_on_X   numpy.float64    dense    dense     51.6±5ms  
                10000      1000       100       euclidean   parallel_on_Y   numpy.float32    dense    dense     108±10ms  
                10000      1000       100       euclidean   parallel_on_Y   numpy.float64    dense    dense     101±10ms  
                10000     10000       100       euclidean        auto       numpy.float32    dense    dense     570±60ms  
                10000     10000       100       euclidean        auto       numpy.float64    dense    dense     620±50ms  
                10000     10000       100       euclidean   parallel_on_X   numpy.float32    dense    dense     50.7±3ms  
                10000     10000       100       euclidean   parallel_on_X   numpy.float64    dense    dense    40.6±0.7ms 
                10000     10000       100       euclidean   parallel_on_Y   numpy.float32    dense    dense     581±40ms  
                10000     10000       100       euclidean   parallel_on_Y   numpy.float64    dense    dense     515±60ms  
                10000     100000      100       euclidean        auto       numpy.float32    dense    dense     282±10ms  
                10000     100000      100       euclidean        auto       numpy.float64    dense    dense     266±10ms  
                10000     100000      100       euclidean   parallel_on_X   numpy.float32    dense    dense     285±20ms  
                10000     100000      100       euclidean   parallel_on_X   numpy.float64    dense    dense     264±10ms  
                10000     100000      100       euclidean   parallel_on_Y   numpy.float32    dense    dense    4.94±0.6s  
                10000     100000      100       euclidean   parallel_on_Y   numpy.float64    dense    dense    4.89±0.4s  
               10000000    1000       100       euclidean        auto       numpy.float32    dense    dense     2.31±0s   
               10000000    1000       100       euclidean        auto       numpy.float64    dense    dense    2.29±0.02s 
               10000000    1000       100       euclidean   parallel_on_X   numpy.float32    dense    dense    21.7±0.04s 
               10000000    1000       100       euclidean   parallel_on_X   numpy.float64    dense    dense    21.7±0.03s 
               10000000    1000       100       euclidean   parallel_on_Y   numpy.float32    dense    dense    2.33±0.03s 
               10000000    1000       100       euclidean   parallel_on_Y   numpy.float64    dense    dense    2.32±0.03s 
               10000000   10000       100       euclidean        auto       numpy.float32    dense    dense    22.5±0.05s 
               10000000   10000       100       euclidean        auto       numpy.float64    dense    dense    22.6±0.04s 
               10000000   10000       100       euclidean   parallel_on_X   numpy.float32    dense    dense    26.4±0.1s  
               10000000   10000       100       euclidean   parallel_on_X   numpy.float64    dense    dense    25.9±0.04s 
               10000000   10000       100       euclidean   parallel_on_Y   numpy.float32    dense    dense    22.5±0.08s 
               10000000   10000       100       euclidean   parallel_on_Y   numpy.float64    dense    dense    22.7±0.05s 
               10000000   100000      100       euclidean        auto       numpy.float32    dense    dense      failed   
               10000000   100000      100       euclidean        auto       numpy.float64    dense    dense      failed   
               10000000   100000      100       euclidean   parallel_on_X   numpy.float32    dense    dense      failed   
               10000000   100000      100       euclidean   parallel_on_X   numpy.float64    dense    dense      failed   
               10000000   100000      100       euclidean   parallel_on_Y   numpy.float32    dense    dense      failed   
               10000000   100000      100       euclidean   parallel_on_Y   numpy.float64    dense    dense      failed   
              ========== ======== ============ =========== =============== =============== ========= ======== ============

[100.00%] ···· For parameters: 10000000, 100000, 100, 'euclidean', 'auto', <class 'numpy.float32'>, 'dense', 'dense'
               
               
               asv: benchmark timed out (timeout 500s)
               
               For parameters: 10000000, 100000, 100, 'euclidean', 'auto', <class 'numpy.float64'>, 'dense', 'dense'
               
               
               asv: benchmark timed out (timeout 500s)
               
               For parameters: 10000000, 100000, 100, 'euclidean', 'parallel_on_X', <class 'numpy.float32'>, 'dense', 'dense'
               
               
               asv: benchmark timed out (timeout 500s)
               
               For parameters: 10000000, 100000, 100, 'euclidean', 'parallel_on_X', <class 'numpy.float64'>, 'dense', 'dense'
               
               
               asv: benchmark timed out (timeout 500s)
               
               For parameters: 10000000, 100000, 100, 'euclidean', 'parallel_on_Y', <class 'numpy.float32'>, 'dense', 'dense'
               
               
               asv: benchmark timed out (timeout 500s)
               
               For parameters: 10000000, 100000, 100, 'euclidean', 'parallel_on_Y', <class 'numpy.float64'>, 'dense', 'dense'
               
               
               asv: benchmark timed out (timeout 500s)


@jjerphan jjerphan marked this pull request as ready for review November 4, 2022 10:46
Member

@ogrisel ogrisel left a comment

I don't have a strong opinion on this one. I think the previous code was fine and performed slightly fewer arithmetic operations (subtractions) than the new one.

Furthermore I don't see the relation with #17299 which is about Cython function calls on views. Here there is no Cython function call on memory views, right?

It's only about manual pointer arithmetic and dereferencing before calling a function with pointer arguments.

Feel free to merge if you believe it's an improvement but I am not convinced myself.

I am confused by the ASV results. Maybe you can confirm that this does not significantly change the performance when using a lower number of threads?

@jjerphan
Member Author

jjerphan commented Nov 17, 2022

It is more that slicing comes with a set of extra instructions that are more costly than pointer arithmetic.
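As a rough pure-Python analogy (not the Cython code of this PR), the two ways of addressing a chunk of rows in a C-ordered buffer can be sketched as follows. The helpers `chunk_via_slice` and `chunk_via_offset` are hypothetical names introduced for this illustration; the sketch only shows that both approaches designate the same data, and it says nothing about the actual cost in the generated C++.

```python
from array import array

n_samples, n_features = 6, 3

# Flat row-major (C-ordered) buffer standing in for a 2D array of doubles.
flat = array("d", range(n_samples * n_features))
buf = memoryview(flat)


def chunk_via_slice(buf, start, end, n_features):
    """Create a new view object for rows [start, end), then read its shape."""
    view = buf[start * n_features : end * n_features]
    n_rows = len(view) // n_features
    return view, n_rows


def chunk_via_offset(start, end, n_features):
    """Only integer arithmetic: offset of the first element and number of rows."""
    return start * n_features, end - start


view, m_slice = chunk_via_slice(buf, 2, 5, n_features)
offset, m_offset = chunk_via_offset(2, 5, n_features)

# Both approaches point at the same first element and the same number of rows.
same_start = view[0] == buf[offset]
same_rows = m_slice == m_offset
```

The slice version allocates and tracks an intermediate view object, while the offset version only computes two integers. The offset version pays for this with the extra subtraction `end - start`.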

See the red lines in the diff between the generated sources below, which call Cython internals (prefixed by __Pyx and __PYX) for memoryview slice creation and reference counting.

Source code generated on `main` @ 68a7427
cat a.cpp
/* "sklearn/metrics/_pairwise_distances_reduction/_middle_term_computer.pyx":219
 *         return
 *
 *     cdef DTYPE_t * _compute_dist_middle_terms(             # <<<<<<<<<<<<<<
 *         self,
 *         ITYPE_t X_start,
 */

static __pyx_t_7sklearn_5utils_9_typedefs_DTYPE_t *__pyx_f_7sklearn_7metrics_29_pairwise_distances_reduction_21_middle_term_computer_30DenseDenseMiddleTermComputer64__compute_dist_middle_terms(struct __pyx_obj_7sklearn_7metrics_29_pairwise_distances_reduction_21_middle_term_computer_DenseDenseMiddleTermComputer64 *__pyx_v_self, __pyx_t_7sklearn_5utils_9_typedefs_ITYPE_t __pyx_v_X_start, __pyx_t_7sklearn_5utils_9_typedefs_ITYPE_t __pyx_v_X_end, __pyx_t_7sklearn_5utils_9_typedefs_ITYPE_t __pyx_v_Y_start, __pyx_t_7sklearn_5utils_9_typedefs_ITYPE_t __pyx_v_Y_end, __pyx_t_7sklearn_5utils_9_typedefs_ITYPE_t __pyx_v_thread_num) {
  __Pyx_memviewslice __pyx_v_X_c = { 0, 0, { 0 }, { 0 }, { 0 } };
  __Pyx_memviewslice __pyx_v_Y_c = { 0, 0, { 0 }, { 0 }, { 0 } };
  __pyx_t_7sklearn_5utils_9_typedefs_DTYPE_t *__pyx_v_dist_middle_terms;
  enum __pyx_t_7sklearn_5utils_12_cython_blas_BLAS_Order __pyx_v_order;
  enum __pyx_t_7sklearn_5utils_12_cython_blas_BLAS_Trans __pyx_v_ta;
  enum __pyx_t_7sklearn_5utils_12_cython_blas_BLAS_Trans __pyx_v_tb;
  __pyx_t_7sklearn_5utils_9_typedefs_ITYPE_t __pyx_v_m;
  __pyx_t_7sklearn_5utils_9_typedefs_ITYPE_t __pyx_v_n;
  __pyx_t_7sklearn_5utils_9_typedefs_ITYPE_t __pyx_v_K;
  __pyx_t_7sklearn_5utils_9_typedefs_DTYPE_t __pyx_v_alpha;
  __pyx_t_7sklearn_5utils_9_typedefs_DTYPE_t *__pyx_v_A;
  __pyx_t_7sklearn_5utils_9_typedefs_DTYPE_t *__pyx_v_B;
  __pyx_t_7sklearn_5utils_9_typedefs_ITYPE_t __pyx_v_lda;
  __pyx_t_7sklearn_5utils_9_typedefs_ITYPE_t __pyx_v_ldb;
  __pyx_t_7sklearn_5utils_9_typedefs_DTYPE_t __pyx_v_beta;
  __pyx_t_7sklearn_5utils_9_typedefs_ITYPE_t __pyx_v_ldc;
  __pyx_t_7sklearn_5utils_9_typedefs_DTYPE_t *__pyx_r;
  __Pyx_memviewslice __pyx_t_1 = { 0, 0, { 0 }, { 0 }, { 0 } };
  int __pyx_t_2;
  __Pyx_memviewslice __pyx_t_3 = { 0, 0, { 0 }, { 0 }, { 0 } };
  Py_ssize_t __pyx_t_4;
  Py_ssize_t __pyx_t_5;
  int __pyx_lineno = 0;
  const char *__pyx_filename = NULL;
  int __pyx_clineno = 0;

  /* "sklearn/metrics/_pairwise_distances_reduction/_middle_term_computer.pyx":228
 *     ) nogil:
 *         cdef:
 *             const DTYPE_t[:, ::1] X_c = self.X[X_start:X_end, :]             # <<<<<<<<<<<<<<
 *             const DTYPE_t[:, ::1] Y_c = self.Y[Y_start:Y_end, :]
 *             DTYPE_t *dist_middle_terms = self.dist_middle_terms_chunks[thread_num].data()
 */
  __pyx_t_1.data = __pyx_v_self->X.data;
  __pyx_t_1.memview = __pyx_v_self->X.memview;
  __PYX_INC_MEMVIEW(&__pyx_t_1, 0);
  __pyx_t_2 = -1;
  if (unlikely(__pyx_memoryview_slice_memviewslice(
    &__pyx_t_1,
    __pyx_v_self->X.shape[0], __pyx_v_self->X.strides[0], __pyx_v_self->X.suboffsets[0],
    0,
    0,
    &__pyx_t_2,
    __pyx_v_X_start,
    __pyx_v_X_end,
    0,
    1,
    1,
    0,
    1) < 0))
{
    __PYX_ERR(0, 228, __pyx_L1_error)
}

__pyx_t_1.shape[1] = __pyx_v_self->X.shape[1];
__pyx_t_1.strides[1] = __pyx_v_self->X.strides[1];
    __pyx_t_1.suboffsets[1] = -1;

__pyx_v_X_c = __pyx_t_1;
  __pyx_t_1.memview = NULL;
  __pyx_t_1.data = NULL;

  /* "sklearn/metrics/_pairwise_distances_reduction/_middle_term_computer.pyx":229
 *         cdef:
 *             const DTYPE_t[:, ::1] X_c = self.X[X_start:X_end, :]
 *             const DTYPE_t[:, ::1] Y_c = self.Y[Y_start:Y_end, :]             # <<<<<<<<<<<<<<
 *             DTYPE_t *dist_middle_terms = self.dist_middle_terms_chunks[thread_num].data()
 *
 */
  __pyx_t_3.data = __pyx_v_self->Y.data;
  __pyx_t_3.memview = __pyx_v_self->Y.memview;
  __PYX_INC_MEMVIEW(&__pyx_t_3, 0);
  __pyx_t_2 = -1;
  if (unlikely(__pyx_memoryview_slice_memviewslice(
    &__pyx_t_3,
    __pyx_v_self->Y.shape[0], __pyx_v_self->Y.strides[0], __pyx_v_self->Y.suboffsets[0],
    0,
    0,
    &__pyx_t_2,
    __pyx_v_Y_start,
    __pyx_v_Y_end,
    0,
    1,
    1,
    0,
    1) < 0))
{
    __PYX_ERR(0, 229, __pyx_L1_error)
}

__pyx_t_3.shape[1] = __pyx_v_self->Y.shape[1];
__pyx_t_3.strides[1] = __pyx_v_self->Y.strides[1];
    __pyx_t_3.suboffsets[1] = -1;

__pyx_v_Y_c = __pyx_t_3;
  __pyx_t_3.memview = NULL;
  __pyx_t_3.data = NULL;

  /* "sklearn/metrics/_pairwise_distances_reduction/_middle_term_computer.pyx":230
 *             const DTYPE_t[:, ::1] X_c = self.X[X_start:X_end, :]
 *             const DTYPE_t[:, ::1] Y_c = self.Y[Y_start:Y_end, :]
 *             DTYPE_t *dist_middle_terms = self.dist_middle_terms_chunks[thread_num].data()             # <<<<<<<<<<<<<<
 *
 *             # Careful: LDA, LDB and LDC are given for F-ordered arrays
 */
  __pyx_v_dist_middle_terms = (__pyx_v_self->__pyx_base.dist_middle_terms_chunks[__pyx_v_thread_num]).data();

  /* "sklearn/metrics/_pairwise_distances_reduction/_middle_term_computer.pyx":237
 *             #
 *             # Here, we use their counterpart values to work with C-ordered arrays.
 *             BLAS_Order order = RowMajor             # <<<<<<<<<<<<<<
 *             BLAS_Trans ta = NoTrans
 *             BLAS_Trans tb = Trans
 */
  __pyx_v_order = __pyx_e_7sklearn_5utils_12_cython_blas_RowMajor;

  /* "sklearn/metrics/_pairwise_distances_reduction/_middle_term_computer.pyx":238
 *             # Here, we use their counterpart values to work with C-ordered arrays.
 *             BLAS_Order order = RowMajor
 *             BLAS_Trans ta = NoTrans             # <<<<<<<<<<<<<<
 *             BLAS_Trans tb = Trans
 *             ITYPE_t m = X_c.shape[0]
 */
  __pyx_v_ta = __pyx_e_7sklearn_5utils_12_cython_blas_NoTrans;

  /* "sklearn/metrics/_pairwise_distances_reduction/_middle_term_computer.pyx":239
 *             BLAS_Order order = RowMajor
 *             BLAS_Trans ta = NoTrans
 *             BLAS_Trans tb = Trans             # <<<<<<<<<<<<<<
 *             ITYPE_t m = X_c.shape[0]
 *             ITYPE_t n = Y_c.shape[0]
 */
  __pyx_v_tb = __pyx_e_7sklearn_5utils_12_cython_blas_Trans;

  /* "sklearn/metrics/_pairwise_distances_reduction/_middle_term_computer.pyx":240
 *             BLAS_Trans ta = NoTrans
 *             BLAS_Trans tb = Trans
 *             ITYPE_t m = X_c.shape[0]             # <<<<<<<<<<<<<<
 *             ITYPE_t n = Y_c.shape[0]
 *             ITYPE_t K = X_c.shape[1]
 */
  __pyx_v_m = (__pyx_v_X_c.shape[0]);

  /* "sklearn/metrics/_pairwise_distances_reduction/_middle_term_computer.pyx":241
 *             BLAS_Trans tb = Trans
 *             ITYPE_t m = X_c.shape[0]
 *             ITYPE_t n = Y_c.shape[0]             # <<<<<<<<<<<<<<
 *             ITYPE_t K = X_c.shape[1]
 *             DTYPE_t alpha = - 2.
 */
  __pyx_v_n = (__pyx_v_Y_c.shape[0]);

  /* "sklearn/metrics/_pairwise_distances_reduction/_middle_term_computer.pyx":242
 *             ITYPE_t m = X_c.shape[0]
 *             ITYPE_t n = Y_c.shape[0]
 *             ITYPE_t K = X_c.shape[1]             # <<<<<<<<<<<<<<
 *             DTYPE_t alpha = - 2.
 *             # Casting for A and B to remove the const is needed because APIs exposed via
 */
  __pyx_v_K = (__pyx_v_X_c.shape[1]);

  /* "sklearn/metrics/_pairwise_distances_reduction/_middle_term_computer.pyx":243
 *             ITYPE_t n = Y_c.shape[0]
 *             ITYPE_t K = X_c.shape[1]
 *             DTYPE_t alpha = - 2.             # <<<<<<<<<<<<<<
 *             # Casting for A and B to remove the const is needed because APIs exposed via
 *             # scipy.linalg.cython_blas aren't reflecting the arguments' const qualifier.
 */
  __pyx_v_alpha = -2.;

  /* "sklearn/metrics/_pairwise_distances_reduction/_middle_term_computer.pyx":247
 *             # scipy.linalg.cython_blas aren't reflecting the arguments' const qualifier.
 *             # See: https://github.com/scipy/scipy/issues/14262
 *             DTYPE_t * A = <DTYPE_t *> &X_c[0, 0]             # <<<<<<<<<<<<<<
 *             DTYPE_t * B = <DTYPE_t *> &Y_c[0, 0]
 *             ITYPE_t lda = X_c.shape[1]
 */
  __pyx_t_4 = 0;
  __pyx_t_5 = 0;
  __pyx_v_A = ((__pyx_t_7sklearn_5utils_9_typedefs_DTYPE_t *)(&(*((__pyx_t_7sklearn_5utils_9_typedefs_DTYPE_t const  *) ( /* dim=1 */ ((char *) (((__pyx_t_7sklearn_5utils_9_typedefs_DTYPE_t const  *) ( /* dim=0 */ (__pyx_v_X_c.data + __pyx_t_4 * __pyx_v_X_c.strides[0]) )) + __pyx_t_5)) )))));

  /* "sklearn/metrics/_pairwise_distances_reduction/_middle_term_computer.pyx":248
 *             # See: https://github.com/scipy/scipy/issues/14262
 *             DTYPE_t * A = <DTYPE_t *> &X_c[0, 0]
 *             DTYPE_t * B = <DTYPE_t *> &Y_c[0, 0]             # <<<<<<<<<<<<<<
 *             ITYPE_t lda = X_c.shape[1]
 *             ITYPE_t ldb = X_c.shape[1]
 */
  __pyx_t_5 = 0;
  __pyx_t_4 = 0;
  __pyx_v_B = ((__pyx_t_7sklearn_5utils_9_typedefs_DTYPE_t *)(&(*((__pyx_t_7sklearn_5utils_9_typedefs_DTYPE_t const  *) ( /* dim=1 */ ((char *) (((__pyx_t_7sklearn_5utils_9_typedefs_DTYPE_t const  *) ( /* dim=0 */ (__pyx_v_Y_c.data + __pyx_t_5 * __pyx_v_Y_c.strides[0]) )) + __pyx_t_4)) )))));

  /* "sklearn/metrics/_pairwise_distances_reduction/_middle_term_computer.pyx":249
 *             DTYPE_t * A = <DTYPE_t *> &X_c[0, 0]
 *             DTYPE_t * B = <DTYPE_t *> &Y_c[0, 0]
 *             ITYPE_t lda = X_c.shape[1]             # <<<<<<<<<<<<<<
 *             ITYPE_t ldb = X_c.shape[1]
 *             DTYPE_t beta = 0.
 */
  __pyx_v_lda = (__pyx_v_X_c.shape[1]);

  /* "sklearn/metrics/_pairwise_distances_reduction/_middle_term_computer.pyx":250
 *             DTYPE_t * B = <DTYPE_t *> &Y_c[0, 0]
 *             ITYPE_t lda = X_c.shape[1]
 *             ITYPE_t ldb = X_c.shape[1]             # <<<<<<<<<<<<<<
 *             DTYPE_t beta = 0.
 *             ITYPE_t ldc = Y_c.shape[0]
 */
  __pyx_v_ldb = (__pyx_v_X_c.shape[1]);

  /* "sklearn/metrics/_pairwise_distances_reduction/_middle_term_computer.pyx":251
 *             ITYPE_t lda = X_c.shape[1]
 *             ITYPE_t ldb = X_c.shape[1]
 *             DTYPE_t beta = 0.             # <<<<<<<<<<<<<<
 *             ITYPE_t ldc = Y_c.shape[0]
 *
 */
  __pyx_v_beta = 0.;

  /* "sklearn/metrics/_pairwise_distances_reduction/_middle_term_computer.pyx":252
 *             ITYPE_t ldb = X_c.shape[1]
 *             DTYPE_t beta = 0.
 *             ITYPE_t ldc = Y_c.shape[0]             # <<<<<<<<<<<<<<
 *
 *         # dist_middle_terms = `-2 * X_c @ Y_c.T`
 */
  __pyx_v_ldc = (__pyx_v_Y_c.shape[0]);

  /* "sklearn/metrics/_pairwise_distances_reduction/_middle_term_computer.pyx":255
 *
 *         # dist_middle_terms = `-2 * X_c @ Y_c.T`
 *         _gemm(order, ta, tb, m, n, K, alpha, A, lda, B, ldb, beta, dist_middle_terms, ldc)             # <<<<<<<<<<<<<<
 *
 *         return dist_middle_terms
 */
  __pyx_fuse_1__pyx_f_7sklearn_5utils_12_cython_blas__gemm(__pyx_v_order, __pyx_v_ta, __pyx_v_tb, __pyx_v_m, __pyx_v_n, __pyx_v_K, __pyx_v_alpha, __pyx_v_A, __pyx_v_lda, __pyx_v_B, __pyx_v_ldb, __pyx_v_beta, __pyx_v_dist_middle_terms, __pyx_v_ldc);

  /* "sklearn/metrics/_pairwise_distances_reduction/_middle_term_computer.pyx":257
 *         _gemm(order, ta, tb, m, n, K, alpha, A, lda, B, ldb, beta, dist_middle_terms, ldc)
 *
 *         return dist_middle_terms             # <<<<<<<<<<<<<<
 *
 *
 */
  __pyx_r = __pyx_v_dist_middle_terms;
  goto __pyx_L0;

  /* "sklearn/metrics/_pairwise_distances_reduction/_middle_term_computer.pyx":219
 *         return
 *
 *     cdef DTYPE_t * _compute_dist_middle_terms(             # <<<<<<<<<<<<<<
 *         self,
 *         ITYPE_t X_start,
 */

  /* function exit code */
  __pyx_L1_error:;
  __PYX_XDEC_MEMVIEW(&__pyx_t_1, 0);
  __PYX_XDEC_MEMVIEW(&__pyx_t_3, 0);
  __Pyx_WriteUnraisable("sklearn.metrics._pairwise_distances_reduction._middle_term_computer.DenseDenseMiddleTermComputer64._compute_dist_middle_terms", __pyx_clineno, __pyx_lineno, __pyx_filename, 1, 1);
  __pyx_r = 0;
  __pyx_L0:;
  __PYX_XDEC_MEMVIEW(&__pyx_v_X_c, 0);
  __PYX_XDEC_MEMVIEW(&__pyx_v_Y_c, 0);
  return __pyx_r;
}
Source code generated on this PR @ 7705579
cat gh-24715_extract.cpp
/* "sklearn/metrics/_pairwise_distances_reduction/_middle_term_computer.pyx":219
 *         return
 *
 *     cdef DTYPE_t * _compute_dist_middle_terms(             # <<<<<<<<<<<<<<
 *         self,
 *         ITYPE_t X_start,
 */

static __pyx_t_7sklearn_5utils_9_typedefs_DTYPE_t *__pyx_f_7sklearn_7metrics_29_pairwise_distances_reduction_21_middle_term_computer_30DenseDenseMiddleTermComputer64__compute_dist_middle_terms(struct __pyx_obj_7sklearn_7metrics_29_pairwise_distances_reduction_21_middle_term_computer_DenseDenseMiddleTermComputer64 *__pyx_v_self, __pyx_t_7sklearn_5utils_9_typedefs_ITYPE_t __pyx_v_X_start, __pyx_t_7sklearn_5utils_9_typedefs_ITYPE_t __pyx_v_X_end, __pyx_t_7sklearn_5utils_9_typedefs_ITYPE_t __pyx_v_Y_start, __pyx_t_7sklearn_5utils_9_typedefs_ITYPE_t __pyx_v_Y_end, __pyx_t_7sklearn_5utils_9_typedefs_ITYPE_t __pyx_v_thread_num) {
  __pyx_t_7sklearn_5utils_9_typedefs_DTYPE_t *__pyx_v_dist_middle_terms;
  enum __pyx_t_7sklearn_5utils_12_cython_blas_BLAS_Order __pyx_v_order;
  enum __pyx_t_7sklearn_5utils_12_cython_blas_BLAS_Trans __pyx_v_ta;
  enum __pyx_t_7sklearn_5utils_12_cython_blas_BLAS_Trans __pyx_v_tb;
  __pyx_t_7sklearn_5utils_9_typedefs_ITYPE_t __pyx_v_m;
  __pyx_t_7sklearn_5utils_9_typedefs_ITYPE_t __pyx_v_n;
  __pyx_t_7sklearn_5utils_9_typedefs_ITYPE_t __pyx_v_K;
  __pyx_t_7sklearn_5utils_9_typedefs_DTYPE_t __pyx_v_alpha;
  __pyx_t_7sklearn_5utils_9_typedefs_DTYPE_t *__pyx_v_A;
  __pyx_t_7sklearn_5utils_9_typedefs_DTYPE_t *__pyx_v_B;
  __pyx_t_7sklearn_5utils_9_typedefs_ITYPE_t __pyx_v_lda;
  __pyx_t_7sklearn_5utils_9_typedefs_ITYPE_t __pyx_v_ldb;
  __pyx_t_7sklearn_5utils_9_typedefs_DTYPE_t __pyx_v_beta;
  __pyx_t_7sklearn_5utils_9_typedefs_ITYPE_t __pyx_v_ldc;
  __pyx_t_7sklearn_5utils_9_typedefs_DTYPE_t *__pyx_r;
  __pyx_t_7sklearn_5utils_9_typedefs_ITYPE_t __pyx_t_1;
  Py_ssize_t __pyx_t_2;
  Py_ssize_t __pyx_t_3;

  /* "sklearn/metrics/_pairwise_distances_reduction/_middle_term_computer.pyx":228
 *     ) nogil:
 *         cdef:
 *             DTYPE_t *dist_middle_terms = self.dist_middle_terms_chunks[thread_num].data()             # <<<<<<<<<<<<<<
 *
 *             # Careful: LDA, LDB and LDC are given for F-ordered arrays
 */
  __pyx_v_dist_middle_terms = (__pyx_v_self->__pyx_base.dist_middle_terms_chunks[__pyx_v_thread_num]).data();

  /* "sklearn/metrics/_pairwise_distances_reduction/_middle_term_computer.pyx":235
 *             #
 *             # Here, we use their counterpart values to work with C-ordered arrays.
 *             BLAS_Order order = RowMajor             # <<<<<<<<<<<<<<
 *             BLAS_Trans ta = NoTrans
 *             BLAS_Trans tb = Trans
 */
  __pyx_v_order = __pyx_e_7sklearn_5utils_12_cython_blas_RowMajor;

  /* "sklearn/metrics/_pairwise_distances_reduction/_middle_term_computer.pyx":236
 *             # Here, we use their counterpart values to work with C-ordered arrays.
 *             BLAS_Order order = RowMajor
 *             BLAS_Trans ta = NoTrans             # <<<<<<<<<<<<<<
 *             BLAS_Trans tb = Trans
 *             ITYPE_t m = X_end - X_start
 */
  __pyx_v_ta = __pyx_e_7sklearn_5utils_12_cython_blas_NoTrans;

  /* "sklearn/metrics/_pairwise_distances_reduction/_middle_term_computer.pyx":237
 *             BLAS_Order order = RowMajor
 *             BLAS_Trans ta = NoTrans
 *             BLAS_Trans tb = Trans             # <<<<<<<<<<<<<<
 *             ITYPE_t m = X_end - X_start
 *             ITYPE_t n = Y_end - Y_start
 */
  __pyx_v_tb = __pyx_e_7sklearn_5utils_12_cython_blas_Trans;

  /* "sklearn/metrics/_pairwise_distances_reduction/_middle_term_computer.pyx":238
 *             BLAS_Trans ta = NoTrans
 *             BLAS_Trans tb = Trans
 *             ITYPE_t m = X_end - X_start             # <<<<<<<<<<<<<<
 *             ITYPE_t n = Y_end - Y_start
 *             ITYPE_t K = self.n_features
 */
  __pyx_v_m = (__pyx_v_X_end - __pyx_v_X_start);

  /* "sklearn/metrics/_pairwise_distances_reduction/_middle_term_computer.pyx":239
 *             BLAS_Trans tb = Trans
 *             ITYPE_t m = X_end - X_start
 *             ITYPE_t n = Y_end - Y_start             # <<<<<<<<<<<<<<
 *             ITYPE_t K = self.n_features
 *             DTYPE_t alpha = - 2.
 */
  __pyx_v_n = (__pyx_v_Y_end - __pyx_v_Y_start);

  /* "sklearn/metrics/_pairwise_distances_reduction/_middle_term_computer.pyx":240
 *             ITYPE_t m = X_end - X_start
 *             ITYPE_t n = Y_end - Y_start
 *             ITYPE_t K = self.n_features             # <<<<<<<<<<<<<<
 *             DTYPE_t alpha = - 2.
 *             # Casting for A and B to remove the const is needed because APIs exposed via
 */
  __pyx_t_1 = __pyx_v_self->__pyx_base.n_features;
  __pyx_v_K = __pyx_t_1;

  /* "sklearn/metrics/_pairwise_distances_reduction/_middle_term_computer.pyx":241
 *             ITYPE_t n = Y_end - Y_start
 *             ITYPE_t K = self.n_features
 *             DTYPE_t alpha = - 2.             # <<<<<<<<<<<<<<
 *             # Casting for A and B to remove the const is needed because APIs exposed via
 *             # scipy.linalg.cython_blas aren't reflecting the arguments' const qualifier.
 */
  __pyx_v_alpha = -2.;

  /* "sklearn/metrics/_pairwise_distances_reduction/_middle_term_computer.pyx":245
 *             # scipy.linalg.cython_blas aren't reflecting the arguments' const qualifier.
 *             # See: https://github.com/scipy/scipy/issues/14262
 *             DTYPE_t * A = <DTYPE_t *> &self.X[X_start, 0]             # <<<<<<<<<<<<<<
 *             DTYPE_t * B = <DTYPE_t *> &self.Y[Y_start, 0]
 *             ITYPE_t lda = self.n_features
 */
  __pyx_t_2 = __pyx_v_X_start;
  __pyx_t_3 = 0;
  __pyx_v_A = ((__pyx_t_7sklearn_5utils_9_typedefs_DTYPE_t *)(&(*((__pyx_t_7sklearn_5utils_9_typedefs_DTYPE_t const  *) ( /* dim=1 */ ((char *) (((__pyx_t_7sklearn_5utils_9_typedefs_DTYPE_t const  *) ( /* dim=0 */ (__pyx_v_self->X.data + __pyx_t_2 * __pyx_v_self->X.strides[0]) )) + __pyx_t_3)) )))));

  /* "sklearn/metrics/_pairwise_distances_reduction/_middle_term_computer.pyx":246
 *             # See: https://github.com/scipy/scipy/issues/14262
 *             DTYPE_t * A = <DTYPE_t *> &self.X[X_start, 0]
 *             DTYPE_t * B = <DTYPE_t *> &self.Y[Y_start, 0]             # <<<<<<<<<<<<<<
 *             ITYPE_t lda = self.n_features
 *             ITYPE_t ldb = self.n_features
 */
  __pyx_t_3 = __pyx_v_Y_start;
  __pyx_t_2 = 0;
  __pyx_v_B = ((__pyx_t_7sklearn_5utils_9_typedefs_DTYPE_t *)(&(*((__pyx_t_7sklearn_5utils_9_typedefs_DTYPE_t const  *) ( /* dim=1 */ ((char *) (((__pyx_t_7sklearn_5utils_9_typedefs_DTYPE_t const  *) ( /* dim=0 */ (__pyx_v_self->Y.data + __pyx_t_3 * __pyx_v_self->Y.strides[0]) )) + __pyx_t_2)) )))));

  /* "sklearn/metrics/_pairwise_distances_reduction/_middle_term_computer.pyx":247
 *             DTYPE_t * A = <DTYPE_t *> &self.X[X_start, 0]
 *             DTYPE_t * B = <DTYPE_t *> &self.Y[Y_start, 0]
 *             ITYPE_t lda = self.n_features             # <<<<<<<<<<<<<<
 *             ITYPE_t ldb = self.n_features
 *             DTYPE_t beta = 0.
 */
  __pyx_t_1 = __pyx_v_self->__pyx_base.n_features;
  __pyx_v_lda = __pyx_t_1;

  /* "sklearn/metrics/_pairwise_distances_reduction/_middle_term_computer.pyx":248
 *             DTYPE_t * B = <DTYPE_t *> &self.Y[Y_start, 0]
 *             ITYPE_t lda = self.n_features
 *             ITYPE_t ldb = self.n_features             # <<<<<<<<<<<<<<
 *             DTYPE_t beta = 0.
 *             ITYPE_t ldc = Y_end - Y_start
 */
  __pyx_t_1 = __pyx_v_self->__pyx_base.n_features;
  __pyx_v_ldb = __pyx_t_1;

  /* "sklearn/metrics/_pairwise_distances_reduction/_middle_term_computer.pyx":249
 *             ITYPE_t lda = self.n_features
 *             ITYPE_t ldb = self.n_features
 *             DTYPE_t beta = 0.             # <<<<<<<<<<<<<<
 *             ITYPE_t ldc = Y_end - Y_start
 *
 */
  __pyx_v_beta = 0.;

  /* "sklearn/metrics/_pairwise_distances_reduction/_middle_term_computer.pyx":250
 *             ITYPE_t ldb = self.n_features
 *             DTYPE_t beta = 0.
 *             ITYPE_t ldc = Y_end - Y_start             # <<<<<<<<<<<<<<
 *
 *         # dist_middle_terms = `-2 * X[X_start:X_end] @ Y[Y_start:Y_end].T`
 */
  __pyx_v_ldc = (__pyx_v_Y_end - __pyx_v_Y_start);

  /* "sklearn/metrics/_pairwise_distances_reduction/_middle_term_computer.pyx":253
 *
 *         # dist_middle_terms = `-2 * X[X_start:X_end] @ Y[Y_start:Y_end].T`
 *         _gemm(order, ta, tb, m, n, K, alpha, A, lda, B, ldb, beta, dist_middle_terms, ldc)             # <<<<<<<<<<<<<<
 *
 *         return dist_middle_terms
 */
  __pyx_fuse_1__pyx_f_7sklearn_5utils_12_cython_blas__gemm(__pyx_v_order, __pyx_v_ta, __pyx_v_tb, __pyx_v_m, __pyx_v_n, __pyx_v_K, __pyx_v_alpha, __pyx_v_A, __pyx_v_lda, __pyx_v_B, __pyx_v_ldb, __pyx_v_beta, __pyx_v_dist_middle_terms, __pyx_v_ldc);

  /* "sklearn/metrics/_pairwise_distances_reduction/_middle_term_computer.pyx":255
 *         _gemm(order, ta, tb, m, n, K, alpha, A, lda, B, ldb, beta, dist_middle_terms, ldc)
 *
 *         return dist_middle_terms             # <<<<<<<<<<<<<<
 *
 *
 */
  __pyx_r = __pyx_v_dist_middle_terms;
  goto __pyx_L0;

  /* "sklearn/metrics/_pairwise_distances_reduction/_middle_term_computer.pyx":219
 *         return
 *
 *     cdef DTYPE_t * _compute_dist_middle_terms(             # <<<<<<<<<<<<<<
 *         self,
 *         ITYPE_t X_start,
 */

  /* function exit code */
  __pyx_L0:;
  return __pyx_r;
}
Difference between the two extracts
diff main_extract.cpp gh-24715_extract.cpp
10,11d9
<   __Pyx_memviewslice __pyx_v_X_c = { 0, 0, { 0 }, { 0 }, { 0 } };
<   __Pyx_memviewslice __pyx_v_Y_c = { 0, 0, { 0 }, { 0 }, { 0 } };
27,34c25,27
<   __Pyx_memviewslice __pyx_t_1 = { 0, 0, { 0 }, { 0 }, { 0 } };
<   int __pyx_t_2;
<   __Pyx_memviewslice __pyx_t_3 = { 0, 0, { 0 }, { 0 }, { 0 } };
<   Py_ssize_t __pyx_t_4;
<   Py_ssize_t __pyx_t_5;
<   int __pyx_lineno = 0;
<   const char *__pyx_filename = NULL;
<   int __pyx_clineno = 0;
---
>   __pyx_t_7sklearn_5utils_9_typedefs_ITYPE_t __pyx_t_1;
>   Py_ssize_t __pyx_t_2;
>   Py_ssize_t __pyx_t_3;
39,110d31
<  *             const DTYPE_t[:, ::1] X_c = self.X[X_start:X_end, :]             # <<<<<<<<<<<<<<
<  *             const DTYPE_t[:, ::1] Y_c = self.Y[Y_start:Y_end, :]
<  *             DTYPE_t *dist_middle_terms = self.dist_middle_terms_chunks[thread_num].data()
<  */
<   __pyx_t_1.data = __pyx_v_self->X.data;
<   __pyx_t_1.memview = __pyx_v_self->X.memview;
<   __PYX_INC_MEMVIEW(&__pyx_t_1, 0);
<   __pyx_t_2 = -1;
<   if (unlikely(__pyx_memoryview_slice_memviewslice(
<     &__pyx_t_1,
<     __pyx_v_self->X.shape[0], __pyx_v_self->X.strides[0], __pyx_v_self->X.suboffsets[0],
<     0,
<     0,
<     &__pyx_t_2,
<     __pyx_v_X_start,
<     __pyx_v_X_end,
<     0,
<     1,
<     1,
<     0,
<     1) < 0))
< {
<     __PYX_ERR(0, 228, __pyx_L1_error)
< }
< 
< __pyx_t_1.shape[1] = __pyx_v_self->X.shape[1];
< __pyx_t_1.strides[1] = __pyx_v_self->X.strides[1];
<     __pyx_t_1.suboffsets[1] = -1;
< 
< __pyx_v_X_c = __pyx_t_1;
<   __pyx_t_1.memview = NULL;
<   __pyx_t_1.data = NULL;
< 
<   /* "sklearn/metrics/_pairwise_distances_reduction/_middle_term_computer.pyx":229
<  *         cdef:
<  *             const DTYPE_t[:, ::1] X_c = self.X[X_start:X_end, :]
<  *             const DTYPE_t[:, ::1] Y_c = self.Y[Y_start:Y_end, :]             # <<<<<<<<<<<<<<
<  *             DTYPE_t *dist_middle_terms = self.dist_middle_terms_chunks[thread_num].data()
<  *
<  */
<   __pyx_t_3.data = __pyx_v_self->Y.data;
<   __pyx_t_3.memview = __pyx_v_self->Y.memview;
<   __PYX_INC_MEMVIEW(&__pyx_t_3, 0);
<   __pyx_t_2 = -1;
<   if (unlikely(__pyx_memoryview_slice_memviewslice(
<     &__pyx_t_3,
<     __pyx_v_self->Y.shape[0], __pyx_v_self->Y.strides[0], __pyx_v_self->Y.suboffsets[0],
<     0,
<     0,
<     &__pyx_t_2,
<     __pyx_v_Y_start,
<     __pyx_v_Y_end,
<     0,
<     1,
<     1,
<     0,
<     1) < 0))
< {
<     __PYX_ERR(0, 229, __pyx_L1_error)
< }
< 
< __pyx_t_3.shape[1] = __pyx_v_self->Y.shape[1];
< __pyx_t_3.strides[1] = __pyx_v_self->Y.strides[1];
<     __pyx_t_3.suboffsets[1] = -1;
< 
< __pyx_v_Y_c = __pyx_t_3;
<   __pyx_t_3.memview = NULL;
<   __pyx_t_3.data = NULL;
< 
<   /* "sklearn/metrics/_pairwise_distances_reduction/_middle_term_computer.pyx":230
<  *             const DTYPE_t[:, ::1] X_c = self.X[X_start:X_end, :]
<  *             const DTYPE_t[:, ::1] Y_c = self.Y[Y_start:Y_end, :]
117c38
<   /* "sklearn/metrics/_pairwise_distances_reduction/_middle_term_computer.pyx":237
---
>   /* "sklearn/metrics/_pairwise_distances_reduction/_middle_term_computer.pyx":235
126c47
<   /* "sklearn/metrics/_pairwise_distances_reduction/_middle_term_computer.pyx":238
---
>   /* "sklearn/metrics/_pairwise_distances_reduction/_middle_term_computer.pyx":236
131c52
<  *             ITYPE_t m = X_c.shape[0]
---
>  *             ITYPE_t m = X_end - X_start
135c56
<   /* "sklearn/metrics/_pairwise_distances_reduction/_middle_term_computer.pyx":239
---
>   /* "sklearn/metrics/_pairwise_distances_reduction/_middle_term_computer.pyx":237
139,140c60,61
<  *             ITYPE_t m = X_c.shape[0]
<  *             ITYPE_t n = Y_c.shape[0]
---
>  *             ITYPE_t m = X_end - X_start
>  *             ITYPE_t n = Y_end - Y_start
144c65
<   /* "sklearn/metrics/_pairwise_distances_reduction/_middle_term_computer.pyx":240
---
>   /* "sklearn/metrics/_pairwise_distances_reduction/_middle_term_computer.pyx":238
147,149c68,70
<  *             ITYPE_t m = X_c.shape[0]             # <<<<<<<<<<<<<<
<  *             ITYPE_t n = Y_c.shape[0]
<  *             ITYPE_t K = X_c.shape[1]
---
>  *             ITYPE_t m = X_end - X_start             # <<<<<<<<<<<<<<
>  *             ITYPE_t n = Y_end - Y_start
>  *             ITYPE_t K = self.n_features
151c72
<   __pyx_v_m = (__pyx_v_X_c.shape[0]);
---
>   __pyx_v_m = (__pyx_v_X_end - __pyx_v_X_start);
153c74
<   /* "sklearn/metrics/_pairwise_distances_reduction/_middle_term_computer.pyx":241
---
>   /* "sklearn/metrics/_pairwise_distances_reduction/_middle_term_computer.pyx":239
155,157c76,78
<  *             ITYPE_t m = X_c.shape[0]
<  *             ITYPE_t n = Y_c.shape[0]             # <<<<<<<<<<<<<<
<  *             ITYPE_t K = X_c.shape[1]
---
>  *             ITYPE_t m = X_end - X_start
>  *             ITYPE_t n = Y_end - Y_start             # <<<<<<<<<<<<<<
>  *             ITYPE_t K = self.n_features
160c81
<   __pyx_v_n = (__pyx_v_Y_c.shape[0]);
---
>   __pyx_v_n = (__pyx_v_Y_end - __pyx_v_Y_start);
162,165c83,86
<   /* "sklearn/metrics/_pairwise_distances_reduction/_middle_term_computer.pyx":242
<  *             ITYPE_t m = X_c.shape[0]
<  *             ITYPE_t n = Y_c.shape[0]
<  *             ITYPE_t K = X_c.shape[1]             # <<<<<<<<<<<<<<
---
>   /* "sklearn/metrics/_pairwise_distances_reduction/_middle_term_computer.pyx":240
>  *             ITYPE_t m = X_end - X_start
>  *             ITYPE_t n = Y_end - Y_start
>  *             ITYPE_t K = self.n_features             # <<<<<<<<<<<<<<
169c90,91
<   __pyx_v_K = (__pyx_v_X_c.shape[1]);
---
>   __pyx_t_1 = __pyx_v_self->__pyx_base.n_features;
>   __pyx_v_K = __pyx_t_1;
171,173c93,95
<   /* "sklearn/metrics/_pairwise_distances_reduction/_middle_term_computer.pyx":243
<  *             ITYPE_t n = Y_c.shape[0]
<  *             ITYPE_t K = X_c.shape[1]
---
>   /* "sklearn/metrics/_pairwise_distances_reduction/_middle_term_computer.pyx":241
>  *             ITYPE_t n = Y_end - Y_start
>  *             ITYPE_t K = self.n_features
180c102
<   /* "sklearn/metrics/_pairwise_distances_reduction/_middle_term_computer.pyx":247
---
>   /* "sklearn/metrics/_pairwise_distances_reduction/_middle_term_computer.pyx":245
183,189c105,111
<  *             DTYPE_t * A = <DTYPE_t *> &X_c[0, 0]             # <<<<<<<<<<<<<<
<  *             DTYPE_t * B = <DTYPE_t *> &Y_c[0, 0]
<  *             ITYPE_t lda = X_c.shape[1]
<  */
<   __pyx_t_4 = 0;
<   __pyx_t_5 = 0;
<   __pyx_v_A = ((__pyx_t_7sklearn_5utils_9_typedefs_DTYPE_t *)(&(*((__pyx_t_7sklearn_5utils_9_typedefs_DTYPE_t const  *) ( /* dim=1 */ ((char *) (((__pyx_t_7sklearn_5utils_9_typedefs_DTYPE_t const  *) ( /* dim=0 */ (__pyx_v_X_c.data + __pyx_t_4 * __pyx_v_X_c.strides[0]) )) + __pyx_t_5)) )))));
---
>  *             DTYPE_t * A = <DTYPE_t *> &self.X[X_start, 0]             # <<<<<<<<<<<<<<
>  *             DTYPE_t * B = <DTYPE_t *> &self.Y[Y_start, 0]
>  *             ITYPE_t lda = self.n_features
>  */
>   __pyx_t_2 = __pyx_v_X_start;
>   __pyx_t_3 = 0;
>   __pyx_v_A = ((__pyx_t_7sklearn_5utils_9_typedefs_DTYPE_t *)(&(*((__pyx_t_7sklearn_5utils_9_typedefs_DTYPE_t const  *) ( /* dim=1 */ ((char *) (((__pyx_t_7sklearn_5utils_9_typedefs_DTYPE_t const  *) ( /* dim=0 */ (__pyx_v_self->X.data + __pyx_t_2 * __pyx_v_self->X.strides[0]) )) + __pyx_t_3)) )))));
191c113
<   /* "sklearn/metrics/_pairwise_distances_reduction/_middle_term_computer.pyx":248
---
>   /* "sklearn/metrics/_pairwise_distances_reduction/_middle_term_computer.pyx":246
193,200c115,122
<  *             DTYPE_t * A = <DTYPE_t *> &X_c[0, 0]
<  *             DTYPE_t * B = <DTYPE_t *> &Y_c[0, 0]             # <<<<<<<<<<<<<<
<  *             ITYPE_t lda = X_c.shape[1]
<  *             ITYPE_t ldb = X_c.shape[1]
<  */
<   __pyx_t_5 = 0;
<   __pyx_t_4 = 0;
<   __pyx_v_B = ((__pyx_t_7sklearn_5utils_9_typedefs_DTYPE_t *)(&(*((__pyx_t_7sklearn_5utils_9_typedefs_DTYPE_t const  *) ( /* dim=1 */ ((char *) (((__pyx_t_7sklearn_5utils_9_typedefs_DTYPE_t const  *) ( /* dim=0 */ (__pyx_v_Y_c.data + __pyx_t_5 * __pyx_v_Y_c.strides[0]) )) + __pyx_t_4)) )))));
---
>  *             DTYPE_t * A = <DTYPE_t *> &self.X[X_start, 0]
>  *             DTYPE_t * B = <DTYPE_t *> &self.Y[Y_start, 0]             # <<<<<<<<<<<<<<
>  *             ITYPE_t lda = self.n_features
>  *             ITYPE_t ldb = self.n_features
>  */
>   __pyx_t_3 = __pyx_v_Y_start;
>   __pyx_t_2 = 0;
>   __pyx_v_B = ((__pyx_t_7sklearn_5utils_9_typedefs_DTYPE_t *)(&(*((__pyx_t_7sklearn_5utils_9_typedefs_DTYPE_t const  *) ( /* dim=1 */ ((char *) (((__pyx_t_7sklearn_5utils_9_typedefs_DTYPE_t const  *) ( /* dim=0 */ (__pyx_v_self->Y.data + __pyx_t_3 * __pyx_v_self->Y.strides[0]) )) + __pyx_t_2)) )))));
202,206c124,128
<   /* "sklearn/metrics/_pairwise_distances_reduction/_middle_term_computer.pyx":249
<  *             DTYPE_t * A = <DTYPE_t *> &X_c[0, 0]
<  *             DTYPE_t * B = <DTYPE_t *> &Y_c[0, 0]
<  *             ITYPE_t lda = X_c.shape[1]             # <<<<<<<<<<<<<<
<  *             ITYPE_t ldb = X_c.shape[1]
---
>   /* "sklearn/metrics/_pairwise_distances_reduction/_middle_term_computer.pyx":247
>  *             DTYPE_t * A = <DTYPE_t *> &self.X[X_start, 0]
>  *             DTYPE_t * B = <DTYPE_t *> &self.Y[Y_start, 0]
>  *             ITYPE_t lda = self.n_features             # <<<<<<<<<<<<<<
>  *             ITYPE_t ldb = self.n_features
209c131,132
<   __pyx_v_lda = (__pyx_v_X_c.shape[1]);
---
>   __pyx_t_1 = __pyx_v_self->__pyx_base.n_features;
>   __pyx_v_lda = __pyx_t_1;
211,214c134,137
<   /* "sklearn/metrics/_pairwise_distances_reduction/_middle_term_computer.pyx":250
<  *             DTYPE_t * B = <DTYPE_t *> &Y_c[0, 0]
<  *             ITYPE_t lda = X_c.shape[1]
<  *             ITYPE_t ldb = X_c.shape[1]             # <<<<<<<<<<<<<<
---
>   /* "sklearn/metrics/_pairwise_distances_reduction/_middle_term_computer.pyx":248
>  *             DTYPE_t * B = <DTYPE_t *> &self.Y[Y_start, 0]
>  *             ITYPE_t lda = self.n_features
>  *             ITYPE_t ldb = self.n_features             # <<<<<<<<<<<<<<
216c139
<  *             ITYPE_t ldc = Y_c.shape[0]
---
>  *             ITYPE_t ldc = Y_end - Y_start
218c141,142
<   __pyx_v_ldb = (__pyx_v_X_c.shape[1]);
---
>   __pyx_t_1 = __pyx_v_self->__pyx_base.n_features;
>   __pyx_v_ldb = __pyx_t_1;
220,222c144,146
<   /* "sklearn/metrics/_pairwise_distances_reduction/_middle_term_computer.pyx":251
<  *             ITYPE_t lda = X_c.shape[1]
<  *             ITYPE_t ldb = X_c.shape[1]
---
>   /* "sklearn/metrics/_pairwise_distances_reduction/_middle_term_computer.pyx":249
>  *             ITYPE_t lda = self.n_features
>  *             ITYPE_t ldb = self.n_features
224c148
<  *             ITYPE_t ldc = Y_c.shape[0]
---
>  *             ITYPE_t ldc = Y_end - Y_start
229,230c153,154
<   /* "sklearn/metrics/_pairwise_distances_reduction/_middle_term_computer.pyx":252
<  *             ITYPE_t ldb = X_c.shape[1]
---
>   /* "sklearn/metrics/_pairwise_distances_reduction/_middle_term_computer.pyx":250
>  *             ITYPE_t ldb = self.n_features
232c156
<  *             ITYPE_t ldc = Y_c.shape[0]             # <<<<<<<<<<<<<<
---
>  *             ITYPE_t ldc = Y_end - Y_start             # <<<<<<<<<<<<<<
234c158
<  *         # dist_middle_terms = `-2 * X_c @ Y_c.T`
---
>  *         # dist_middle_terms = `-2 * X[X_start:X_end] @ Y[Y_start:Y_end].T`
236c160
<   __pyx_v_ldc = (__pyx_v_Y_c.shape[0]);
---
>   __pyx_v_ldc = (__pyx_v_Y_end - __pyx_v_Y_start);
238c162
<   /* "sklearn/metrics/_pairwise_distances_reduction/_middle_term_computer.pyx":255
---
>   /* "sklearn/metrics/_pairwise_distances_reduction/_middle_term_computer.pyx":253
240c164
<  *         # dist_middle_terms = `-2 * X_c @ Y_c.T`
---
>  *         # dist_middle_terms = `-2 * X[X_start:X_end] @ Y[Y_start:Y_end].T`
247c171
<   /* "sklearn/metrics/_pairwise_distances_reduction/_middle_term_computer.pyx":257
---
>   /* "sklearn/metrics/_pairwise_distances_reduction/_middle_term_computer.pyx":255
266,270d189
<   __pyx_L1_error:;
<   __PYX_XDEC_MEMVIEW(&__pyx_t_1, 0);
<   __PYX_XDEC_MEMVIEW(&__pyx_t_3, 0);
<   __Pyx_WriteUnraisable("sklearn.metrics._pairwise_distances_reduction._middle_term_computer.DenseDenseMiddleTermComputer64._compute_dist_middle_terms", __pyx_clineno, __pyx_lineno, __pyx_filename, 1, 1);
<   __pyx_r = 0;
272,273d190
<   __PYX_XDEC_MEMVIEW(&__pyx_v_X_c, 0);
<   __PYX_XDEC_MEMVIEW(&__pyx_v_Y_c, 0);

Still, I am rerunning benchmarks using 8 threads.
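
To make the diff above easier to follow, here is a minimal pure-Python sketch of the two ways of preparing the GEMM arguments. It is an illustration under stated assumptions, not the scikit-learn implementation: `gemm_nt`, `middle_terms_sliced` and `middle_terms_offset` are hypothetical names, the arrays are flat row-major lists, and the naive triple loop only stands in for the BLAS `_gemm` call with `RowMajor`, `NoTrans`, `Trans`.

```python
# Hypothetical stand-in for a row-major GEMM restricted to the case used here:
# C = alpha * A @ B.T, where A and B are (flat buffer, start offset) pairs.
def gemm_nt(m, n, k, alpha, a, a_off, lda, b, b_off, ldb, c, ldc):
    for i in range(m):
        for j in range(n):
            acc = 0.0
            for p in range(k):
                acc += a[a_off + i * lda + p] * b[b_off + j * ldb + p]
            c[i * ldc + j] = alpha * acc


def middle_terms_sliced(X, Y, n_features, X_start, X_end, Y_start, Y_end):
    # "Before" style: build chunk objects first, then read sizes from them.
    # Assumption: list slicing copies, whereas a Cython memoryview slice only
    # builds a new slice struct; the point is the extra intermediate object.
    X_c = X[X_start * n_features:X_end * n_features]
    Y_c = Y[Y_start * n_features:Y_end * n_features]
    m = len(X_c) // n_features
    n = len(Y_c) // n_features
    out = [0.0] * (m * n)
    gemm_nt(m, n, n_features, -2.0, X_c, 0, n_features,
            Y_c, 0, n_features, out, n)
    return out


def middle_terms_offset(X, Y, n_features, X_start, X_end, Y_start, Y_end):
    # "After" style: sizes come from the bounds, and the chunk is addressed
    # by an offset into the original buffer (the analogue of &self.X[X_start, 0]).
    m = X_end - X_start
    n = Y_end - Y_start
    out = [0.0] * (m * n)
    gemm_nt(m, n, n_features, -2.0, X, X_start * n_features, n_features,
            Y, Y_start * n_features, n_features, out, n)
    return out


# Illustrative data: X has 4 rows, Y has 3 rows, 2 features each.
X = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
Y = [1.0, 0.0, 0.0, 1.0, 1.0, 1.0]
sliced = middle_terms_sliced(X, Y, 2, 1, 3, 1, 3)
offset = middle_terms_offset(X, Y, 2, 1, 3, 1, 3)
print(sliced == offset)
```

Both functions pass the same `m`, `n`, `K`, `lda`, `ldb` and `ldc` to the product, so the result is unchanged. The offset variant simply avoids creating the intermediate chunk objects, which in the generated C++ corresponds to dropping the `__pyx_memoryview_slice_memviewslice` calls and the associated error-handling path.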

@jjerphan
Member Author

Performance has not changed significantly.

Details
asv continuous -b PairwiseDistancesR -e upstream/main maint/pdr-do-not-slice
· Creating environments
· Discovering benchmarks
·· Uninstalling from conda-py3.9-cython-joblib-numpy-pandas-scipy-threadpoolctl
·· Building 7705579f <maint/pdr-do-not-slice> for conda-py3.9-cython-joblib-numpy-pandas-scipy-threadpoolctl
·· Installing 7705579f <maint/pdr-do-not-slice> into conda-py3.9-cython-joblib-numpy-pandas-scipy-threadpoolctl
· Running 2 total benchmarks (2 commits * 1 environments * 1 benchmarks)
[  0.00%] · For scikit-learn commit 7705579f <maint/pdr-do-not-slice> (round 1/1):
[  0.00%] ·· Benchmarking conda-py3.9-cython-joblib-numpy-pandas-scipy-threadpoolctl
[ 50.00%] ··· ...iseDistancesReductionsBenchmark.time_ArgKmin       12/54 failed
[ 50.00%] ··· ========== ======== ============ =========== =============== =============== ========= ======== =============
               n_train    n_test   n_features     metric       strategy         dtype       X_train   X_test               
              ---------- -------- ------------ ----------- --------------- --------------- --------- -------- -------------
                 1000      1000       100       euclidean        auto       numpy.float32    dense    dense    9.75±0.04ms 
                 1000      1000       100       euclidean        auto       numpy.float64    dense    dense    6.93±0.03ms 
                 1000      1000       100       euclidean   parallel_on_X   numpy.float32    dense    dense    7.44±0.07ms 
                 1000      1000       100       euclidean   parallel_on_X   numpy.float64    dense    dense    5.38±0.02ms 
                 1000      1000       100       euclidean   parallel_on_Y   numpy.float32    dense    dense    9.76±0.04ms 
                 1000      1000       100       euclidean   parallel_on_Y   numpy.float64    dense    dense    6.92±0.02ms 
                 1000     10000       100       euclidean        auto       numpy.float32    dense    dense     43.1±0.1ms 
                 1000     10000       100       euclidean        auto       numpy.float64    dense    dense    42.6±0.08ms 
                 1000     10000       100       euclidean   parallel_on_X   numpy.float32    dense    dense     43.2±0.1ms 
                 1000     10000       100       euclidean   parallel_on_X   numpy.float64    dense    dense    42.7±0.07ms 
                 1000     10000       100       euclidean   parallel_on_Y   numpy.float32    dense    dense     64.2±0.1ms 
                 1000     10000       100       euclidean   parallel_on_Y   numpy.float64    dense    dense    62.0±0.07ms 
                 1000     100000      100       euclidean        auto       numpy.float32    dense    dense     416±0.4ms  
                 1000     100000      100       euclidean        auto       numpy.float64    dense    dense     414±0.6ms  
                 1000     100000      100       euclidean   parallel_on_X   numpy.float32    dense    dense     416±0.4ms  
                 1000     100000      100       euclidean   parallel_on_X   numpy.float64    dense    dense     414±0.7ms  
                 1000     100000      100       euclidean   parallel_on_Y   numpy.float32    dense    dense      640±1ms   
                 1000     100000      100       euclidean   parallel_on_Y   numpy.float64    dense    dense      616±1ms   
                10000      1000       100       euclidean        auto       numpy.float32    dense    dense    42.8±0.07ms 
                10000      1000       100       euclidean        auto       numpy.float64    dense    dense     42.4±0.1ms 
                10000      1000       100       euclidean   parallel_on_X   numpy.float32    dense    dense    44.5±0.08ms 
                10000      1000       100       euclidean   parallel_on_X   numpy.float64    dense    dense    42.9±0.05ms 
                10000      1000       100       euclidean   parallel_on_Y   numpy.float32    dense    dense    42.8±0.06ms 
                10000      1000       100       euclidean   parallel_on_Y   numpy.float64    dense    dense     42.4±0.2ms 
                10000     10000       100       euclidean        auto       numpy.float32    dense    dense     383±0.4ms  
                10000     10000       100       euclidean        auto       numpy.float64    dense    dense     384±0.4ms  
                10000     10000       100       euclidean   parallel_on_X   numpy.float32    dense    dense     383±0.3ms  
                10000     10000       100       euclidean   parallel_on_X   numpy.float64    dense    dense     384±0.5ms  
                10000     10000       100       euclidean   parallel_on_Y   numpy.float32    dense    dense     415±0.3ms  
                10000     10000       100       euclidean   parallel_on_Y   numpy.float64    dense    dense     408±0.5ms  
                10000     100000      100       euclidean        auto       numpy.float32    dense    dense      3.74±0s   
                10000     100000      100       euclidean        auto       numpy.float64    dense    dense      3.75±0s   
                10000     100000      100       euclidean   parallel_on_X   numpy.float32    dense    dense      3.74±0s   
                10000     100000      100       euclidean   parallel_on_X   numpy.float64    dense    dense      3.77±0s   
                10000     100000      100       euclidean   parallel_on_Y   numpy.float32    dense    dense      4.13±0s   
                10000     100000      100       euclidean   parallel_on_Y   numpy.float64    dense    dense      4.07±0s   
               10000000    1000       100       euclidean        auto       numpy.float32    dense    dense     37.2±0.06s 
               10000000    1000       100       euclidean        auto       numpy.float64    dense    dense     37.6±0.1s  
               10000000    1000       100       euclidean   parallel_on_X   numpy.float32    dense    dense     42.5±0.02s 
               10000000    1000       100       euclidean   parallel_on_X   numpy.float64    dense    dense     41.1±0.02s 
               10000000    1000       100       euclidean   parallel_on_Y   numpy.float32    dense    dense     37.1±0.04s 
               10000000    1000       100       euclidean   parallel_on_Y   numpy.float64    dense    dense     37.4±0.01s 
               10000000   10000       100       euclidean        auto       numpy.float32    dense    dense       failed   
               10000000   10000       100       euclidean        auto       numpy.float64    dense    dense       failed   
               10000000   10000       100       euclidean   parallel_on_X   numpy.float32    dense    dense       failed   
               10000000   10000       100       euclidean   parallel_on_X   numpy.float64    dense    dense       failed   
               10000000   10000       100       euclidean   parallel_on_Y   numpy.float32    dense    dense       failed   
               10000000   10000       100       euclidean   parallel_on_Y   numpy.float64    dense    dense       failed   
               10000000   100000      100       euclidean        auto       numpy.float32    dense    dense       failed   
               10000000   100000      100       euclidean        auto       numpy.float64    dense    dense       failed   
               10000000   100000      100       euclidean   parallel_on_X   numpy.float32    dense    dense       failed   
               10000000   100000      100       euclidean   parallel_on_X   numpy.float64    dense    dense       failed   
               10000000   100000      100       euclidean   parallel_on_Y   numpy.float32    dense    dense       failed   
               10000000   100000      100       euclidean   parallel_on_Y   numpy.float64    dense    dense       failed   
              ========== ======== ============ =========== =============== =============== ========= ======== =============

[ 50.00%] ···· For parameters: 10000000, 10000, 100, 'euclidean', 'auto', <class 'numpy.float32'>, 'dense', 'dense'
               
               
               asv: benchmark timed out (timeout 500s)
               
               For parameters: 10000000, 10000, 100, 'euclidean', 'auto', <class 'numpy.float64'>, 'dense', 'dense'
               
               
               asv: benchmark timed out (timeout 500s)
               
               For parameters: 10000000, 10000, 100, 'euclidean', 'parallel_on_X', <class 'numpy.float32'>, 'dense', 'dense'
               
               
               asv: benchmark timed out (timeout 500s)
               
               For parameters: 10000000, 10000, 100, 'euclidean', 'parallel_on_X', <class 'numpy.float64'>, 'dense', 'dense'
               
               
               asv: benchmark timed out (timeout 500s)
               
               For parameters: 10000000, 10000, 100, 'euclidean', 'parallel_on_Y', <class 'numpy.float32'>, 'dense', 'dense'
               
               
               asv: benchmark timed out (timeout 500s)
               
               For parameters: 10000000, 10000, 100, 'euclidean', 'parallel_on_Y', <class 'numpy.float64'>, 'dense', 'dense'
               
               
               asv: benchmark timed out (timeout 500s)
               
               For parameters: 10000000, 100000, 100, 'euclidean', 'auto', <class 'numpy.float32'>, 'dense', 'dense'
               
               
               asv: benchmark timed out (timeout 500s)
               
               For parameters: 10000000, 100000, 100, 'euclidean', 'auto', <class 'numpy.float64'>, 'dense', 'dense'
               
               
               asv: benchmark timed out (timeout 500s)
               
               For parameters: 10000000, 100000, 100, 'euclidean', 'parallel_on_X', <class 'numpy.float32'>, 'dense', 'dense'
               
               
               asv: benchmark timed out (timeout 500s)
               
               For parameters: 10000000, 100000, 100, 'euclidean', 'parallel_on_X', <class 'numpy.float64'>, 'dense', 'dense'
               
               
               asv: benchmark timed out (timeout 500s)
               
               For parameters: 10000000, 100000, 100, 'euclidean', 'parallel_on_Y', <class 'numpy.float32'>, 'dense', 'dense'
               
               
               asv: benchmark timed out (timeout 500s)
               
               For parameters: 10000000, 100000, 100, 'euclidean', 'parallel_on_Y', <class 'numpy.float64'>, 'dense', 'dense'
               
               
               asv: benchmark timed out (timeout 500s)

[ 50.00%] · For scikit-learn commit 68a74272 <main> (round 1/1):
[ 50.00%] ·· Building for conda-py3.9-cython-joblib-numpy-pandas-scipy-threadpoolctl
[ 50.00%] ·· Benchmarking conda-py3.9-cython-joblib-numpy-pandas-scipy-threadpoolctl
[100.00%] ··· ...iseDistancesReductionsBenchmark.time_ArgKmin       12/54 failed
[100.00%] ··· ========== ======== ============ =========== =============== =============== ========= ======== =============
               n_train    n_test   n_features     metric       strategy         dtype       X_train   X_test               
              ---------- -------- ------------ ----------- --------------- --------------- --------- -------- -------------
                 1000      1000       100       euclidean        auto       numpy.float32    dense    dense    9.75±0.06ms 
                 1000      1000       100       euclidean        auto       numpy.float64    dense    dense    6.92±0.07ms 
                 1000      1000       100       euclidean   parallel_on_X   numpy.float32    dense    dense    7.39±0.07ms 
                 1000      1000       100       euclidean   parallel_on_X   numpy.float64    dense    dense    5.39±0.02ms 
                 1000      1000       100       euclidean   parallel_on_Y   numpy.float32    dense    dense    9.84±0.05ms 
                 1000      1000       100       euclidean   parallel_on_Y   numpy.float64    dense    dense     7.05±0.2ms 
                 1000     10000       100       euclidean        auto       numpy.float32    dense    dense     43.1±0.2ms 
                 1000     10000       100       euclidean        auto       numpy.float64    dense    dense     42.6±0.1ms 
                 1000     10000       100       euclidean   parallel_on_X   numpy.float32    dense    dense     43.2±0.1ms 
                 1000     10000       100       euclidean   parallel_on_X   numpy.float64    dense    dense     42.6±0.1ms 
                 1000     10000       100       euclidean   parallel_on_Y   numpy.float32    dense    dense     64.0±0.3ms 
                 1000     10000       100       euclidean   parallel_on_Y   numpy.float64    dense    dense     61.6±0.3ms 
                 1000     100000      100       euclidean        auto       numpy.float32    dense    dense     417±0.8ms  
                 1000     100000      100       euclidean        auto       numpy.float64    dense    dense     415±0.6ms  
                 1000     100000      100       euclidean   parallel_on_X   numpy.float32    dense    dense     417±0.7ms  
                 1000     100000      100       euclidean   parallel_on_X   numpy.float64    dense    dense     414±0.4ms  
                 1000     100000      100       euclidean   parallel_on_Y   numpy.float32    dense    dense      635±1ms   
                 1000     100000      100       euclidean   parallel_on_Y   numpy.float64    dense    dense      609±2ms   
                10000      1000       100       euclidean        auto       numpy.float32    dense    dense     42.8±0.2ms 
                10000      1000       100       euclidean        auto       numpy.float64    dense    dense     42.4±0.2ms 
                10000      1000       100       euclidean   parallel_on_X   numpy.float32    dense    dense     44.4±0.1ms 
                10000      1000       100       euclidean   parallel_on_X   numpy.float64    dense    dense     42.9±0.1ms 
                10000      1000       100       euclidean   parallel_on_Y   numpy.float32    dense    dense     42.7±0.2ms 
                10000      1000       100       euclidean   parallel_on_Y   numpy.float64    dense    dense     42.5±0.2ms 
                10000     10000       100       euclidean        auto       numpy.float32    dense    dense      387±2ms   
                10000     10000       100       euclidean        auto       numpy.float64    dense    dense     385±0.4ms  
                10000     10000       100       euclidean   parallel_on_X   numpy.float32    dense    dense      388±2ms   
                10000     10000       100       euclidean   parallel_on_X   numpy.float64    dense    dense     385±0.5ms  
                10000     10000       100       euclidean   parallel_on_Y   numpy.float32    dense    dense     416±0.5ms  
                10000     10000       100       euclidean   parallel_on_Y   numpy.float64    dense    dense      410±1ms   
                10000     100000      100       euclidean        auto       numpy.float32    dense    dense      3.75±0s   
                10000     100000      100       euclidean        auto       numpy.float64    dense    dense     3.76±0.01s 
                10000     100000      100       euclidean   parallel_on_X   numpy.float32    dense    dense      3.75±0s   
                10000     100000      100       euclidean   parallel_on_X   numpy.float64    dense    dense      3.76±0s   
                10000     100000      100       euclidean   parallel_on_Y   numpy.float32    dense    dense      4.14±0s   
                10000     100000      100       euclidean   parallel_on_Y   numpy.float64    dense    dense      4.08±0s   
               10000000    1000       100       euclidean        auto       numpy.float32    dense    dense     37.3±0.1s  
               10000000    1000       100       euclidean        auto       numpy.float64    dense    dense     37.5±0.08s 
               10000000    1000       100       euclidean   parallel_on_X   numpy.float32    dense    dense     42.8±0.09s 
               10000000    1000       100       euclidean   parallel_on_X   numpy.float64    dense    dense     41.0±0.02s 
               10000000    1000       100       euclidean   parallel_on_Y   numpy.float32    dense    dense     37.2±0.02s 
               10000000    1000       100       euclidean   parallel_on_Y   numpy.float64    dense    dense     37.5±0.05s 
               10000000   10000       100       euclidean        auto       numpy.float32    dense    dense       failed   
               10000000   10000       100       euclidean        auto       numpy.float64    dense    dense       failed   
               10000000   10000       100       euclidean   parallel_on_X   numpy.float32    dense    dense       failed   
               10000000   10000       100       euclidean   parallel_on_X   numpy.float64    dense    dense       failed   
               10000000   10000       100       euclidean   parallel_on_Y   numpy.float32    dense    dense       failed   
               10000000   10000       100       euclidean   parallel_on_Y   numpy.float64    dense    dense       failed   
               10000000   100000      100       euclidean        auto       numpy.float32    dense    dense       failed   
               10000000   100000      100       euclidean        auto       numpy.float64    dense    dense       failed   
               10000000   100000      100       euclidean   parallel_on_X   numpy.float32    dense    dense       failed   
               10000000   100000      100       euclidean   parallel_on_X   numpy.float64    dense    dense       failed   
               10000000   100000      100       euclidean   parallel_on_Y   numpy.float32    dense    dense       failed   
               10000000   100000      100       euclidean   parallel_on_Y   numpy.float64    dense    dense       failed   
              ========== ======== ============ =========== =============== =============== ========= ======== =============

[100.00%] ···· For parameters: 10000000, 10000, 100, 'euclidean', 'auto', <class 'numpy.float32'>, 'dense', 'dense'
               
               
               asv: benchmark timed out (timeout 500s)
               
               For parameters: 10000000, 10000, 100, 'euclidean', 'auto', <class 'numpy.float64'>, 'dense', 'dense'
               
               
               asv: benchmark timed out (timeout 500s)
               
               For parameters: 10000000, 10000, 100, 'euclidean', 'parallel_on_X', <class 'numpy.float32'>, 'dense', 'dense'
               
               
               asv: benchmark timed out (timeout 500s)
               
               For parameters: 10000000, 10000, 100, 'euclidean', 'parallel_on_X', <class 'numpy.float64'>, 'dense', 'dense'
               
               
               asv: benchmark timed out (timeout 500s)
               
               For parameters: 10000000, 10000, 100, 'euclidean', 'parallel_on_Y', <class 'numpy.float32'>, 'dense', 'dense'
               
               
               asv: benchmark timed out (timeout 500s)
               
               For parameters: 10000000, 10000, 100, 'euclidean', 'parallel_on_Y', <class 'numpy.float64'>, 'dense', 'dense'
               
               
               asv: benchmark timed out (timeout 500s)
               
               For parameters: 10000000, 100000, 100, 'euclidean', 'auto', <class 'numpy.float32'>, 'dense', 'dense'
               
               
               asv: benchmark timed out (timeout 500s)
               
               For parameters: 10000000, 100000, 100, 'euclidean', 'auto', <class 'numpy.float64'>, 'dense', 'dense'
               
               
               asv: benchmark timed out (timeout 500s)
               
               For parameters: 10000000, 100000, 100, 'euclidean', 'parallel_on_X', <class 'numpy.float32'>, 'dense', 'dense'
               
               
               asv: benchmark timed out (timeout 500s)
               
               For parameters: 10000000, 100000, 100, 'euclidean', 'parallel_on_X', <class 'numpy.float64'>, 'dense', 'dense'
               
               
               asv: benchmark timed out (timeout 500s)
               
               For parameters: 10000000, 100000, 100, 'euclidean', 'parallel_on_Y', <class 'numpy.float32'>, 'dense', 'dense'
               
               
               asv: benchmark timed out (timeout 500s)
               
               For parameters: 10000000, 100000, 100, 'euclidean', 'parallel_on_Y', <class 'numpy.float64'>, 'dense', 'dense'
               
               
               asv: benchmark timed out (timeout 500s)


BENCHMARKS NOT SIGNIFICANTLY CHANGED.
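
For readers who want to sanity-check that summary by hand, here is a minimal sketch that compares a few rows from the two tables above. The timings are copied from the log; everything else is an assumption made for illustration: that the first table was measured on the PR branch (only the second table is explicitly labelled `<main>`), the 10% `FACTOR` threshold, and the helper name `changed_rows`. It is not asv's actual decision rule, which may also take the measurement spread into account.

```python
# Timings in seconds, keyed by (n_train, n_test, strategy, dtype).
# Values are copied from the two tables in the log above.
# ASSUMPTION: the first table is the PR branch; the second is labelled <main>.
branch_timings = {
    (10000, 1000, "auto", "float64"): 42.4e-3,
    (10000, 10000, "auto", "float32"): 383e-3,
    (10000, 10000, "parallel_on_Y", "float32"): 415e-3,
    (10000000, 1000, "auto", "float32"): 37.2,
}
main_timings = {
    (10000, 1000, "auto", "float64"): 42.4e-3,
    (10000, 10000, "auto", "float32"): 387e-3,
    (10000, 10000, "parallel_on_Y", "float32"): 416e-3,
    (10000000, 1000, "auto", "float32"): 37.3,
}

FACTOR = 1.1  # hypothetical threshold: flag changes larger than 10%


def changed_rows(before, after, factor=FACTOR):
    """Return {key: after/before ratio} for rows outside [1/factor, factor]."""
    flagged = {}
    for key, t_before in before.items():
        ratio = after[key] / t_before
        if ratio > factor or ratio < 1 / factor:
            flagged[key] = ratio
    return flagged


flagged = changed_rows(main_timings, branch_timings)
print(flagged or "no row changed by more than the threshold")
```

On these four rows every ratio stays within about 1% of 1.0, which is consistent with the summary line printed by asv.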

@ogrisel
Member

ogrisel commented Nov 22, 2022

Since the performance impact of slicing vs pointer arithmetic is negligible in this case (thanks for checking), I would go for the solution that is the most readable. One can argue that the new version is slightly better because it removes the X_c indirection, which makes the comment on line 400 easier to interpret. So let's merge this once CI is green.
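
To make the "slicing vs pointer arithmetic" distinction concrete, here is a small NumPy analogy. It is not the Cython code touched by this PR: the array `X`, the bounds `start`/`end`, and all other names are made up for illustration, and it assumes a C-contiguous 2D array. It only shows that a row slice and a manually computed offset into the same buffer designate the same memory.

```python
import numpy as np

# Hypothetical data: a small C-contiguous 2D array standing in for a chunked dataset.
n_samples, n_features = 6, 4
X = np.arange(n_samples * n_features, dtype=np.float64).reshape(n_samples, n_features)
start, end = 2, 5

# Option 1: slicing. This creates a new view object describing rows [start, end).
chunk_sliced = X[start:end]

# Option 2: offset arithmetic. For C-contiguous data, row `start` begins
# `start * n_features` elements after the beginning of the buffer.
offset = start * n_features
flat = X.reshape(-1)  # a view (no copy) because X is C-contiguous
chunk_offset = flat[offset : offset + (end - start) * n_features].reshape(
    end - start, n_features
)

# Both options address the same bytes and hold the same values.
same_values = np.array_equal(chunk_sliced, chunk_offset)
same_memory = np.shares_memory(chunk_sliced, chunk_offset)
same_address = chunk_sliced.ctypes.data == X.ctypes.data + offset * X.itemsize

print(same_values, same_memory, same_address)
```

Since both forms designate the same data, choosing between them comes down to overhead and readability, which is the trade-off discussed in the comment above.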

@jjerphan jjerphan merged commit ad14f91 into scikit-learn:main Nov 22, 2022
@jjerphan jjerphan deleted the maint/pdr-do-not-slice branch November 22, 2022 12:42