ENH Avoid memoryviews' slicing for KMeans Cython implementations (#24565)
Comments
Labels: Cython, Clustering
Thank you for opening this issue and proposing a PR, @adam2392.
Yes — cross-referencing #17299.
I think 1-D slicing also comes with some overhead. It is present in k-means' internals, for instance in scikit-learn's `sklearn/cluster/_k_means_common.pyx`, lines 155 to 159 at 798aeeb, but it might not be as costly. I think we can proceed with separate benchmarks for removing 2-D slicing and 1-D slicing, and see whether those removals are useful.
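For concreteness, the call-site pattern under discussion looks roughly like the sketch below. This is a hedged illustration, not a verbatim copy of the scikit-learn source; the variable names follow the CSR convention (`X_data`, `X_indices`, `X_indptr`) used elsewhere in `_k_means_common.pyx` and are assumptions here:

```cython
# Sketch only: every 1-D slice of a typed memoryview materialises a new
# memoryview struct in the generated C code, which can add overhead
# inside tight loops like this one.
for i in range(n_samples):
    for j in range(n_clusters):
        dist = _euclidean_sparse_dense(
            X_data[X_indptr[i]: X_indptr[i + 1]],     # 1-D slice per call
            X_indices[X_indptr[i]: X_indptr[i + 1]],  # 1-D slice per call
            centers[j],                               # 2-D -> 1-D slice per call
            centers_squared_norms[j],
            True)
```

Benchmarking the loop with and without these slices is what would tell us whether the slice-construction cost is measurable relative to the distance computation itself.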
Closing this as there is no performance gain, as seen in the #24566 experiments.
Summary
Addresses issues raised in #17299
The proposal is to modify the lines here: https://github.com/scikit-learn/scikit-learn/blob/main/sklearn/cluster/_k_means_common.pyx#L155-L159. There are currently three places where the `_euclidean_sparse_dense` Cython function is used and can be optimized.

The issue with the current implementation is that `centers` is a 2-D memoryview, so passing in `centers[j]` creates a temporary 1-D memoryview at each call. I think going from one 1-D memoryview to another 1-D memoryview is okay(?). If not, then we also need to modify the other arguments.

Proposal
Change the signature of the Cython function `_euclidean_sparse_dense`, and adjust the unit tests and corresponding Cython code. I will put up a draft PR to demonstrate what is needed there.
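The proposed signature itself did not survive in this thread; the sketch below is a hedged reconstruction of the idea only: pass the full 2-D `centers` buffer plus a row index instead of a 1-D slice. The parameter names (`b`, `b_row`) and the exact argument order are assumptions, not the actual PR code:

```cython
from cython cimport floating
from libc.math cimport sqrt

# Hypothetical sketch of the proposed change (not the actual PR).
# Before: the caller slices, creating a temporary 1-D memoryview:
#     _euclidean_sparse_dense(..., centers[j], ...)
# After: pass the full 2-D buffer and a row index, and offset inside:
cdef floating _euclidean_sparse_dense(
        const floating[::1] a_data,
        const int[::1] a_indices,
        const floating[:, ::1] b,   # full 2-D centers array, no slicing
        int b_row,                  # row index, replacing b = centers[j]
        floating b_squared_norm,
        bint squared) nogil:
    """Euclidean distance between CSR row a and row b_row of dense b."""
    cdef:
        int nnz = a_indices.shape[0]
        int i
        floating tmp, bi
        floating result = 0.0

    for i in range(nnz):
        bi = b[b_row, a_indices[i]]
        tmp = a_data[i] - bi
        result += tmp * tmp - bi * bi

    result += b_squared_norm

    if squared:
        return max(result, 0)
    return sqrt(max(result, 0))
```

The design question is whether indexing `b[b_row, ...]` inside the function is cheaper than constructing a 1-D memoryview per call at the call site; that is exactly what the benchmarks discussed above would settle.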
Misc.
cc: @jjerphan who brought this up as a possible Cython improvement for me to help out with.