Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

API Allow users to pass instances of DistanceMetric directly to metric keyword arguments #26329

Open
1 of 5 tasks
Micky774 opened this issue May 5, 2023 · 0 comments
Open
1 of 5 tasks
Labels

Comments

@Micky774
Copy link
Contributor

Micky774 commented May 5, 2023

Motivation

SIMD intrinsics can accelerate pairwise distance computation by a factors of ~2.5-3.5x for float64 data, and ~5-6x for float32 data (benchmarked by this gist: https://gist.github.com/Micky774/bd1b8394fdaa82b25dcdfc111835c19b).

Plots

01ef290e-2055-47c8-ae0f-3fa009c478ef
373fb964-81cf-4550-9875-39e0d5de28bd

These benefits translate effectively into computation-bound estimators, such as KNeighborsRegressor (based on #26267):

Plots

1d7664fd-870d-4a7a-8f1a-7f25a0029d9c

f9aa8ab6-2dd3-4b47-9e20-0a6ea4e17cc3

Alternatives Considered

As discussed in #26010 and Micky774#11, while there is a significant preference towards avoiding implementing SIMD-based solutions within scikit-learn at this time. I do believe that there is a reasonable way to maintain such work (at least up to SSE3 instructions), however a better-accepted solution is to create a plug-in for DistanceMetric and offer the SIMD-accelerated implementations as an engine. While this is indeed a good solution in the long run, there is still much work needed to be done on the plug-in API (#22438). Working on a separate engine/plug-in for DistanceMetric while the API is still being solidified and #25535 is still unmerged is probably going to do more harm than good by adding one more moving part to the mix and slowing down the review process.

Suggested Solution

Allow users to pass instances of DistanceMetric directly to metric keyword arguments. This is backwards compatible and doesn't require any significant new infrastructure (mainly small changes to validation and updated docs/tests). This enables third-party libraries to provide their own accelerated solutions immediately.

In practice, this involves changes mainly in the following:

  • ArgKmin
  • RadiusNeighbors
  • ArgKminClassMode
  • pairwise_distances
  • pairwise_distances_argmin

This will allow us to enable the functionality in parts of the following estimators (non-exhaustive):

- NearestNeighbors
- KNeighborsRegressor
- KNeighborsClassifier
- RadiusNeighborsRegressor
- RadiusNeighborsClassifier
- DBSCAN
- OPTICS
- Isomap
- TSNE (self.method != "exact")
- KernelDensity
- AffinityPropagation
- Birch
- MeanShift
- NearestCentroid

Notes:

  1. pairwise_distanecs can't actually use the DistanceMetric in its current state, however once FEA Introduce PairwiseDistances, a generic back-end for pairwise_distances #25561 is completed, it can benefit from accelerated DistanceMetric options as well.
  2. Currently {KD, Ball}Tree support passing DistanceMetric through the metric argument, however do not support DistanceMetric32 (see: ENH Add float32 implementations for BallTree and KDTree  #25914)

Implementation

I have a sample implementation of this for KNeighborsRegressor, which is achieved by enabling this functionality for ArgKmin along with updating parameter validation in NeighborsBase; please see #26267.

@Micky774 Micky774 added the API label May 5, 2023
@Micky774 Micky774 changed the title API Allow users to pass instances of DistanceMetric{32} directly to metric keyword arguments bbgAPI Allow users to pass instances of DistanceMetric directly to metric keyword arguments Jul 24, 2023
@Micky774 Micky774 changed the title bbgAPI Allow users to pass instances of DistanceMetric directly to metric keyword arguments API Allow users to pass instances of DistanceMetric directly to metric keyword arguments Jul 24, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

1 participant