API Allow users to pass instances of `DistanceMetric` directly to `metric` keyword arguments
#26329
Open
1 of 5 tasks
Motivation
SIMD intrinsics can accelerate pairwise distance computation by a factor of ~2.5-3.5x for `float64` data and ~5-6x for `float32` data (benchmarked in this gist: https://gist.github.com/Micky774/bd1b8394fdaa82b25dcdfc111835c19b).
Plots
These benefits translate effectively into computation-bound estimators, such as `KNeighborsRegressor` (based on #26267):
Plots
Alternatives Considered
As discussed in #26010 and Micky774#11, there is a significant preference towards avoiding SIMD-based solutions within scikit-learn at this time. While I do believe that there is a reasonable way to maintain such work (at least up to `SSE3` instructions), a better-accepted solution is to create a plug-in for `DistanceMetric` and offer the SIMD-accelerated implementations as an engine. This is indeed a good solution in the long run, but much work remains to be done on the plug-in API (#22438). Working on a separate engine/plug-in for `DistanceMetric` while the API is still being solidified and #25535 is still unmerged would probably do more harm than good, adding one more moving part to the mix and slowing down the review process.
Suggested Solution
Allow users to pass instances of `DistanceMetric` directly to `metric` keyword arguments. This is backwards compatible and doesn't require any significant new infrastructure (mainly small changes to validation and updated docs/tests). This enables third-party libraries to provide their own accelerated solutions immediately.
In practice, this involves changes mainly in the following:
ArgKmin
RadiusNeighbors
ArgKminClassMode
pairwise_distances
pairwise_distances_argmin
This will allow us to enable the functionality in parts of the following estimators (non-exhaustive):
Notes:
`pairwise_distances` can't actually use the `DistanceMetric` in its current state; however, once "FEA Introduce `PairwiseDistances`, a generic back-end for `pairwise_distances`" (#25561) is completed, it can benefit from accelerated `DistanceMetric` options as well.
`{KD, Ball}Tree` support passing `DistanceMetric` through the `metric` argument, but do not support `DistanceMetric32` (see: "ENH Add `float32` implementations for `BallTree` and `KDTree`", #25914).
Implementation
I have a sample implementation of this for `KNeighborsRegressor`, which is achieved by enabling this functionality for `ArgKmin` along with updating parameter validation in `NeighborsBase`; please see #26267.