Incorrect clustering results when using DBSCAN with the brute algorithm and large values
Description
DBSCAN returns incorrect clustering results when using the brute algorithm on data with large values. In the code snippet below, the correct clustering results are obtained when the data type is changed to float64 or the algorithm is changed to ball_tree.
For context, the large values are geographic coordinates.
I suspect this is caused by the brute algorithm's use of squared Euclidean distance: squaring values this large takes them outside the range where float32 can provide adequate precision:
scikit-learn/sklearn/neighbors/base.py, line 699 (commit 7b136e9)
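A minimal NumPy sketch of the suspected precision loss, assuming the brute path computes squared distances with the expansion ||x||^2 - 2 x.y + ||y||^2 in the input dtype (the approach the `sklearn.metrics.pairwise.euclidean_distances` docstring describes, though whether line 699 does exactly this is an assumption). The coordinates below are hypothetical values in metres, not the data from this issue, and the helper names are illustrative only.

```python
import numpy as np

def sq_dist_expanded(x, y):
    # Hypothetical helper: ||x - y||^2 = ||x||^2 - 2 x.y + ||y||^2,
    # evaluated in the arrays' own dtype. Large terms cancel at the end.
    return np.dot(x, x) - 2 * np.dot(x, y) + np.dot(y, y)

def sq_dist_direct(x, y):
    # Hypothetical helper: subtract first, then square, so no large
    # intermediate values are formed.
    d = x - y
    return np.dot(d, d)

# Two hypothetical points exactly 1 m apart, with large coordinate values.
x64 = np.array([500000.0, 4000000.0])
y64 = np.array([500001.0, 4000000.0])
x32, y32 = x64.astype(np.float32), y64.astype(np.float32)

print("float64 expanded:", sq_dist_expanded(x64, y64))  # exactly 1.0
print("float32 direct:  ", sq_dist_direct(x32, y32))    # exactly 1.0
# ||x||^2 is about 1.6e13, where float32 values are spaced 2**20 apart,
# so this result is a multiple of 2**20 (possibly 0), never 1.0.
print("float32 expanded:", sq_dist_expanded(x32, y32))
```

With an eps on the order of a metre, an error of this size can merge or split clusters, which is consistent with float64 or ball_tree giving the correct result.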
Perhaps it would be appropriate to warn when epsilon (`eps`) is so small relative to the magnitude of the values being clustered that precision issues will occur.
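One possible shape for such a warning, sketched as a hypothetical helper (`check_eps_precision` and its `safety` margin are not part of scikit-learn). It assumes the rounding error of the expanded squared distance scales roughly like the dtype's machine epsilon times the largest squared norm in the data, and compares that estimate against `eps**2`.

```python
import warnings
import numpy as np

def check_eps_precision(X, eps, safety=100.0):
    """Hypothetical helper: warn and return True if eps looks too small for X's dtype.

    Heuristic only: estimates the absolute error of ||x||^2 - 2 x.y + ||y||^2
    as safety * machine_epsilon * max(||x||^2).
    """
    X = np.asarray(X)
    if not np.issubdtype(X.dtype, np.floating):
        X = X.astype(np.float64)
    machine_eps = float(np.finfo(X.dtype).eps)  # about 1.2e-7 for float32
    # Largest squared row norm, i.e. the size of the terms that cancel.
    max_sq_norm = float(np.max(np.einsum("ij,ij->i", X, X)))
    est_error = safety * machine_eps * max_sq_norm
    if eps ** 2 <= est_error:
        warnings.warn(
            f"eps={eps} is small relative to the data magnitude for dtype "
            f"{X.dtype}; squared-distance rounding error may reach about "
            f"{est_error:.3g}. Consider using float64.",
            RuntimeWarning,
        )
        return True
    return False

# Hypothetical large coordinates: float32 triggers the warning, float64 does not.
X = np.array([[500000.0, 4000000.0], [500001.0, 4000000.0]])
check_eps_precision(X.astype(np.float32), eps=1.0)
check_eps_precision(X, eps=1.0)
```

The `safety` factor is an arbitrary margin chosen for illustration; a real check would need tuning against actual failure cases.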
Steps/Code to Reproduce
Results