Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] NearestNeighbors search with euclidean metric and two_pass_precision does not compute square root distances. #5788

Open
rtubelleza opened this issue Feb 28, 2024 · 1 comment
Labels
? - Needs Triage Need team to review and classify bug Something isn't working

Comments

@rtubelleza
Copy link

Describe the bug
The euclidean distances returned by NearestNeighbors do not seem to be square rooted when two_pass_precision=True

Steps/Code to reproduce bug

# Need to reproduce bug below
import cuml
import cupy

# Log package versions
print(cuml.__version__)
print(cupy.__version__)

X, _ = blobs = cuml.datasets.make_blobs(
    n_samples=100, n_features=2, 
    centers=None, 
    cluster_std=1.0, 
    center_box=(-10.0, 10.0), 
    shuffle=True, 
    random_state=42, 
    return_centers=False, 
    order='F', dtype='float32')

for tpp in [True, False]:
    print(f"\nTwo Pass Precision: {tpp}")
    # d**2
    squared = cuml.neighbors.NearestNeighbors(
        n_neighbors=15, 
        algorithm="brute", 
        metric="sqeuclidean", 
        output_type="cupy"
        )
    squared.fit(X)
    d_sq, idx_sq = squared.kneighbors(X, two_pass_precision=tpp)
    
    
    
    # d 
    root = cuml.neighbors.NearestNeighbors(
        n_neighbors=15, 
        algorithm="brute", 
        metric="euclidean", 
        output_type="cupy"
        )
    root.fit(X)
    d_r, idx_r = root.kneighbors(X, two_pass_precision=tpp)
    
    
    # Log results
    print(cupy.all(d_sq == d_r))
    print(cupy.all(idx_sq == idx_r))
    # Manually calculate the euclidean distance between the first sample, and its first neighbor
    print(f"d2 = {((X[0] - X[idx_sq[0, 1].item()])**2).sum()}")
    print(f"d = {cp.sqrt(((X[0] - X[idx_sq[0, 1].item()])**2).sum())}")
    print(d_sq[0, 1])
    print(d_r[0, 1])

Outputs:

24.02.00
13.0.0

Two Pass Precision: True
True
True
d2 = 0.2593870162963867
d = 0.5093005299568176
0.25938702
0.25938702

Two Pass Precision: False
False
True
d2 = 0.2593870162963867
d = 0.5093005299568176
0.25938416
0.5092977

Expected behavior
Expect differences between distances when metric is "euclidean" and "sqeuclidean". Expect distances in euclidean space to be square rooted, but matches outputs of sqeuclidean. Only occurs when two_pass_precision = True.

@rtubelleza rtubelleza added ? - Needs Triage Need team to review and classify bug Something isn't working labels Feb 28, 2024
@dantegd
Copy link
Member

dantegd commented Mar 20, 2024

Thanks for the issue @rtubelleza, this looks like a bug in the code, we'll look into it and try to solve it as soon as possible.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
? - Needs Triage Need team to review and classify bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants