Why is lastfm-64-dot called -dot? #400

Open
thomasahle opened this issue Apr 27, 2023 · 4 comments

Comments

@thomasahle
Contributor

From the path http://ann-benchmarks.com/lastfm-64-dot_10_angular.html it seems that this dataset is actually angular.
But the name indicates dot-product, which many of the algorithms don't natively support.
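To illustrate why the distinction matters, here is a minimal sketch (toy data and hypothetical helper names, not taken from ann-benchmarks) showing that the top result under dot product and under angular distance can differ when vectors are not normalized:

```python
import numpy as np

def top1_dot(data, q):
    # Index of the row with the largest inner product with q.
    return int(np.argmax(data @ q))

def top1_angular(data, q):
    # Index of the row with the largest cosine similarity to q.
    norms = np.linalg.norm(data, axis=1) * np.linalg.norm(q)
    return int(np.argmax((data @ q) / norms))

# Toy data, made up for illustration only.
data = np.array([
    [10.0, 0.0],  # long vector, 45 degrees away from the query
    [0.6, 0.6],   # short vector, pointing the same way as the query
])
q = np.array([1.0, 1.0])

print(top1_dot(data, q), top1_angular(data, q))
```

The long vector wins on inner product and the aligned vector wins on cosine, so an index built for one metric does not answer queries for the other unless the data is normalized first.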

@erikbern
Owner

yeah this is very confusing – I think it's a mistake. https://github.com/erikbern/ann-benchmarks/blob/main/ann_benchmarks/datasets.py#L427 indicates it's angular (cosine) distance too.

Maybe let's remove this dataset from the benchmarks for now.

@maumueller
Collaborator

@benfred should be able to shed some light on this.

@benfred
Contributor

benfred commented Apr 27, 2023

The original intent was to test out inner-product distance (dot), not angular distance: #91.

IIRC, the rationale was that certain algorithms either didn't support IP distance or didn't perform well when applying transforms like https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/XboxInnerProduct.pdf to convert IP distance to a cosine space.
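For readers unfamiliar with this kind of transform, below is a minimal sketch of one well-known reduction from maximum inner product search to cosine search: pad each data vector with one extra coordinate so that all data vectors share the same norm, and pad the query with a zero. The function names and the random data are assumptions made for illustration; this is not the ann-benchmarks code and may differ in detail from the linked paper.

```python
import numpy as np

def augment_data(data):
    # Append sqrt(M^2 - ||x||^2) to each row, where M is the largest row norm.
    # Every augmented row then has norm M.
    norms_sq = np.sum(data ** 2, axis=1)
    extra = np.sqrt(np.maximum(norms_sq.max() - norms_sq, 0.0))
    return np.hstack([data, extra[:, None]])

def augment_query(q):
    # A zero in the extra coordinate leaves all inner products unchanged.
    return np.append(q, 0.0)

def top1_dot(data, q):
    return int(np.argmax(data @ q))

def top1_angular(data, q):
    norms = np.linalg.norm(data, axis=1) * np.linalg.norm(q)
    return int(np.argmax((data @ q) / norms))

# Synthetic vectors with widely varying norms (illustrative only).
rng = np.random.default_rng(0)
data = rng.normal(size=(1000, 64)) * rng.uniform(0.1, 3.0, size=(1000, 1))
queries = rng.normal(size=(20, 64))

aug = augment_data(data)
agree = all(
    top1_dot(data, q) == top1_angular(aug, augment_query(q)) for q in queries
)
print("angular search on augmented data matches dot-product search:", agree)
```

Because the augmented data vectors all have equal norm, cosine similarity in the augmented space is the original inner product divided by a constant for each query, so the ranking is preserved. The cost is that vectors with small norms get pushed toward the extra axis, which is one plausible reason some indexes perform poorly after the transform.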

@erikbern
Owner

I think it's nice to have a dataset for dot products. But I'll fix that after I'm done with this run.
