
Can we use ann-benchmarks to benchmark the All-Nearest-Neighbors problem? #317

Open
maxsu opened this issue Nov 6, 2022 · 1 comment

maxsu commented Nov 6, 2022

The All-Nearest-Neighbors problem (All-NN) asks us to find, for every point in a set of n points in d dimensions, its nearest neighbor within the set. Classically, there is a deterministic O(n log n) algorithm, as well as a stochastic O(n sqrt(n)) algorithm.
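
For concreteness, here is a minimal brute-force sketch of the problem (an editor-added illustration, not part of the original issue), using scikit-learn on toy data:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Toy data: n points in d dimensions (illustrative sizes only).
X = np.random.rand(1000, 16).astype(np.float32)

# Ask for 2 neighbors: for each query point, the first result is the point
# itself (distance 0), the second is its nearest neighbor within the set.
nn = NearestNeighbors(n_neighbors=2, algorithm="brute").fit(X)
distances, indices = nn.kneighbors(X)
all_nn = indices[:, 1]  # all_nn[i] is the index of point i's nearest neighbor
```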

I would like to test the All-NN performance of libraries like Faiss. According to a Google insider, the internal (closed) version of Faiss has a mode that solves All-NN; however, I don't know whether the public version does.

I suspect it may, and I suspect that ann-benchmarks can effectively test All-NN performance via its batch mode. However, I don't understand the terminology well enough to determine whether batch mode is equivalent to solving All-NN. Is it?
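
As a sketch of what "batch mode solving All-NN" would mean in practice (this is an assumption on my part, not a confirmed ann-benchmarks feature), one could issue the whole dataset as a single batched query against an index built over that same dataset, e.g. with the public Faiss:

```python
import numpy as np
import faiss  # public faiss package

X = np.random.rand(100_000, 32).astype(np.float32)  # illustrative dataset

index = faiss.IndexFlatL2(X.shape[1])  # exact (flat) L2 index over the dataset
index.add(X)

# One batched search of the dataset against itself; column 0 of I is each
# point's self-match (distance 0), column 1 is its nearest neighbor.
D, I = index.search(X, 2)
all_nn = I[:, 1]
```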

@maumueller
Collaborator

@maxsu With some manual intervention, this is rather easy to support. (Just change datasets.py so that it doesn't split the data into train/test, but uses the full dataset for both.) Practically, the dataset files will become really large (because of the n × 1000 groundtruth data), and our brute-force approach should probably be replaced by Faiss to compute the groundtruth more quickly. From there, it's indeed just a matter of using batch mode.
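
A rough sketch of the adaptation described above (the names around datasets.py are illustrative, not the actual ann-benchmarks API): use the full dataset as both train and test set, and compute the groundtruth with Faiss instead of the built-in brute force.

```python
import numpy as np
import faiss

def all_nn_groundtruth(X, count=1000):
    """For every point, return its `count` nearest neighbors within X,
    excluding the point itself, using an exact Faiss index."""
    X = np.ascontiguousarray(X, dtype=np.float32)
    index = faiss.IndexFlatL2(X.shape[1])
    index.add(X)
    _, I = index.search(X, count + 1)  # +1 so the self-match can be dropped
    return I[:, 1:]

# In datasets.py, train and test would then both be the full dataset, e.g.
# write_output(X, X, fn, "euclidean")  # hypothetical call; the actual helper
# name and signature may differ between ann-benchmarks versions.
```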

@Cecca and I did these changes at some point for this paper. The code is available here, but we didn't intend to merge anything back.
