
Can we use ann-benchmarks to benchmark the All-Nearest-Neighbors problem? #317

Open
maxsu opened this issue Nov 6, 2022 · 1 comment

maxsu commented Nov 6, 2022

The All-Nearest-Neighbors problem (All-NN) asks us to find, for every point in a set of n points in d dimensions, its nearest neighbor within the set. Classically, there is a deterministic O(n log n) algorithm, as well as a stochastic O(n sqrt(n)) algorithm.
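
For concreteness, here is a minimal brute-force sketch of the problem (an editor-added illustration, not part of the original issue), using scikit-learn on toy data:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Toy data: n points in d dimensions (illustrative sizes only).
X = np.random.rand(1000, 16).astype(np.float32)

# Ask for 2 neighbors: for each query point, the first result is the point
# itself (distance 0), the second is its nearest neighbor within the set.
nn = NearestNeighbors(n_neighbors=2, algorithm="brute").fit(X)
distances, indices = nn.kneighbors(X)
all_nn = indices[:, 1]  # all_nn[i] is the index of point i's nearest neighbor
```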

I would like to test the All-NN performance of libraries like Faiss. According to a Google insider, the internal (closed) version of Faiss has a mode that solves All-NN; however, I don't know whether the public version does.

I suspect it may, and I suspect that ann-benchmarks can effectively test All-NN performance via its batch mode. However, I don't understand the terminology well enough to determine whether batch mode is equivalent to solving All-NN. Is it?
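
As a sketch of what "batch mode solving All-NN" would mean in practice (this is an assumption on my part, not a confirmed ann-benchmarks feature), one could issue the whole dataset as a single batched query against an index built over that same dataset, e.g. with the public Faiss:

```python
import numpy as np
import faiss  # public faiss package

X = np.random.rand(100_000, 32).astype(np.float32)  # illustrative dataset

index = faiss.IndexFlatL2(X.shape[1])  # exact (flat) L2 index over the dataset
index.add(X)

# One batched search of the dataset against itself; column 0 of I is each
# point's self-match (distance 0), column 1 is its nearest neighbor.
D, I = index.search(X, 2)
all_nn = I[:, 1]
```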

@maumueller
Collaborator

@maxsu With some manual intervention, this is rather easy to support. (Just change datasets.py so that it doesn't split the data into train/test, but uses the full dataset for both.) Practically, the dataset files will become really large (because of the n × 1000 groundtruth data), and our brute-force approach should probably be replaced by Faiss to compute the groundtruth more quickly. From there, it's indeed just a matter of using batch mode.
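
A rough sketch of the adaptation described above (the names around datasets.py are illustrative, not the actual ann-benchmarks API): use the full dataset as both train and test set, and compute the groundtruth with Faiss instead of the built-in brute force.

```python
import numpy as np
import faiss

def all_nn_groundtruth(X, count=1000):
    """For every point, return its `count` nearest neighbors within X,
    excluding the point itself, using an exact Faiss index."""
    X = np.ascontiguousarray(X, dtype=np.float32)
    index = faiss.IndexFlatL2(X.shape[1])
    index.add(X)
    _, I = index.search(X, count + 1)  # +1 so the self-match can be dropped
    return I[:, 1:]

# In datasets.py, train and test would then both be the full dataset, e.g.
# write_output(X, X, fn, "euclidean")  # hypothetical call; the actual helper
# name and signature may differ between ann-benchmarks versions.
```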

@Cecca and I did these changes at some point for this paper. The code is available here, but we didn't intend to merge anything back.
