Low recall with SPACEV1B dataset on GPU #3344

Open
2 of 4 tasks
karthik86248 opened this issue Apr 4, 2024 · 1 comment

@karthik86248

Summary

I downloaded a 1-billion-vector subset of the original SPACEV1B dataset hosted in the SPTAG repo. The ground truth was computed manually.

I'm using a slightly modified version of the bench_gpu_1bn.py script file to run ANN on SPACEV1B.

The reported recall values are relatively low (around 0.3). The index used is: OPQ24_96,IVF262144,PQ24.
I tried higher PQ values (38, 32, etc.) but saw no significant improvement.
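
Since the ground truth here was computed by hand, it may be worth checking which recall definition is being reported before comparing against published numbers. The sketch below is a minimal, pure-Python illustration of two common definitions: "1-recall@R" (is the true nearest neighbor among the top R results?) and "k-recall@k" (overlap between the top-k results and the top-k ground truth). The function names and the toy data are hypothetical and are not taken from bench_gpu_1bn.py or from big-ann-benchmarks.

```python
def recall_1_at_r(found_ids, gt_ids, r):
    """Fraction of queries whose true nearest neighbor is in the top r results."""
    hits = sum(
        1 for found, gt in zip(found_ids, gt_ids) if gt[0] in list(found[:r])
    )
    return hits / len(gt_ids)


def recall_k_at_k(found_ids, gt_ids, k):
    """Average overlap between the top-k results and the top-k ground truth."""
    overlap = sum(
        len(set(found[:k]) & set(gt[:k])) for found, gt in zip(found_ids, gt_ids)
    )
    return overlap / (k * len(gt_ids))


# Toy data (made up): 2 queries, 3 returned ids each, 3 ground-truth ids each.
found = [[5, 1, 9], [7, 2, 4]]
gt = [[1, 5, 8], [3, 7, 2]]

print("1-recall@1:", recall_1_at_r(found, gt, 1))
print("1-recall@3:", recall_1_at_r(found, gt, 3))
print("3-recall@3:", recall_k_at_k(found, gt, 3))
```

The two measures can differ noticeably on the same search results, so a value around 0.3 under one definition is not directly comparable to a value reported under the other.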

The big-ann-benchmarks competition baseline suggests the index IVF1048576,SQ8, which I plan to try next. However, the GPU benchmark scripts in the Faiss repo don't seem to support this SQ index.

Platform

OS: Ubuntu 22.04.1 LTS

Faiss version: 1.7.4

Installed from: Conda

Faiss compilation options:

Running on:

  • CPU
  • GPU

Interface:

  • C++
  • Python

Reproduction instructions

@mdouze
Contributor

mdouze commented Apr 5, 2024

SPACEV1B should not be too problematic; see Table 2 in https://proceedings.mlr.press/v176/simhadri22a/simhadri22a.pdf
Support for SQ8 was added to Faiss after the GPU bench script was written. However, if you run on a single GPU, the dataset may not fit in GPU RAM.
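
To make the memory concern concrete, here is a rough back-of-the-envelope estimate of the stored payload for the two index types mentioned in this thread. It is only a sketch under stated assumptions: the dimensionality of 100 is an assumption (it is not given in the thread), 8 bytes per stored vector id is an assumption, and coarse centroids, OPQ matrices, and GPU scratch space are ignored. The helper name `index_payload_gib` is hypothetical.

```python
N_VECTORS = 1_000_000_000  # from the issue: 1 billion vectors
DIM = 100                  # ASSUMPTION: vector dimensionality, not stated in the thread
ID_BYTES = 8               # ASSUMPTION: one 64-bit id stored per vector


def index_payload_gib(n_vectors, code_bytes, id_bytes=ID_BYTES):
    """Approximate size in GiB of the per-vector codes plus ids."""
    return n_vectors * (code_bytes + id_bytes) / 2**30


# SQ8 stores one byte per dimension.
sq8_gib = index_payload_gib(N_VECTORS, DIM * 1)
# PQ24 with 8-bit sub-quantizers stores 24 bytes per vector.
pq24_gib = index_payload_gib(N_VECTORS, 24)

print(f"IVF...,SQ8  payload: ~{sq8_gib:.0f} GiB")
print(f"IVF...,PQ24 payload: ~{pq24_gib:.0f} GiB")
```

Under these assumptions the SQ8 codes alone are several times larger than the PQ24 codes, which is consistent with the caveat that the SQ8 index may not fit on a single GPU.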

@mdouze mdouze added the question label Apr 5, 2024