IndexIVFPQ search first pass very slow #3331

Open
2 of 4 tasks
Kh4L opened this issue Mar 29, 2024 · 4 comments
@Kh4L

Kh4L commented Mar 29, 2024

Summary

First pass of IndexIVFPQ search takes about 60s, even with a fairly small dataset.

1st search time 60.47380566596985
2nd search time 0.0010259151458740234

Platform

Faiss version: https://github.com/facebookresearch/faiss/tree/v1.7.4

Installed from: source

Faiss compilation options: cmake -B build . && make -C build -j faiss && make -C build -j swigfaiss

Running on:

  • CPU
  • GPU

Interface:

  • C++
  • Python

Reproduction instructions

import torch
import faiss
import faiss.contrib.torch_utils
import time

dataset = torch.rand((2**14, 128)).cuda()

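query_k = 12             # neighbors returned per query
num_cells = 10           # nlist: number of IVF cells
num_cells_to_visit = 10  # nprobe: cells scanned per query (all of them here)
bits_per_vector = 8      # passed below as M, the number of PQ sub-quantizers
channels = dataset.size(1)

index = faiss.IndexIVFPQ(
    faiss.IndexFlatL2(channels),  # coarse quantizer
    channels,                     # d: vector dimension
    num_cells,                    # nlist
    bits_per_vector,              # M
    8,                            # nbits per sub-quantizer code
    faiss.METRIC_L2,
)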
index.nprobe = num_cells_to_visit

index = faiss.index_cpu_to_gpu(
    faiss.StandardGpuResources(),
    dataset.device.index,
    index,
)

index.train(dataset)
index.add(dataset.detach())

emb = torch.rand((512,128)).cuda()
start = time.time()
score, idx = index.search(emb.detach(), query_k)
print(f"1st search time {time.time() - start}")
emb = torch.rand((512,128)).cuda()
start = time.time()
score, idx = index.search(emb.detach(), query_k)
print(f"2nd search time {time.time() - start}")
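
To separate a one-time warm-up cost from steady-state latency, it can help to time several identical calls rather than two. The helper below is only a sketch: `time_calls` and `warmup_overhead` are hypothetical names, not part of Faiss, and the callable passed in is assumed to block until the work is finished.

```python
import time


def time_calls(fn, repeats=3):
    """Call fn() `repeats` times and return the wall-clock duration of each call."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        durations.append(time.perf_counter() - start)
    return durations


def warmup_overhead(durations):
    """Return the first-call duration minus the median of the later calls."""
    if len(durations) < 2:
        raise ValueError("need at least two timed calls")
    rest = sorted(durations[1:])
    steady = rest[len(rest) // 2]  # upper median for even-length lists
    return durations[0] - steady
```

With the script above, this could be used as `time_calls(lambda: index.search(emb.detach(), query_k), repeats=5)`. If the overhead stays near 60 s in every fresh process, the cost is paid once per process rather than once per machine.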
@Kh4L
Author

Kh4L commented Apr 3, 2024

[Screenshot: nsys profile, 2024-04-03]

Profiled with nsys: `pass1SelectLists` and `pass2SelectLists` are very slow, and most of the time is spent on the CPU.

mdouze added the GPU label on Apr 5, 2024
@mdouze
Contributor

mdouze commented Apr 5, 2024

It is probably compiling some kernels on the first run. Do you have an NVIDIA cache directory? It is normally in ~/.nv
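
One way to check the cache directory is sketched below. It assumes a Linux machine, where the CUDA compute cache defaults to `~/.nv/ComputeCache` and can be moved or turned off with the `CUDA_CACHE_PATH` and `CUDA_CACHE_DISABLE` environment variables. `cuda_cache_status` is a hypothetical helper name, not a Faiss or CUDA API.

```python
import os
from pathlib import Path


def cuda_cache_status(env=None, home=None):
    """Report where the CUDA JIT cache is expected and what it contains."""
    env = os.environ if env is None else env
    home = Path.home() if home is None else Path(home)

    # CUDA_CACHE_DISABLE=1 turns the JIT cache off entirely.
    disabled = env.get("CUDA_CACHE_DISABLE", "0") == "1"

    # CUDA_CACHE_PATH overrides the default location (Linux default assumed).
    override = env.get("CUDA_CACHE_PATH")
    path = Path(override) if override else home / ".nv" / "ComputeCache"

    files = [p for p in path.rglob("*") if p.is_file()] if path.is_dir() else []
    return {
        "disabled": disabled,
        "path": path,
        "exists": path.is_dir(),
        "num_files": len(files),
        "total_bytes": sum(p.stat().st_size for p in files),
    }
```

Calling `cuda_cache_status()` before and after the first search shows whether anything was written. An empty or missing cache combined with a slow first search in every process would suggest either that caching is disabled or that the delay is not kernel compilation.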

@Kh4L
Author

Kh4L commented Apr 11, 2024

> It is probably compiling some kernels on the first run. Do you have an NVIDIA cache directory? It is normally in ~/.nv

I don't see any ~/.nv after the first run, and the first pass is consistently very long (60+ s) across runs.

@sunxiaojie99

I also encountered this problem.
