
how to build hnsw faster #3316

Open
HiCheems opened this issue Mar 25, 2024 · 2 comments

@HiCheems

Summary

Recently, I built an index with index type "IDMap,HNSW32,Flat". The dataset contains 8M vectors of dimension 200. Building it took a very long time, more than 48h. Is some parameter set wrong?

Platform

OS: Docker container with 40 cores

Installed from: pip install faiss-cpu

Running on:

  • [x] CPU
  • [ ] GPU

Interface:

  • [ ] C++
  • [x] Python

Reproduction instructions

import os
# OpenMP reads these variables when the library loads, so set them before importing faiss.
os.environ["OMP_NUM_THREADS"] = "20"
os.environ["OMP_WAIT_POLICY"] = "PASSIVE"

import faiss
import numpy as np

dimension = 200
index_type = "IDMap,HNSW16,Flat"
metric_type = faiss.METRIC_INNER_PRODUCT
index = faiss.index_factory(dimension, index_type, metric_type)

# Add 8M random unit-norm vectors, one vector per add_with_ids() call.
for i in range(8000000):
    embedding = np.random.rand(dimension).astype('float32')
    l2_norm = np.linalg.norm(embedding)
    normalized_embedding = embedding / l2_norm
    normalized_embedding = normalized_embedding.reshape(1, -1)  # shape (1, dimension)
    index.add_with_ids(normalized_embedding, np.array([i], dtype='int64'))

@mdouze
Contributor

mdouze commented Mar 25, 2024

Building an HNSW index is indeed slow, but 48h seems excessive.
Could you try installing Faiss with conda?

mdouze added the install label Mar 26, 2024
@HiCheems
Author

> Building an HNSW index is indeed slow, but 48h seems excessive. Could you try installing Faiss with conda?

It gets worse. I suspect the problem is that the HNSW index is being built on a single core, even though 40 cores are available.
