Skip to content

Brute force search without an index

Matthijs Douze edited this page Mar 29, 2019 · 11 revisions

In the case of IndexFlat, the added data is just copied to the index without further encoding or organization. Therefore, it may be useful to short-circuit the indexing altogether. This makes it possible to:

  • avoid having two copies of the same data in memory

  • do updates on the data between searches.

Faiss provides low-level functions to do the brute-force search in this context.

The functions take a matrix of database vectors and a matrix of query vectors and return the k-nearest neighbors and their distances.

Brute force search on CPU

On CPU, the relevant function is knn_L2sqr or knn_inner_product.

Example usage in Python: this gist

Brute force search on GPU

On GPU, the relevant function is bruteForceKnn. An additional advantage is that it takes both CPU and GPU resident data as input or output.

An example usage in python, from pytorch GPU data is here: test_pytorch_faiss.py

Note that on GPU, the synchronization is needed because Faiss uses a non-default CUDA stream. The easiest workaround is to force it to use the default stream. This is done via

res = faiss.StandardGpuResources()
res.setDefaultNullStreamAllDevices()

Also, by default the amount of memory allocated by the resource object is too large for simple brute force. You can set:

res.setTempMemory(64 * 1024 * 1024)

ie. use only 64M (if that is not enough, try a few other values, according to some reports setting to 0 works just fine).

Clone this wiki locally