
Case studies

Matthijs Douze edited this page Dec 5, 2022 · 11 revisions

This page lists example use cases for Faiss, with brief explanations. Most examples are Python notebooks, but as usual, translating them to C++ should be straightforward.

Implementing an evolving IVF dataset

This script demonstrates how to add and remove elements from an IVF index in a rolling fashion. The key is to use Hashtable as the DirectMap type and to remove elements with an IDSelectorArray. The removal cost is then proportional to the number of elements removed, rather than to the number of elements in the dataset.

demo_rolling_dataset.ipynb
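The reason removal stays cheap can be illustrated without Faiss. The sketch below is a simplified model, not the Faiss implementation. Inverted lists are plain Python lists, and a dict stands in for the Hashtable direct map, mapping each id to its (list number, offset). Removing an id moves the last entry of its list into the freed slot and updates one map entry, so the work per removed id is constant. The class `RollingIVF` and its methods are hypothetical names chosen for illustration.

```python
import numpy as np


class RollingIVF:
    """Toy IVF index with a hash-table direct map: id -> (list_no, offset).

    Hypothetical, simplified model; the real Faiss classes differ.
    """

    def __init__(self, centroids):
        self.centroids = np.asarray(centroids, dtype=np.float32)
        nlist = len(self.centroids)
        self.list_ids = [[] for _ in range(nlist)]   # ids stored in each inverted list
        self.list_vecs = [[] for _ in range(nlist)]  # vectors stored in each inverted list
        self.direct_map = {}                         # id -> (list_no, offset)

    def add_with_ids(self, x, ids):
        x = np.asarray(x, dtype=np.float32)
        # assign each vector to its nearest centroid (squared L2)
        d2 = ((x[:, None, :] - self.centroids[None, :, :]) ** 2).sum(axis=2)
        for vec, vid, list_no in zip(x, ids, d2.argmin(axis=1)):
            list_no = int(list_no)
            self.direct_map[int(vid)] = (list_no, len(self.list_ids[list_no]))
            self.list_ids[list_no].append(int(vid))
            self.list_vecs[list_no].append(vec)

    def remove_ids(self, ids):
        """Cost is O(len(ids)), independent of the number of stored vectors."""
        removed = 0
        for vid in ids:
            entry = self.direct_map.pop(int(vid), None)
            if entry is None:
                continue  # id not in the index
            list_no, offset = entry
            lids, lvecs = self.list_ids[list_no], self.list_vecs[list_no]
            last = len(lids) - 1
            if offset != last:
                # move the last element of the list into the freed slot
                lids[offset] = lids[last]
                lvecs[offset] = lvecs[last]
                self.direct_map[lids[offset]] = (list_no, offset)
            lids.pop()
            lvecs.pop()
            removed += 1
        return removed

    @property
    def ntotal(self):
        return len(self.direct_map)


rng = np.random.default_rng(0)
index = RollingIVF(rng.standard_normal((4, 8)))
# rolling window: add two batches, then drop the oldest one
index.add_with_ids(rng.standard_normal((100, 8)), np.arange(0, 100))
index.add_with_ids(rng.standard_normal((100, 8)), np.arange(100, 200))
index.remove_ids(np.arange(0, 100))
print(index.ntotal)  # 100
```

Without the id-to-location map, removing an id would require scanning every inverted list to find it. That scan is the cost the Hashtable direct map avoids.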

Fast indexing of 2M vectors for max inner product search

This script demonstrates how to speed up a recommendation system. Conceptually, the query vectors are users and the database vectors are items to recommend. The metric used to compare them is the maximum inner product, i.e., which item is the most relevant for each user. This use case has a real-time constraint (results should be returned in under 5 ms), and the accuracy should be as high as possible.

recommendation_2M.ipynb
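As a reference point, exact maximum inner product search can be written directly in NumPy. The sketch scores every item against every user and keeps the top k per user. Any approximate index is judged against this exact result, and the sketch includes a small recall helper for that comparison. The function names `mips_topk` and `recall_at_k` are hypothetical. The cost grows linearly with the number of items, which is the cost an index is meant to avoid.

```python
import numpy as np


def mips_topk(users, items, k):
    """Exact maximum inner product search.

    users: (nq, d) query vectors, items: (nb, d) database vectors, k <= nb.
    Returns (scores, ids), each (nq, k), sorted by decreasing inner product.
    """
    scores = users @ items.T                                # (nq, nb) inner products
    # unordered top-k per row in linear time, then sort only those k
    part = np.argpartition(-scores, k - 1, axis=1)[:, :k]
    part_scores = np.take_along_axis(scores, part, axis=1)
    order = np.argsort(-part_scores, axis=1)
    ids = np.take_along_axis(part, order, axis=1)
    return np.take_along_axis(scores, ids, axis=1), ids


def recall_at_k(approx_ids, exact_ids):
    """Fraction of the exact top-k items that the approximate search also returned."""
    hits = sum(len(set(a) & set(e)) for a, e in zip(approx_ids, exact_ids))
    return hits / exact_ids.size


rng = np.random.default_rng(0)
items = rng.standard_normal((10_000, 32)).astype(np.float32)
users = rng.standard_normal((5, 32)).astype(np.float32)
scores, ids = mips_topk(users, items, k=10)
```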

Limited size clustering

This script demonstrates a k-means variant in which the clusters are additionally constrained to contain at most a given maximum number of points.

limited_size_clustering.ipynb
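The sketch below shows one simple way to enforce the size limit. It is not necessarily the method used in the notebook. The usual nearest-centroid assignment is replaced by a greedy capacity-constrained assignment. All (point, centroid) pairs are visited in order of increasing distance, and each point goes to the closest centroid that still has room. The centroid update step is unchanged. The function names and the greedy strategy are assumptions. The greedy assignment is a heuristic and does not minimize the constrained objective exactly.

```python
import numpy as np


def capped_assign(x, centroids, max_size):
    """Greedy assignment where each centroid receives at most max_size points."""
    n, k = len(x), len(centroids)
    assert k * max_size >= n, "not enough total capacity"
    d2 = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)  # (n, k)
    labels = np.full(n, -1)
    sizes = np.zeros(k, dtype=int)
    assigned = 0
    # visit (point, centroid) pairs from closest to farthest
    for flat in np.argsort(d2, axis=None):
        i, j = divmod(int(flat), k)
        if labels[i] == -1 and sizes[j] < max_size:
            labels[i] = j
            sizes[j] += 1
            assigned += 1
            if assigned == n:
                break
    return labels


def limited_size_kmeans(x, k, max_size, niter=10, seed=0):
    rng = np.random.default_rng(seed)
    centroids = x[rng.choice(len(x), k, replace=False)].copy()
    for _ in range(niter):
        labels = capped_assign(x, centroids, max_size)
        for j in range(k):
            members = x[labels == j]
            if len(members):  # keep the previous centroid if a cluster is empty
                centroids[j] = members.mean(axis=0)
    return centroids, labels


rng = np.random.default_rng(0)
x = rng.standard_normal((200, 2))
centroids, labels = limited_size_kmeans(x, k=4, max_size=60)
print(np.bincount(labels, minlength=4))  # each count is at most 60
```

Every point is guaranteed a label. A point left unassigned would imply that every centroid had already reached max_size, which cannot happen when k * max_size >= n.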

Asymmetric binary search

This script demonstrates an asymmetric search use case: the query vectors are kept in full precision, while the database vectors are compressed to binary vectors. The implementation is slow; it is mainly intended to show how much accuracy asymmetric search can regain.

demo_asymmetric_binary.ipynb
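The difference between symmetric and asymmetric search can be sketched in NumPy. The sketch assumes the simplest binarization: one sign bit per dimension, decoded as +1 or -1. Symmetric search binarizes the query as well and ranks by Hamming distance. Asymmetric search keeps the float query and ranks database items by the inner product with the decoded vectors. All helper names are hypothetical, and the notebook's binarization scheme may differ.

```python
import numpy as np


def binarize(x):
    """One bit per dimension (the sign), returned as decoded +/-1 vectors."""
    return np.where(x > 0, 1.0, -1.0)


def symmetric_search(q, xb_bin, k):
    """The query is binarized too; rank by Hamming distance."""
    qb = binarize(q)
    d = q.shape[1]
    hamming = (d - qb @ xb_bin.T) / 2  # for +/-1 vectors: number of differing bits
    return np.argsort(hamming, axis=1, kind="stable")[:, :k]


def asymmetric_search(q, xb_bin, k):
    """The query stays in float; rank by L2 distance to the decoded vectors.

    Every decoded vector has the same norm, so this ranking is the same as
    ranking by decreasing inner product q . b (for any uniform scale of b).
    """
    scores = q @ xb_bin.T
    return np.argsort(-scores, axis=1, kind="stable")[:, :k]


def one_recall_at_k(found, gt_nn):
    """Fraction of queries whose true nearest neighbor is in the returned list."""
    return float(np.mean([g in row for g, row in zip(gt_nn, found)]))


rng = np.random.default_rng(0)
d, nb, nq = 32, 5000, 100
xb = rng.standard_normal((nb, d))
xq = rng.standard_normal((nq, d))
# exact L2 nearest neighbors in full precision (||q||^2 is constant per row)
gt = ((xb ** 2).sum(1)[None, :] - 2 * xq @ xb.T).argmin(axis=1)

xb_bin = binarize(xb)
sym = symmetric_search(xq, xb_bin, 10)
asym = asymmetric_search(xq, xb_bin, 10)
print("symmetric  1-recall@10:", one_recall_at_k(sym, gt))
print("asymmetric 1-recall@10:", one_recall_at_k(asym, gt))
```

Both variants read the same binary database. Only the treatment of the query differs, so the gap between the two printed values is what asymmetric search buys.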

Manual training of IVFPQ

This script demonstrates how to manually train an IVFPQ index enclosed in an OPQ pre-processor. This can be useful, for example, when pre-trained centroids for the data distribution are already available.

This is also implemented in the function train_ivf_index_with_2level, and it should be easy to extend to other types of composite indexes.

manual_IVFPQ_training.ipynb
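The training order can be sketched in NumPy as four steps:

1. Rotate the training vectors (the OPQ stage).
2. Take the coarse centroids as given instead of training them.
3. Compute each vector's residual to its nearest centroid.
4. Train one small k-means codebook per PQ sub-vector on those residuals.

This sketch illustrates the order of the steps only. As a simplifying assumption, the rotation is a fixed random orthogonal matrix rather than one learned by OPQ. All function names are hypothetical. In Faiss, each trained component would then be installed in the matching part of the composite index.

```python
import numpy as np


def assign(x, c):
    """Index of the nearest row of c for each row of x (squared L2)."""
    d2 = (x ** 2).sum(1)[:, None] - 2 * x @ c.T + (c ** 2).sum(1)[None, :]
    return d2.argmin(axis=1)


def kmeans(x, k, niter=20, seed=0):
    """Plain Lloyd k-means."""
    rng = np.random.default_rng(seed)
    c = x[rng.choice(len(x), k, replace=False)].copy()
    for _ in range(niter):
        labels = assign(x, c)
        for j in range(k):
            members = x[labels == j]
            if len(members):
                c[j] = members.mean(axis=0)
    return c


def train_ivfpq_manually(xt, rotation, coarse_centroids, m, ksub, niter=20):
    """Return PQ codebooks of shape (m, ksub, d // m).

    xt: (n, d) training vectors; rotation: (d, d) orthogonal matrix standing in
    for OPQ; coarse_centroids: (nlist, d) pre-trained, in the rotated space.
    """
    n, d = xt.shape
    assert d % m == 0
    xr = xt @ rotation.T                          # 1. pre-processing (rotation)
    lists = assign(xr, coarse_centroids)          # 2. coarse assignment
    residuals = xr - coarse_centroids[lists]      # 3. residuals w.r.t. the centroids
    dsub = d // m
    return np.stack([                             # 4. one k-means per sub-vector
        kmeans(residuals[:, i * dsub:(i + 1) * dsub], ksub, niter, seed=i)
        for i in range(m)
    ])


rng = np.random.default_rng(0)
d, n = 16, 2000
xt = rng.standard_normal((n, d))
rotation, _ = np.linalg.qr(rng.standard_normal((d, d)))  # stand-in for a learned OPQ matrix
coarse = kmeans(xt @ rotation.T, 8)                      # stand-in for pre-trained centroids
codebooks = train_ivfpq_manually(xt, rotation, coarse, m=4, ksub=16)
print(codebooks.shape)  # (4, 16, 4)
```

The PQ codebooks are trained on residuals, not on the raw vectors, because at search time an IVFPQ index encodes each vector relative to the centroid of its inverted list.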

Mixed sparse-dense clustering

There is a sparse clustering implementation in faiss.contrib.clustering. This script demonstrates how to cluster vectors that are composed of a dense part of dimension d1 and a sparse part of dimension d2 where d2 >> d1. The centroids are represented as full dense vectors.

The implementation relies on the clustering.DatasetAssign object, which abstracts away the representation of the vectors to cluster. The clustering module contains a pure Python implementation of k-means that can consume such a DatasetAssign object.

sparse_dense_clustering.ipynb
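The idea of hiding the vector representation behind an assignment object can be sketched as follows. The class below is hypothetical and written in plain NumPy. It is not faiss.contrib.clustering.DatasetAssign, whose exact interface should be checked in the Faiss source.

Each vector is a dense part of dimension d1 followed by a sparse part of dimension d2, stored in CSR form (indptr, indices, data). As in the description above, centroids are full dense vectors of dimension d1 + d2. Squared distances are expanded as ||x||^2 - 2<x, c> + ||c||^2. The sparse part therefore costs one multiply-add per nonzero per centroid, and no sparse vector is ever densified during assignment.

```python
import numpy as np


class MixedDataset:
    """Hypothetical wrapper: dense part (n, d1) + CSR sparse part (n, d2).

    The k-means loop below only calls count(), to_dense() and assign_to(),
    so it never needs to know how the vectors are stored.
    """

    def __init__(self, dense, indptr, indices, data, d2):
        self.dense = np.asarray(dense, dtype=np.float64)
        self.indptr, self.indices = np.asarray(indptr), np.asarray(indices)
        self.data = np.asarray(data, dtype=np.float64)
        self.d2 = d2
        self.row_of_nz = np.repeat(np.arange(len(self.dense)), np.diff(self.indptr))
        # squared norm of each full (dense + sparse) vector
        self.norms2 = (self.dense ** 2).sum(1)
        np.add.at(self.norms2, self.row_of_nz, self.data ** 2)

    def count(self):
        return len(self.dense)

    def to_dense(self, rows):
        """Materialize selected rows as full dense vectors (used for seeding)."""
        d1 = self.dense.shape[1]
        out = np.zeros((len(rows), d1 + self.d2))
        out[:, :d1] = self.dense[rows]
        for r, i in enumerate(rows):
            lo, hi = self.indptr[i], self.indptr[i + 1]
            out[r, d1 + self.indices[lo:hi]] = self.data[lo:hi]
        return out

    def assign_to(self, centroids):
        """Return nearest-centroid labels, squared distances and per-centroid sums."""
        d1 = self.dense.shape[1]
        cd, cs = centroids[:, :d1], centroids[:, d1:]
        dots = self.dense @ cd.T                                  # dense part, (n, k)
        # sparse part: each nonzero adds data * (matching centroid coordinate)
        np.add.at(dots, self.row_of_nz, self.data[:, None] * cs[:, self.indices].T)
        dist = self.norms2[:, None] - 2 * dots + (centroids ** 2).sum(1)[None, :]
        labels = dist.argmin(axis=1)
        sums = np.zeros_like(centroids)
        np.add.at(sums[:, :d1], labels, self.dense)
        np.add.at(sums[:, d1:], (labels[self.row_of_nz], self.indices), self.data)
        return labels, dist[np.arange(len(labels)), labels], sums


def kmeans(dataset, k, niter=10, seed=0):
    rng = np.random.default_rng(seed)
    centroids = dataset.to_dense(rng.choice(dataset.count(), k, replace=False))
    for _ in range(niter):
        labels, _, sums = dataset.assign_to(centroids)
        counts = np.bincount(labels, minlength=k)
        nonempty = counts > 0  # empty clusters keep their previous centroid
        centroids[nonempty] = sums[nonempty] / counts[nonempty, None]
    return centroids, labels


rng = np.random.default_rng(0)
n, d1, d2, nnz_per_row = 300, 4, 1000, 5
dense = rng.standard_normal((n, d1))
indptr = np.arange(0, n * nnz_per_row + 1, nnz_per_row)
indices = np.concatenate([rng.choice(d2, nnz_per_row, replace=False) for _ in range(n)])
data = rng.random(n * nnz_per_row)
ds = MixedDataset(dense, indptr, indices, data, d2)
centroids, labels = kmeans(ds, k=5)
```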
