
Implementation notes


Here are a few notes on implementation details about which we sometimes get questions. We describe the tradeoffs and a few design choices or results that may be unexpected.

Matrix multiplication to do many L2 distance computations

A typical operation in IndexFlatL2 is to exhaustively compare a set of nq query vectors against a set of nb database vectors in dimension d, then select, for each query, the k database vectors with the smallest distances.

Faiss has two implementations of this operation:

  1. a direct implementation that loops over nq, nb and the dimension of the vectors.

  2. an implementation that uses the decomposition ||x - y||^2 = ||x||^2 + ||y||^2 - 2 <x, y>. This is faster because the most expensive part, the O(nq * nb * d) matrix multiplication <x, y>, can be handed over to BLAS, which normally performs it efficiently.

We use implementation 1 when nq < 20 and d is a multiple of 4, and implementation 2 otherwise. The threshold 20 can be adjusted via the global variable faiss::distance_compute_blas_threshold (accessible in Python via faiss.cvar.distance_compute_blas_threshold).
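
For example, to lower the threshold from Python (the value 16 here is arbitrary, chosen only for illustration):

```python
import faiss

# make the BLAS-based implementation 2 kick in for smaller query batches
# (the default threshold is 20, as noted above)
faiss.cvar.distance_compute_blas_threshold = 16
```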

Note that implementation 2 may be less stable numerically than implementation 1 for vectors of very different magnitudes; see the discussion in issue #297.
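
To make the decomposition concrete, here is a minimal NumPy sketch of implementation 2. The function name l2_distances_blas and the clamping of small negative values are our illustration, not Faiss's actual C++ code:

```python
import numpy as np

def l2_distances_blas(xq, xb):
    """All-pairs squared L2 distances via ||x - y||^2 = ||x||^2 + ||y||^2 - 2 <x, y>."""
    x_norms = (xq ** 2).sum(axis=1)[:, None]   # shape (nq, 1)
    y_norms = (xb ** 2).sum(axis=1)[None, :]   # shape (1, nb)
    dots = xq @ xb.T                           # the O(nq * nb * d) part, handled by BLAS
    d2 = x_norms + y_norms - 2 * dots
    # floating-point cancellation can make tiny distances slightly negative
    # (the stability issue mentioned above); clamp them to zero
    return np.maximum(d2, 0)

xq = np.random.rand(5, 64).astype('float32')
xb = np.random.rand(1000, 64).astype('float32')
D = l2_distances_blas(xq, xb)
I = np.argsort(D, axis=1)[:, :10]   # indices of the top-10 nearest database vectors
```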

k-means implementation

k-means is implemented in the Clustering object.

After initialization, k-means alternates between two operations (sketched below):

  • assign each training point to its nearest centroid;

  • recompute each centroid as the center of mass of the points assigned to it.
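
As a concrete reference, here is a minimal NumPy sketch of one such iteration. The function kmeans_iteration is our illustration, not a Faiss function, and Faiss's real implementation handles edge cases such as empty clusters more carefully:

```python
import numpy as np

def kmeans_iteration(x, centroids):
    """One k-means iteration: assignment step, then update step."""
    # assignment: nearest centroid for each training point (brute force)
    d2 = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    assign = d2.argmin(axis=1)
    # update: each centroid becomes the center of mass of its points;
    # an empty cluster simply keeps its old centroid in this sketch
    new_centroids = np.array([
        x[assign == j].mean(axis=0) if (assign == j).any() else centroids[j]
        for j in range(len(centroids))
    ])
    return new_centroids, assign
```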

In terms of performance, the assignment step is by far the most costly. Incidentally, it can be performed by any index, since it is a nearest-neighbor search of the training vectors against the centroids. Therefore the index is a parameter of the Clustering train method. It can be replaced with a GPU index (example) or an HNSW index (example); see the sketch below.
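
A minimal sketch of driving the Clustering object with an explicit index from Python; the random training data and the parameter values are placeholders:

```python
import numpy as np
import faiss

d, k = 64, 100
x = np.random.rand(10000, d).astype('float32')  # placeholder training data

clus = faiss.Clustering(d, k)
clus.niter = 20

# the assignment step runs on whatever index is passed to train();
# a flat index gives exact assignments, but an approximate index
# such as faiss.IndexHNSWFlat(d, 32) can be substituted for speed
index = faiss.IndexFlatL2(d)
clus.train(x, index)

centroids = faiss.vector_to_array(clus.centroids).reshape(k, d)
```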

Precomputed tables in IVFPQ
