
Additive quantizers


[This page is Work In Progress and depends on a PR that has not been merged yet]

An additive quantizer approximates a vector x in dimension d as

x ~= x' = T_1(i_1) + ... + T_M(i_M)

where T_m is a table of d-dimensional vectors indexed by i_m. The size of table m is K_m. The code that is stored to represent x is

[i_1,..., i_M].

It is of size (in bits)

ceil(log_2(K_1)) + ... + ceil(log_2(K_M)).
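
For example, M = 8 tables of K_m = 256 entries each give a code of 8 * 8 = 64 bits (8 bytes) per vector.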

As usual with data-adaptive quantizers, the tables T_1, ..., T_M are learned on a training set.

Product quantization can be considered as a special case of additive quantization where, in each table T_m, only a sub-vector of size d/M is non-zero.

A good introduction to additive quantization is Additive quantization for extreme vector compression, Babenko and Lempitsky, CVPR'14.

Faiss implementation

The Faiss implementation is based on the AdditiveQuantizer object.

The Faiss implementation imposes that K_m is a power of two. It does not in general impose that the K_m's are all identical.

The decoding of an additive quantizer is unambiguous. However, there is no simple method to (1) train an additive quantizer and (2) perform the encoding. Therefore, Faiss contains two implementations of additive quantizers: the Residual Quantizer and the Local Search Quantizer.

The residual quantizer

The Faiss residual quantizer is in the [ResidualQuantizer](https://github.com/facebookresearch/faiss/blob/master/faiss/impl/ResidualQuantizer.h) object.

In a residual quantizer, the encoding is sequential. At stage m of the encoding of x, the encoder picks the entry that best reconstructs the residual of vector x w.r.t. the previous encoding steps:

i_m = argmin_j || T_m(j) - (x - (T_1(i_1) + ... + T_{m-1}(i_{m-1}))) ||^2

This is a greedy approach, which tends to get trapped in local minima. To avoid these local minima, the encoder maintains a beam of possible codes and picks the best one at stage M. The beam size is given by the max_beam_size field of ResidualQuantizer: the larger the beam, the more accurate the encoding, but also the slower.
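
To make the per-step selection rule concrete, here is a minimal NumPy sketch of the greedy case (equivalent to a beam of size 1). The table contents and sizes are made up for illustration; this is not the Faiss implementation itself.

```python
import numpy as np

def greedy_residual_encode(x, tables):
    """Greedy residual encoding (beam size 1): at each step, pick the table
    entry closest to the current residual. `tables` is a list of (K_m, d)
    arrays; returns the selected indices [i_1, ..., i_M]."""
    residual = x.copy()
    code = []
    for T_m in tables:
        # i_m = argmin_j || T_m(j) - residual ||^2
        dists = ((T_m - residual) ** 2).sum(axis=1)
        i_m = int(np.argmin(dists))
        code.append(i_m)
        residual -= T_m[i_m]          # subtract the selected entry from the residual
    return code

# toy example: d = 8, M = 2 codebooks of K = 16 random entries (illustration only)
rng = np.random.default_rng(0)
tables = [rng.standard_normal((16, 8)).astype('float32') for _ in range(2)]
x = rng.standard_normal(8).astype('float32')
print(greedy_residual_encode(x, tables))
```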

At training time, the tables are trained sequentially, with one k-means per step; max_beam_size is used there as well. The ResidualQuantizer supports a version of k-means that starts training in a lower dimension, as described in "Improved Residual Vector Quantization for High-dimensional Approximate Nearest Neighbor Search", Shicong et al., AAAI'15. We found this k-means variant more useful for residual quantizer training than in non-recursive settings (like IVF training). It is implemented in the ProgressiveDimClustering object.
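
A minimal usage sketch in Python, assuming the ResidualQuantizer constructor takes (d, M, nbits) and that the Python wrappers accept NumPy arrays directly (as they do for ProductQuantizer); the parameter values are arbitrary:

```python
import faiss
import numpy as np

d, M, nbits = 64, 8, 8                   # 8 codebooks of 2^8 = 256 entries -> 64-bit codes
rq = faiss.ResidualQuantizer(d, M, nbits)
rq.max_beam_size = 32                    # larger beam: more accurate but slower (used for training too)

xt = np.random.rand(100_000, d).astype('float32')   # training set
rq.train(xt)
```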

The Local Search Quantizer

The LSQ quantizer is from "LSQ++: Lower running time and higher recall in multi-codebook quantization", Julieta Martinez et al., ECCV'18. It is implemented in the LocalSearchQuantizer object.

At encoding time, LSQ starts from a random encoding of the vector and proceeds to a simulated annealing optimization to optimize the codes. The parameter that determines the speed vs. accuracy tradeoff is the number of optimization steps encode_ils_iters.

The Faiss LSQ implementation is limited to constant codebook sizes K_1 = ... = K_M.
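
A sketch of how the LSQ object is set up, assuming the same (d, M, nbits) constructor as above; the number of iterations is arbitrary:

```python
import faiss

d, M, nbits = 64, 8, 8
lsq = faiss.LocalSearchQuantizer(d, M, nbits)   # K_1 = ... = K_M = 2^nbits
lsq.encode_ils_iters = 16                       # more iterations: more accurate, slower encoding
# lsq.train(xt); codes = lsq.compute_codes(xb)  # same codec interface as the other quantizers
```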

As a codec

The simplest usage of the additive quantizers is as codecs. Like other Faiss codecs, they have a compute_codes function and a decode function.

Since encoding is approximate and iterative, the main tradeoff between the different additive quantizers is encoding time vs. accuracy.
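
As a sketch of the codec usage, assuming the NumPy-friendly wrappers for train / compute_codes / decode (data is random here, purely for illustration):

```python
import faiss
import numpy as np

d = 64
xt = np.random.rand(100_000, d).astype('float32')   # training vectors
xb = np.random.rand(10_000, d).astype('float32')    # vectors to encode

rq = faiss.ResidualQuantizer(d, 8, 8)    # 64-bit codes
rq.train(xt)

codes = rq.compute_codes(xb)             # uint8 array, one row of code bytes per vector
xb_decoded = rq.decode(codes)            # approximate reconstruction x'
mse = ((xb - xb_decoded) ** 2).sum(axis=1).mean()
print(codes.shape, mse)
```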

In the following, we compare different quantizers in terms of reconstruction error and recall for symmetric comparison.

Comparison of the additive quantizer options

We compare the additive quantizer options on the Deep1M and Bigann1M datasets (bigann rather than SIFT1M because we need more than 100k training vectors). The results are comparable with the Codec benchmarks. Two regimes are evaluated: 64-bit and 128-bit codes. Since the limiting factor of additive quantizers is often the encoding time, we plot the accuracy vs. encoding time. The different operating points for a given codec are obtained by varying the max_beam_size parameter (for the residual quantizer) or the encode_ils_iters parameter (for LSQ).

64-bit results

128-bit results

One can observe that:

  • the PQ variants (PQ and OPQ) are the fastest options by far

  • it is worthwhile to experiment with sizes other than the traditional 8-bit elementary quantizers. Many interesting operating points are obtained with larger vocabularies (although training time can become prohibitive)

  • for most use cases, the residual quantizer is better than LSQ

  • for the residual quantizer, the encoding with look-up tables (rq_lut) is interesting for high values of max_beam_size (and also when the data dimensionality increases, but this is not shown in the plots). This is enabled by setting use_beam_LUT to 1 in the quantizer object.

Data from this notebook, generated by benchs/bench_quantizer.py.

Flat indexes

Additive quantizers can be used for compressed-domain distance computations, i.e. efficient comparisons between one query and many encoded vectors. The flat indexes are implemented in IndexAdditiveQuantizer, the parent class of IndexResidualQuantizer and IndexLocalSearchQuantizer. The search function depends only on the additive quantization decoder, so it is shared between the two child classes.
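
A minimal usage sketch, assuming the IndexResidualQuantizer constructor takes (d, M, nbits, metric); the dataset sizes are arbitrary:

```python
import faiss
import numpy as np

d = 64
xt = np.random.rand(100_000, d).astype('float32')
xb = np.random.rand(100_000, d).astype('float32')
xq = np.random.rand(100, d).astype('float32')

index = faiss.IndexResidualQuantizer(d, 8, 8, faiss.METRIC_L2)
index.train(xt)
index.add(xb)                   # vectors are stored as 64-bit additive codes
D, I = index.search(xq, 10)     # distances computed from the codes, not the full vectors
```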

LUT implementation

For a given query q, it is possible to precompute look-up tables

LUT_m(i) = <T_m(i), q> for all m=1..M and i=1..K_m

Then the dot product with x' encoded as [i_1,..., i_M] is:

<q, x> ~= <q, x'> = LUT_1(i_1) + ... + LUT_M(i_M)
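
A small NumPy sketch of this identity, with toy codebooks (illustration only, not the Faiss internals):

```python
import numpy as np

# toy setup: M codebooks of K entries in dimension d
rng = np.random.default_rng(0)
d, M, K = 8, 4, 16
tables = rng.standard_normal((M, K, d)).astype('float32')
q = rng.standard_normal(d).astype('float32')
code = rng.integers(0, K, size=M)                    # [i_1, ..., i_M]

LUT = tables @ q                                     # LUT[m, i] = <T_m(i), q>, shape (M, K)
dot_lut = LUT[np.arange(M), code].sum()              # LUT_1(i_1) + ... + LUT_M(i_M)

x_prime = tables[np.arange(M), code].sum(axis=0)     # decoded vector x'
assert np.isclose(dot_lut, q @ x_prime)              # same value as <q, x'>
```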

The issue with L2 search and storing the norm

LUTs cannot be used directly to compute L2 distances. However,

||q - x'||^2 = ||q||^2 + ||x'||^2 - 2 <q, x'>

Therefore, if the norm ||x'|| is stored along with the code, L2 distances can be computed from the LUTs. How the norm is handled is determined by the search_type field of the AdditiveQuantizer object (a usage sketch follows the list of values below).

It can take the following values:

  • ST_decompress: no norm stored, decompress database vector at runtime

  • ST_LUT_nonorm: just treat ||x'|| = 0 (may be OK for normalized vectors)

  • ST_norm_float, ST_norm_qint8, ST_norm_qint4: store the norm with various levels of compression
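
As a sketch, assuming the SWIG wrapper exposes the nested enum as faiss.AdditiveQuantizer.ST_norm_float and that IndexResidualQuantizer accepts the search type as its last constructor argument:

```python
import faiss

d = 64
# store a float32 norm per vector so that L2 distances can be computed from the LUTs
index = faiss.IndexResidualQuantizer(
    d, 8, 8, faiss.METRIC_L2, faiss.AdditiveQuantizer.ST_norm_float
)
# index.train(xt); index.add(xb); D, I = index.search(xq, k) as usual
```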

Evaluation

TODO evaluate the accuracy of ST_decompress vs. ST_norm_float and friends on SIFT1M with IndexResidualQuantizer and IndexLocalSearchQuantizer.

IVF indexes with additive quantizer codes

The additive quantizers can be used as a drop-in replacement for product or scalar quantization: the database vectors x are then stored in different buckets, as determined by a first-level coarse quantizer.

This is implemented in the objects IndexIVFAdditiveQuantizer, IndexIVFResidualQuantizer and IndexIVFLocalSearchQuantizer.

In addition, each bucket is associated with a centroid c given by the coarse quantizer. It is usually more accurate to encode the residual vector x - c, since it usually has a lower norm than x. The flag by_residual determines whether the vector is stored directly or by residual.
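
A sketch of the setup, assuming a constructor of the form IndexIVFResidualQuantizer(coarse_quantizer, d, nlist, M, nbits, metric); the sizes are arbitrary:

```python
import faiss

d, nlist = 64, 1024
coarse = faiss.IndexFlatL2(d)                 # first-level coarse quantizer
index = faiss.IndexIVFResidualQuantizer(coarse, d, nlist, 8, 8, faiss.METRIC_L2)
index.by_residual = True                      # encode x - c rather than x (usually more accurate)
# index.train(xt); index.add(xb); index.nprobe = 16; D, I = index.search(xq, k)
```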

Evaluation

TODO evaluate IndexIVFResidualQuantizer and IndexIVFLocalSearchQuantizer vs. IndexIVFPQ in terms of accuracy vs. distance computations because IVFAdditiveQuantizer is not well optimized for speed

Coarse quantizers

Every quantizer has a set of reproduction values, i.e. the span of vectors that it can decode. If the number of reproduction values is not too large, they can be treated as a set of centroids. By performing nearest-neighbor search in this set of centroids, we obtain a coarse quantizer for an IVF index.

Visiting nprobe > 1 buckets is supported naturally by searching for the nprobe nearest centroids.

Compared to a "flat" coarse quantizer, the accuracy is a bit lower. However, the assignment is more efficient and the coarse quantizer codebook is much more compact.

The coarse quantizer versions are AdditiveCoarseQuantizer and its children ResidualCoarseQuantizer and LSQCoarseQuantizer.
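
A minimal sketch, assuming a ResidualCoarseQuantizer(d, M, nbits) constructor and that it behaves as a regular index for train / search (random data, illustration only):

```python
import faiss
import numpy as np

d = 64
# residual coarse quantizer with 2 codebooks of 2^9 entries: 512^2 = 262144 virtual centroids
rcq = faiss.ResidualCoarseQuantizer(d, 2, 9)

xt = np.random.rand(200_000, d).astype('float32')
rcq.train(xt)

# it acts as an index over its reproduction values: searching it returns the
# nprobe nearest centroids, which is exactly what IVF assignment needs
xq = np.random.rand(5, d).astype('float32')
D, I = rcq.search(xq, 16)     # 16 nearest centroids per query
```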

Evaluation

TODO comparison on 1M vectors / 262k centroids with IMI, IVF_HNSW

Splitting a residual quantizer into prefix and suffix

There is a special property of residual quantizers: a flat IndexResidualQuantizer can be converted, without re-encoding, into an IndexIVFResidualQuantizer. This works as follows:

  • codes are split into a prefix and a suffix, for example [i_1,..,i_3] and [i_4,...,i_M]

  • the residual codec is split into a ResidualCoarseQuantizer for the first 3 codebooks and an IndexIVFResidualQuantizer for the remaining M-3 ones.

  • the codes [i_4, ... , i_M] are stored in the inverted lists

At search time, the distances computed are equivalent to those of the flat IndexResidualQuantizer, except that the search can be performed non-exhaustively.
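
Conceptually, with power-of-two codebook sizes the prefix is simply the inverted list number. A hypothetical NumPy sketch of the bookkeeping (not a Faiss API, values made up for illustration):

```python
import numpy as np

nbits = 8                                    # all K_m = 256 in this toy example
codes = np.array([[12, 7, 201, 3, 54, 99]])  # [i_1, ..., i_6] for one vector, M = 6

prefix, suffix = codes[:, :3], codes[:, 3:]
# the prefix identifies the inverted list: it indexes the ResidualCoarseQuantizer
# centroid c = T_1(i_1) + T_2(i_2) + T_3(i_3)
list_no = (prefix[:, 0] << (2 * nbits)) | (prefix[:, 1] << nbits) | prefix[:, 2]
# the suffix [i_4, ..., i_6] is what gets stored in inverted list `list_no`
print(list_no, suffix)
```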
