Vector codecs

Faiss can be used to encode vectors into binary blobs. The compression is usually lossy, i.e. decoding reconstructs only an approximation of the input vectors. The input vectors have a fixed size, and so do the binary blobs.

API

Since most Faiss indexes encode the vectors they store, the codec API simply uses plain indexes as codecs. A codec can be constructed with index_factory and trained with the train method.

The codec API adds three functions that are prefixed with sa_ (standalone):

  • sa_code_size: returns the size in bytes of the codes generated by the codec
  • sa_encode: encodes a set of vectors into a set of codes
  • sa_decode: decodes a set of codes to vectors

The codec itself must be trained on data that is representative of the vectors to compress. It can then be stored in a file or in a binary blob.

As usual the API is available in C++ and Python.

Example usage

The example below gets 1000 float vectors in 20D and compresses them to 4 bits per component. This generates 1000 codes of 10 bytes each (represented as a uint8 matrix).

  • In Python: the data and codes are matrices, float32 for the data and uint8 for the codes. All sizes are managed automatically.
import faiss
import numpy as np

# generate some data
x = np.random.rand(1000, 20).astype('float32')
# prepare the codec: a 4-bit scalar quantizer
codec = faiss.index_factory(20, "SQ4")
codec.train(x)
# encode: one 10-byte code per input vector
code = codec.sa_encode(x)
print(x.nbytes, code.nbytes)  # 80000 bytes of input vs. 10000 bytes of codes
# decode back to (approximate) float vectors
x_decoded = codec.sa_decode(code)
  • In C++: the data and codes are float and uint8_t buffers, respectively.
#include <cstdint>
#include <cstdlib>
#include <vector>

#include <faiss/Index.h>
#include <faiss/index_factory.h>

// prepare some data
std::vector<float> x(1000 * 20);
for (int i = 0; i < 1000 * 20; i++) x[i] = drand48();
// prepare the codec: a 4-bit scalar quantizer
faiss::Index *codec = faiss::index_factory(20, "SQ4");
codec->train(1000, x.data());
// encode: one sa_code_size()-byte code per input vector
std::vector<uint8_t> code(1000 * codec->sa_code_size());
codec->sa_encode(1000, x.data(), code.data());
// decode back to (approximate) float vectors
std::vector<float> x_decoded(1000 * codec->d);
codec->sa_decode(1000, code.data(), x_decoded.data());
// the caller owns the index returned by index_factory
delete codec;

The codecs can be used like any other Faiss index; in particular, use read_index and write_index to store them to files.
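
For example, in Python, continuing the example above (a minimal sketch; the file name is arbitrary):

faiss.write_index(codec, "codec_sq4.index")   # serialize the trained codec
codec2 = faiss.read_index("codec_sq4.index")  # reload it later or elsewhere
assert codec2.sa_code_size() == codec.sa_code_size()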

Guidelines to choose a codec

Criteria

Choosing a codec means striking a tradeoff between:

  • bytes per encoded vector. This is the main reason to encode the vectors in the first place.
  • accuracy of the codec. This can be measured as the raw reconstruction accuracy (L2 distance between the decoded vector and the original one), or on a downstream task, i.e. how well the encoded vectors perform in similarity search compared to the non-encoded vectors.
  • size of the codec. To get self-contained compressed data, the serialized codec needs to be transmitted along with the codes, so it should not be too large.
  • timings for training / compression / decompression. This is usually less important than for the indexing use cases, because for codes the dominant factor is usually I/O. However, it is still useful to keep in mind.

See also the Vector codecs benchmarks section for an extensive comparison of the codecs on a few representative use cases.

Types of codecs

The codecs most often consist of one or several pre-processing steps that transform the vectors into smaller vectors, followed by a quantization step that reduces the vectors to integers that can be stored. In the factory string, the pre-processing and quantization components are separated by commas.
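
For illustration, a minimal sketch of such a composite factory string (the dimensions are arbitrary):

import faiss

# "PCAR32,SQ6": pre-processing (PCA to 32 dims plus a random rotation),
# followed by quantization (6-bit scalar quantizer)
codec = faiss.index_factory(128, "PCAR32,SQ6")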

Below, we present the compression options from the mildest to the strongest compression. See the benchmarks page for a detailed comparison on realistic datasets.

Very mild compression

To reduce the size by a factor of 2x-4x, a scalar quantizer is sufficient (see the sketch after the list).

  • SQfp16 (2x): converts each component to 16-bit floating point. For vectors that don't need extreme accuracy, this is almost lossless.
  • SQ8 (4x): compresses to 8 bits per component.
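
A quick sketch to check the resulting code sizes (synthetic training data, arbitrary dimension):

import faiss
import numpy as np

d = 128
xt = np.random.rand(10000, d).astype('float32')  # synthetic training data
for factory_string in ["SQfp16", "SQ8"]:
    codec = faiss.index_factory(d, factory_string)
    codec.train(xt)
    print(factory_string, codec.sa_code_size())  # 256 bytes (2*d), then 128 bytes (d)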

Mild compression

For large vectors with relatively smooth distributions, it is often better to do a PCA first and then use a scalar quantizer. Note that training the PCA requires about 10x the output dimension in training vectors, and that the PCA matrix is stored along with the codec, which can make it bulky. A sketch follows the list below.

  • PCARx,SQ8 (code size x): compresses with a PCA to x dimensions followed by a random rotation, then uses an 8-bit scalar quantizer
  • PCARx,SQ4 (code size x/2): same, with a 4-bit scalar quantizer (SQ6 also works)
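
A minimal sketch of the first option with x = 64 on 256-dimensional input (synthetic data, sized generously with respect to the ~10x rule of thumb above):

import faiss
import numpy as np

d_in = 256
xt = np.random.rand(10000, d_in).astype('float32')
codec = faiss.index_factory(d_in, "PCAR64,SQ8")
codec.train(xt)
print(codec.sa_code_size())  # 64 bytes per vector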

Average compression

For more serious compression, the product quantizer comes into play. It is usually better to pre-process the data with OPQ. A sketch follows the list below.

  • OPQ64_256,PQ64 (code size 64): reduces to 256 dims with OPQ, then encodes to 64 bytes with PQ. Usually, it is best to keep the second argument of OPQ at around 4x the first one.
  • OPQ64_256,PQ64x10 (code size 80): the default number of bits per PQ sub-quantizer is 8. By increasing it to 10, 12, etc., a smooth set of operating points can be obtained. The tradeoff is that PQ8x10 produces the same code size as PQ10x8 but is more accurate, while its codec is larger.
  • OPQ64_256,Residual2x14,PQ64 (code size 64+4): applies a two-level quantization, where the first level is itself a quantizer. This usually gives a better tradeoff than PQ64 alone, but at the cost of longer training and encoding times, and a larger codec.
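
A sketch of the first option (synthetic data with an arbitrary input dimension; note that OPQ training can take a while):

import faiss
import numpy as np

d = 512
xt = np.random.rand(20000, d).astype('float32')
codec = faiss.index_factory(d, "OPQ64_256,PQ64")
codec.train(xt)
print(codec.sa_code_size())  # 64 bytes per vector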

Binary codes

If binary codes are required, mainly because they must be compared in the compressed domain, here are codecs that produce them (a sketch follows the list):

  • PCAR256,LSHt: produces 256-bit codes with a random rotation followed by binarization. If the input dimension is 256 or lower, then use RR256,LSHt.
  • ITQ256,LSHt: pre-processes the data with an ITQ transform, then produces binary codes.
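
A sketch for the second option (synthetic data, arbitrary input dimension):

import faiss
import numpy as np

d = 512
xt = np.random.rand(10000, d).astype('float32')
codec = faiss.index_factory(d, "ITQ256,LSHt")
codec.train(xt)
codes = codec.sa_encode(xt[:5])
print(codec.sa_code_size())  # 32 bytes, i.e. 256 bits per vector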

How does it compare to plain Faiss indexes?

A Faiss index contains (potentially) compressed vectors, plus additional structure to index them quickly. It is possible to extract the compressed vectors from the index, but it is not very convenient. The standalone codec addresses this issue.
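
For illustration, a sketch with arbitrary parameters: the sa_ functions also work on an IVF index, where the standalone code additionally contains the encoded inverted list id, so it is slightly larger than the per-vector code stored inside the index.

import faiss
import numpy as np

x = np.random.rand(5000, 20).astype('float32')
index = faiss.index_factory(20, "IVF64,PQ10")
index.train(x)
print(index.sa_code_size())  # 10 PQ bytes plus the byte(s) encoding the list id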

(Experimental) High-performance decoder kernels

Faiss provides experimental high-performance decoders in the form of C++ templates for many codecs from the IVFPQ, Residual+PQ and PQ families. Each of these templates provides the following functions:

  • store(): decodes a single vector into a provided buffer. This is similar to sa_decode(1, ...).
  • accum(): adds a weighted decoded vector to a provided buffer. This makes it possible to efficiently compute linear combinations of decoded vectors.

The experimental decoder kernels perform the decoding operation much faster than a standard sa_decode() call. The price of this performance is that the template parameters of the codec have to be known at compile time. Also, note that some combinations of parameters / dimensionality values are not currently supported, due to the experimental nature of the code.

Please refer to SADecodeKernels.h for the full list of supported decoders and to test_cppcontrib_sa_decode.cpp for code examples.
