This repository has been archived by the owner on Aug 2, 2022. It is now read-only.

Adding support for FAISS #225

Open
raman-r-4978 opened this issue Sep 15, 2020 · 12 comments
raman-r-4978 commented Sep 15, 2020

Do you have any plans to support faiss in addition to nmslib in the future?

A few issues I have encountered while using nmslib:

  • It doesn't support mmap, which leads to massive memory consumption when reading index graphs.
  • Nowadays, the most common vector sizes for NLP are 768 or 1024; at those dimensions, adding ~1M vectors to an nmslib index takes a very long time to build graphs compared to faiss IVFFlat-type indexes.
  • Speaking of time, graph merges add a whole additional layer of complication.

Since faiss offers solid solutions to these issues, I would be happy to see both libraries integrated into this plugin.

Attaching an ES issue thread that you might be interested in.

I have also created Java bindings for faiss which can be found here.

Hope it helps


vamshin commented Sep 16, 2020

Hi @RamanRajarathinam,

We are planning to support FAISS, but nothing is concrete yet. We will keep this thread open as a feature request and prioritize this feature based on community feedback. If you end up on this thread looking for FAISS support, please +1 it.

@vamshin vamshin added the Features New functionality added label Sep 16, 2020

Kavan72 commented Dec 15, 2020

+1

@YashalShakti

+1


hiro-v commented Jan 22, 2021

+1

@walker313504

+1


greav commented Mar 18, 2021

+1

@jmazanec15

As an update, we are working to add faiss support to the plugin. We recently received a contribution adding the library and its HNSW implementation. Because we do not see an improvement with faiss's HNSW over nmslib's, we have decided to incorporate other faiss methods before releasing. We will build off of that contribution in the faiss-support branch. We are looking into adding support for inverted file (IVF) indices, product quantization, and composite indices. Because these methods require training, the implementation is a little more complex. In the coming weeks, we will publish an RFC. In the meantime, please feel free to "+1" or mention a specific feature from faiss you would like to have supported.


alwc commented Mar 21, 2021

+1

@luyuncheng

+1. As ml-supervised-workflow shows, maybe we can reuse a similar workflow for faiss training.

@jmazanec15

@luyuncheng That is an Elastic commercial feature, so we cannot use it.

I am exploring a couple of approaches to training. The first is adding a training step to the SaveIndex JNI function that takes a subset of the vectors being indexed and uses them for training. This approach has several flaws, including:

  1. With training, segment creation can be very costly, producing long index times. From my experiments, training a product quantizer is fairly expensive.
  2. For encoding-based methods, the raw vectors still need to be stored in Lucene, because I do not believe it is possible to merge the encodings of two faiss indices with separately trained encoders without losing a significant amount of information.

I am working on the mapping interface to support faiss's composite indices, so I implemented this approach to be able to create trained faiss indices to test the interface.

As a second approach, I am going to explore adding a "train" API. Here, a user would create an Elasticsearch faiss index along with a separate Elasticsearch index containing the training data. Calling the "train" API would create a faiss library index based on the configuration of the Elasticsearch faiss index, train it with data from the training index, and then serialize the trained faiss library index into an Elasticsearch system index.

Then, when a user starts to ingest data, during segment creation, instead of creating a new, untrained index from faiss's index factory, the plugin would copy the empty, trained index stored in the Elasticsearch system index. This way, training would only incur a one-time cost when the train API is called, speeding up segment creation.

Additionally, if all segments use the same trained models, it would be easier to perform segment merges without relying on storing the raw vectors in Lucene. But I have not explored this in detail yet.

I would appreciate any feedback on either of these approaches and any other different approaches that might be worth considering.

@luyuncheng

> all segments use the same trained models
> without relying on storing the raw vectors in Lucene

LGTM. I am wondering whether the training data should be stored in the same index or in a separate index.

@jmazanec15

@luyuncheng My thinking on having a separate index is that it will be easier to delete. In theory, you could use the same index with this approach. The train API will require an index and a field from which to gather the training data. The index could be the same as the one being trained, but the training data would be in a separate field.
