
Any plan to support learned sparse vector search? #2388

Open
cyccbxhl opened this issue May 9, 2024 · 3 comments

cyccbxhl commented May 9, 2024

Recently, learned sparse representations have been developed that compute term weights with a neural model (e.g. a transformer-based retriever) and, combined with document expansion, deliver strong relevance results.
A sparse vector looks like '((1, 0.2), (4, 0.3), (100, 0.4), (1000, 5.4), ...)': the first field of each entry is the index (1, 4, 100, 1000), and the second is the weight, a float.
A simple implementation of sparse vector search treats every index as a term and stores the weight in the postings; the similarity of two sparse vectors is then computed as their dot product.
Because k is usually large and the dimension of a sparse vector is often high (avg. > 100), the performance of Block-Max WAND can be even worse than an exhaustive OR search (i.e., not using BMW).
2GTI looks like a good document-skipping algorithm for sparse vector search.
I'm wondering whether there is any plan for tantivy to support sparse vector search?
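To make the "treat every index as a term, score by dot product" idea concrete, here is a minimal sketch in Rust (not tantivy code; the representation is assumed to be `(index, weight)` pairs sorted by index, which matches how a posting-list traversal would visit matching dimensions):

```rust
use std::cmp::Ordering;

/// Dot product of two sparse vectors stored as (index, weight) pairs
/// sorted by index. A sorted merge visits each entry once: O(n + m).
fn sparse_dot(a: &[(u32, f32)], b: &[(u32, f32)]) -> f32 {
    let (mut i, mut j, mut score) = (0usize, 0usize, 0.0f32);
    while i < a.len() && j < b.len() {
        match a[i].0.cmp(&b[j].0) {
            Ordering::Less => i += 1,
            Ordering::Greater => j += 1,
            Ordering::Equal => {
                // Only overlapping dimensions contribute to the score,
                // just like a document matching a query term.
                score += a[i].1 * b[j].1;
                i += 1;
                j += 1;
            }
        }
    }
    score
}

fn main() {
    let query = [(1, 0.2f32), (4, 0.3), (100, 0.4), (1000, 5.4)];
    let doc = [(4, 1.0f32), (100, 0.5), (2000, 2.0)];
    // Overlap on indices 4 and 100: 0.3 * 1.0 + 0.4 * 0.5 = 0.5
    println!("score = {}", sparse_dot(&query, &doc));
}
```

In a real index the "document side" would not be materialized per document; instead each dimension's postings are traversed like term postings, which is exactly where BMW/2GTI-style skipping would (or, per the point above, would not) help.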

@fulmicoton
Collaborator

@cyccbxhl No plans at the moment, but it is worth leaving the issue open. Several companies in that space are using tantivy, and this ticket could spark a discussion.


cyccbxhl commented May 9, 2024

By the way, BMP may be better than 2GTI.


aecio commented May 20, 2024

Yet another algorithm option that looks promising: https://arxiv.org/pdf/2404.18812
