
Any plan to support learned sparse vector search? #2388

Open
cyccbxhl opened this issue May 9, 2024 · 3 comments

cyccbxhl commented May 9, 2024

Recently, learned sparse representations have been developed that compute term weights with a neural model (e.g. a transformer-based retriever) and, combined with document expansion, deliver strong relevance results.
A sparse vector looks like '((1, 0.2), (4, 0.3), (100, 0.4), (1000, 5.4), ...)': the first field of each entry is the index (1, 4, 100, 1000), and the second is the weight, a float.
A simple implementation of sparse vector search treats every index as a term and stores the weight in the postings; the similarity of two sparse vectors is then computed as their dot product.
Because k is usually large and the dimension of a sparse vector is often high (avg. > 100), the performance of Block-Max WAND can be even worse than an exhaustive OR search (i.e., not using BMW).
2GTI looks like a good document-skipping algorithm for sparse vector search.
I'm wondering whether there is any plan for tantivy to support sparse vector search?
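To make the "treat every index as a term, score by dot product" idea concrete, here is a minimal sketch in Rust (not tantivy code; the representation is assumed to be `(index, weight)` pairs sorted by index, which matches how a posting-list traversal would visit matching dimensions):

```rust
use std::cmp::Ordering;

/// Dot product of two sparse vectors stored as (index, weight) pairs
/// sorted by index. A sorted merge visits each entry once: O(n + m).
fn sparse_dot(a: &[(u32, f32)], b: &[(u32, f32)]) -> f32 {
    let (mut i, mut j, mut score) = (0usize, 0usize, 0.0f32);
    while i < a.len() && j < b.len() {
        match a[i].0.cmp(&b[j].0) {
            Ordering::Less => i += 1,
            Ordering::Greater => j += 1,
            Ordering::Equal => {
                // Only overlapping dimensions contribute to the score,
                // just like a document matching a query term.
                score += a[i].1 * b[j].1;
                i += 1;
                j += 1;
            }
        }
    }
    score
}

fn main() {
    let query = [(1, 0.2f32), (4, 0.3), (100, 0.4), (1000, 5.4)];
    let doc = [(4, 1.0f32), (100, 0.5), (2000, 2.0)];
    // Overlap on indices 4 and 100: 0.3 * 1.0 + 0.4 * 0.5 = 0.5
    println!("score = {}", sparse_dot(&query, &doc));
}
```

In a real index the "document side" would not be materialized per document; instead each dimension's postings are traversed like term postings, which is exactly where BMW/2GTI-style skipping would (or, per the point above, would not) help.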

@fulmicoton
Collaborator

@cyccbxhl No plans at the moment, but it is worth leaving the issue open. Several companies in that space are using tantivy, and this ticket could spark a discussion.


cyccbxhl commented May 9, 2024

By the way, BMP may be better than 2GTI.


aecio commented May 20, 2024

Yet another algorithm option that looks promising: https://arxiv.org/pdf/2404.18812
