Attention sparse embeddings #235

Merged: 9 commits from attention-sparse-embeddings into main on May 24, 2024

Conversation

generall (Member) commented May 9, 2024

Introduces a new sparse model based on attention weights. Consider it an extension of BM25 for short documents.
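As a rough illustration of the intended usage, here is a minimal sketch assuming the model is exposed through fastembed's SparseTextEmbedding interface (the model name is taken from the review discussion below):

```python
from fastembed import SparseTextEmbedding

# Model name as proposed in the review thread below; interface assumed.
model = SparseTextEmbedding(model_name="Qdrant/bm42-all-minilm-l6-v2-attentions")

embeddings = list(model.embed(["History is merely a list of surprises."]))
# Each sparse embedding pairs token indices with attention-derived weights.
print(embeddings[0].indices, embeddings[0].values)
```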

"History is merely a list of surprises... It can only prepare us to be surprised yet again.",
]))

for result in output:
joein (Member) commented May 15, 2024

I bet this test could be better :D

generall (Member, Author) replied:

I am open to suggestions

joein (Member) replied:

I would check some types / shapes / values (e.g. that the query values are [1, 1, 1, 1]), etc.

By the way, shouldn't we initialize it as SparseTextEmbedding(model_name="Qdrant/bm42-all-minilm-l6-v2-attentions")? If we want to initialize it as SparseTextEmbedding, then we also need to overload methods like query_embed in it.

Other than that, the PR looks OK.
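A minimal sketch of the checks suggested above, assuming fastembed's public SparseTextEmbedding API (the exact assertions in the PR's test are not shown on this page):

```python
import numpy as np
from fastembed import SparseTextEmbedding

model = SparseTextEmbedding(model_name="Qdrant/bm42-all-minilm-l6-v2-attentions")

for emb in model.query_embed("What is history?"):
    # Type and shape checks: indices and values are aligned arrays.
    assert isinstance(emb.indices, np.ndarray) and isinstance(emb.values, np.ndarray)
    assert emb.indices.shape == emb.values.shape
    # Value check: query-side weights are expected to be uniform, e.g. [1, 1, 1, 1].
    assert np.allclose(emb.values, 1.0)
```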

generall (Member, Author) replied:

> If we want to initialize it as SparseTextEmbedding, then we also need to overload methods like query_embed in it.

Yeah, this is right.
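For illustration, a hypothetical sketch of such a query_embed overload, consistent with the uniform query weights mentioned above (the class, helper, and token ids here are illustrative, not the PR's actual code):

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class SparseEmbedding:
    # Mirrors fastembed's sparse output: token ids paired with weights.
    indices: np.ndarray
    values: np.ndarray


def query_embed(token_ids: list) -> SparseEmbedding:
    # At query time every distinct token gets a uniform weight of 1.0,
    # which is why query values look like [1, 1, 1, 1].
    ids = np.array(sorted(set(token_ids)))
    return SparseEmbedding(indices=ids, values=np.ones(len(ids)))


print(query_embed([101, 2381, 2003, 102]).values)  # -> [1. 1. 1. 1.]
```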

@generall marked this pull request as ready for review May 22, 2024 16:17
@generall force-pushed the attention-sparse-embeddings branch from ee7a780 to ad80e59 May 22, 2024 16:18
@joein self-requested a review May 23, 2024 10:35
@joein merged commit dfd25d4 into main May 24, 2024 (17 checks passed)
@joein deleted the attention-sparse-embeddings branch May 24, 2024 13:25