
HuggingFace script and Sentence Transformers script giving different results #44

Open
vladkvit opened this issue Oct 18, 2023 · 2 comments


vladkvit commented Oct 18, 2023

I copy-pasted the two scripts [0][1] into a notebook without any changes. They produce different embeddings and different results.
HF gives:
Cosine similarity between "I'm searching for a planet not too far from Earth." and "Neptune is the eight..." is: 0.622
Cosine similarity between "I'm searching for a planet not too far from Earth." and "TRAPPIST-1d, also de..." is: 0.490
Cosine similarity between "I'm searching for a planet not too far from Earth." and "A harsh desert world..." is: 0.433

Sentence Transformers gives:
Cosine similarity between "I'm searching for a planet not too far from Earth." and "Neptune is the eight..." is: 0.480
Cosine similarity between "I'm searching for a planet not too far from Earth." and "TRAPPIST-1d, also de..." is: 0.370
Cosine similarity between "I'm searching for a planet not too far from Earth." and "A harsh desert world..." is: 0.369

I checked the embeddings: both the document and the query embeddings differ between the two scripts. I also tried running on GPU (by adding .cuda() in the relevant places) and got the same results as above.
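Both scripts report cosine similarity, so differing scores alone do not show where the mismatch is. Comparing the raw vectors from the two scripts directly confirms whether the embeddings themselves differ. The sketch below is a minimal, hypothetical helper. It assumes each script's embedding has been dumped as a plain Python list of floats. The names `cosine_similarity`, `max_abs_diff` and `embeddings_match`, and the `1e-4` tolerance, are illustrative choices and are not part of either script.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity: dot(a, b) / (||a|| * ||b||)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def max_abs_diff(a: Sequence[float], b: Sequence[float]) -> float:
    """Largest element-wise absolute difference between two vectors."""
    return max(abs(x - y) for x, y in zip(a, b))


def embeddings_match(a: Sequence[float], b: Sequence[float], atol: float = 1e-4) -> bool:
    """True if the vectors have the same length and agree within atol.

    The default tolerance is an arbitrary assumption, not a value from
    either script.
    """
    return len(a) == len(b) and max_abs_diff(a, b) <= atol
```

Usage would be `embeddings_match(emb_hf, emb_st)`, where `emb_hf` and `emb_st` are hypothetical lists holding the same query's embedding from each script. A `False` result means the vectors diverge before any similarity is computed. That pattern matches what is reported here.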

If it helps, I can dump the embedding vectors or the full code in the comments.

It would be nice to have the expected output in the README as well.

[0] https://github.com/Muennighoff/sgpt#asymmetric-semantic-search-be
[1] https://github.com/Muennighoff/sgpt#asymmetric-semantic-search-be-st

@Muennighoff
Owner

@vladkvit
Author

I figured it out. I was using the wrong sentence-transformers library.

I was using the latest official version:
pip install --upgrade git+https://github.com/UKPLab/sentence-transformers.git
while what worked was your fork:
pip install --upgrade git+https://github.com/Muennighoff/sentence-transformers.git@sgpt_poolings_specb
