Usage for semantic search #47

rut00 · 2024-02-21T07:47:18Z

Hello,
I want to build semantic search functionality. The model accurately scores the similarity between synonyms. Here are a few excerpts from the demo.
Suppose my dataset has the following lines:

User: Who is the author of "Romeo and Juliet"?
Model: "Romeo and Juliet" was written by William Shakespeare.

User: Describe the water cycle.
Model: The water cycle is the continuous movement of water on, above, and below the surface of the Earth, involving processes like evaporation, condensation, precipitation, and runoff.

User: Describe the process of DNA replication.
Model: DNA replication is the process of copying DNA to ensure genetic continuity during cell division.

And my prompts:

Prompt 1: rameo and juliet
Output: "Romeo and Juliet" was written by William Shakespeare.

Prompt 2: Tell me about water cycle
Output: The water cycle is the continuous movement of water on, above, and below the surface of the Earth, involving processes like evaporation, condensation, precipitation, and runoff.

Prompt 3: Deoxyribonucleic acid
Output: DNA replication is the process of copying DNA to ensure genetic continuity during cell division.

Prompt 4: what is python language
Output: Photosynthesis is the process by which green plants and some other organisms use sunlight to synthesize foods with the help of chlorophyll pigments.

The confidence value for each prompt ranges between 0.25 and 0.4. The issue I am facing is that the model gives roughly the same confidence value for wrong and right outputs, as seen in the Prompt 4 output. I want to show "No results found" if the query has no matching entry in the dataset.

How do I solve this issue and make it more efficient? Thank you in advance.


Muennighoff · 2024-02-21T07:50:22Z

You're using the Cross-Encoder, correct?

rut00 · 2024-02-21T07:54:36Z

No, I am using the Asymmetric Semantic Search Bi-Encoder.

Muennighoff · 2024-02-21T08:19:48Z

I see, so you're saying that the cosine similarity for "what is python language" and "Photosynthesis is the process by which green plants and s..." is as high as the other ones?
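For readers following along, here is a minimal sketch of the comparison being discussed: score each document embedding against the query embedding with cosine similarity, and fall back to "No results found" when the best score is below a cutoff. The `search` helper, the three-dimensional toy vectors, and the `threshold=0.5` value are all assumptions for illustration; real embeddings would come from the bi-encoder, and the cutoff would need to be tuned on real data.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)


def search(query_vec, doc_vecs, docs, threshold):
    """Hypothetical helper: return the best document, or a fallback message
    when even the best score is below the threshold."""
    scores = [cosine_similarity(query_vec, v) for v in doc_vecs]
    best = max(range(len(docs)), key=lambda i: scores[i])
    if scores[best] < threshold:
        return "No results found", scores[best]
    return docs[best], scores[best]


# Toy stand-ins for real embeddings (assumed values, not model output).
docs = [
    '"Romeo and Juliet" was written by William Shakespeare.',
    "The water cycle is the continuous movement of water ...",
]
doc_vecs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]

hit = search([0.9, 0.1, 0.0], doc_vecs, docs, threshold=0.5)
miss = search([0.0, 0.0, 1.0], doc_vecs, docs, threshold=0.5)
print(hit[0])
print(miss[0])
```

This only works when relevant and irrelevant queries produce clearly different scores, which is exactly what is in question in this thread.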

rut00 · 2024-02-21T09:48:46Z

Yes. The confidence levels are so similar that I cannot put a threshold level for differentiating them.
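One way to check this claim on a small labeled set is sketched below: collect the top scores for queries that should match and for queries that should not, then see whether any gap exists between the two groups. The `find_threshold` helper is hypothetical, and the score lists are made-up values (the first pair only mimics the 0.25 to 0.4 range reported above); they are not measurements from any model.

```python
def find_threshold(relevant_scores, irrelevant_scores):
    """Return a cutoff halfway between the two groups, or None if they overlap."""
    lowest_relevant = min(relevant_scores)
    highest_irrelevant = max(irrelevant_scores)
    if lowest_relevant <= highest_irrelevant:
        return None  # no single cutoff separates the groups
    return (lowest_relevant + highest_irrelevant) / 2


# Assumed scores resembling the situation in this thread: the groups overlap.
overlapping = find_threshold([0.25, 0.31, 0.40], [0.33])

# Assumed scores from a model that separates the groups cleanly.
separable = find_threshold([0.62, 0.70, 0.81], [0.20, 0.35])

print(overlapping)
print(separable)
```

If the helper returns `None`, as in the first case, no fixed threshold can separate right from wrong outputs with that model.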

Muennighoff · 2024-02-21T09:51:18Z

Hm, what model are you using? I'd recommend switching to a bigger / better one. Specifically, I'd recommend this one: https://huggingface.co/GritLM/GritLM-7B

rut00 · 2024-02-21T09:53:00Z

I am using this model: SGPT-125M-weightedmean-msmarco-specb-bitfit. I will try the recommended model.
