Training SGPT for Custom Dataset #6

Open
rajarajanvakil opened this issue Jun 29, 2022 · 1 comment

Comments

@rajarajanvakil

Hi, I read your paper and it is great. I am trying to apply this to my own dataset, which is huge. Could you please explain exactly how to train SGPT from scratch, for both symmetric and asymmetric search and for both encoders (Bi-Encoder and Cross-Encoder)? The Cross-Encoder is our main interest.
I have one question: are you using BERT to produce the Cross-Encoder and Bi-Encoder embeddings? My understanding is that BERT serves as an initial stage whose output is fed to GPT to produce the cosine similarities and log probabilities. Please help.

@Muennighoff
Owner

Hey!

  1. No BERT model is used
  2. For the SGPT Cross-Encoder no training is necessary. Just use the script here. For symmetric search just change the prompt 😇
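
To illustrate why no training is needed, here is a minimal sketch of prompt-based Cross-Encoder scoring as I understand it: the document is placed in a prompt, and the score is the sum of the log probabilities the language model assigns to the query tokens. Everything named here is an assumption for illustration, not the repository's script: the prompt wording, the `score` and `rank` helpers, and `toy_logprob`, which is a toy stand-in for a real GPT model so the example runs without downloads. Tokenization is simplified to whitespace splitting.

```python
import math
from typing import Callable, List, Sequence

# Hypothetical interface: returns log P(next_token | context) from some language model.
LogProbFn = Callable[[Sequence[str], str], float]

# Assumed prompt templates (illustrative wording only).
ASYMMETRIC_PROMPT = 'The document "{doc}" is a good search result for "'
SYMMETRIC_PROMPT = 'The sentence "{doc}" has the same meaning as "'


def score(doc: str, query: str, logprob_fn: LogProbFn,
          template: str = ASYMMETRIC_PROMPT) -> float:
    """Sum the log probabilities of the query tokens given the prompt with the document."""
    context: List[str] = template.format(doc=doc).split()
    total = 0.0
    for token in query.split():
        total += logprob_fn(context, token)
        context.append(token)  # each query token conditions the next one
    return total


def rank(docs: Sequence[str], query: str, logprob_fn: LogProbFn,
         template: str = ASYMMETRIC_PROMPT) -> List[str]:
    """Order documents from most to least likely to have produced the query."""
    return sorted(docs, key=lambda d: score(d, query, logprob_fn, template),
                  reverse=True)


def toy_logprob(context: Sequence[str], token: str) -> float:
    """Toy stand-in for a GPT model: tokens already seen in the context are likelier."""
    seen = {t.strip('".,').lower() for t in context}
    return math.log(0.5) if token.lower() in seen else math.log(0.01)


if __name__ == "__main__":
    docs = ["stocks fell today", "cats chase mice"]
    print(rank(docs, "cats mice", toy_logprob))
    # Symmetric search only swaps the prompt template.
    print(rank(docs, "cats mice", toy_logprob, template=SYMMETRIC_PROMPT))
```

With a real model, `toy_logprob` would be replaced by a function that reads per-token log probabilities from a pretrained GPT; the ranking logic stays the same, and switching between asymmetric and symmetric search only changes the template.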
