
Question About top-p sampling #44

Open
xiaotingxuan opened this issue Apr 7, 2023 · 1 comment
Comments

@xiaotingxuan

Hello, thanks for sharing your code — it is really helpful.

I noticed there is a hyperparameter top-p (the code is here). When we run decoding, this hyperparameter is set to -1, so top-p sampling is not actually used.

I still wonder what it is for. Did you use it in your experiments? If we do use it, what is an appropriate value? Could you please provide further details, or point me to any relevant literature that would help me understand it better?

Thank you in advance for your assistance.

@summmeer
Collaborator

Hi,
We didn't use top-p sampling in our experiments. During sampling, we compute the logits of each token, and based on these you can do top-p sampling or beam search. These sampling strategies can easily be borrowed from the generation code of AR models, so feel free to try them. Honestly speaking, though, top-p sampling or beam search may not help as much as you might expect. It is still worth trying and investigating carefully.
