[Bug] Parallel sampling: The new token after prefill is duplicated for all generations #161

Open
masahi opened this issue Jan 16, 2024 · 0 comments

masahi commented Jan 16, 2024

zxybazh#3 (comment)

@masahi masahi added the bug Something isn't working label Jan 16, 2024
@masahi masahi self-assigned this Jan 17, 2024
Lunderberg pushed a commit to Lunderberg/mlc-llm that referenced this issue Jan 30, 2024
Previously, when we introduced support for "local model path" and
"HuggingFace model path", we removed support for specifying the model
name at build time. This turned out to prevent users from specifying
only the much shorter model name, and from automatically searching for
models that already exist on disk.

Therefore, this PR brings back argparse support for `--model`. It
handles the case where both the model name and one of the model path /
HF path are specified, supports searching for models on disk, and
allows a model to be specified by its short name alone.
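
The behavior described in the commit message can be illustrated with a minimal sketch. This is not the code from the referenced commit: only `--model` is named in the message, so the flag names `--model-path` and `--hf-path`, the helper `resolve_model`, the `search_root` directory layout, and the choice to treat a name/path mismatch as an error are all assumptions made for illustration.

```python
import argparse
from pathlib import Path
from typing import Optional, Tuple


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with a short-name flag and two path flags."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--model", default=None)       # short model name (named in the commit message)
    parser.add_argument("--model-path", default=None)  # hypothetical flag name for a local path
    parser.add_argument("--hf-path", default=None)     # hypothetical flag name for a HuggingFace path
    return parser


def resolve_model(args: argparse.Namespace, search_root: Path) -> Tuple[str, Optional[Path]]:
    """Return (model_name, local_dir); local_dir is None for a HuggingFace path."""
    if args.model_path and args.hf_path:
        raise ValueError("specify at most one of --model-path and --hf-path")

    explicit = args.model_path or args.hf_path
    if explicit:
        # Derive the name from the last component of the path.
        derived = explicit.rstrip("/").split("/")[-1]
        # Assumption: a name that disagrees with the path is rejected.
        if args.model and args.model != derived:
            raise ValueError(f"--model {args.model!r} does not match path {explicit!r}")
        return derived, (Path(args.model_path) if args.model_path else None)

    if args.model:
        # Only a short name was given: search for it on disk.
        candidate = search_root / args.model
        if candidate.is_dir():
            return args.model, candidate
        raise FileNotFoundError(f"no model named {args.model!r} under {search_root}")

    raise ValueError("specify --model, --model-path, or --hf-path")
```

With this sketch, passing only `--model demo-model` resolves to `search_root / "demo-model"` if that directory exists, while passing a path flag derives the name from the path and checks it against `--model` when both are given.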