Qwen-14B-Chat inference repeat #1144
Repetition is fairly normal in LLM models.
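For reference, repetition can often be damped with standard decoding knobs. Below is a minimal sketch using the Hugging Face transformers `generate()` API rather than the neural-speed scripts themselves; the model id and the specific parameter values are illustrative assumptions, not settings confirmed in this thread:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative checkpoint; substitute your local Qwen-14B-Chat path.
model_id = "Qwen/Qwen-14B-Chat"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

inputs = tokenizer("What is StreamingLLM?", return_tensors="pt")

# Common anti-repetition knobs in the standard generate() signature:
#   repetition_penalty   > 1.0 penalizes tokens that already appeared
#   no_repeat_ngram_size  blocks exact n-gram repeats
#   do_sample/temperature breaks deterministic greedy loops
outputs = model.generate(
    **inputs,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
    repetition_penalty=1.1,
    no_repeat_ngram_size=4,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```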
I don't think I added the duplicate questions myself; they appear to be caused by Qwen's prompt template format. By the way, how should Baichuan's prompt template be written? I tried `BAICHUAN_PROMPT_FORMAT = "{prompt}"`, but it failed.
Can you help me solve this problem?
@fengenbao Hi, Baichuan does not need an extra prompt template.
These are all input args that you can modify.
Please check the README.md: https://github.com/intel/neural-speed/tree/main
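For context, Qwen-Chat models expect the ChatML format, while Baichuan (per the comment above) takes the raw prompt. Here is a sketch of the two templates; the constant names mirror the `*_PROMPT_FORMAT` convention quoted earlier in the thread, and the exact system message is an assumption:

```python
# Qwen-Chat uses ChatML: each turn is wrapped in
# <|im_start|>role ... <|im_end|> markers.
QWEN_PROMPT_FORMAT = (
    "<|im_start|>system\n"
    "You are a helpful assistant.<|im_end|>\n"
    "<|im_start|>user\n{prompt}<|im_end|>\n"
    "<|im_start|>assistant\n"
)

# Baichuan needs no extra wrapping; the raw prompt passes through.
BAICHUAN_PROMPT_FORMAT = "{prompt}"

def build_prompt(template: str, prompt: str) -> str:
    """Fill the user prompt into a model-specific template."""
    return template.format(prompt=prompt)

print(build_prompt(QWEN_PROMPT_FORMAT, "How are you?"))
```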
Thanks for your attention! I describe this question in detail in another issue, #1148; please help check whether the parameters are set correctly.
When I use the python_api_example or streaming_llm Python scripts to run inference on Qwen-14B-Chat, the first two questions are answered normally, but from the third question onward the output keeps repeating itself. I find this strange and can reliably reproduce the error. It also looks as if something has been repeating the prompts all along.
My RAG prompt length is 654.
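One quick sanity check for the streaming case is whether the accumulated multi-turn history exceeds the model's context window, since KV-cache overflow commonly degenerates into repetition. A minimal sketch; the model id, the ctx_size value, and the three-turn multiplier are assumptions for illustration, and the 654 figure comes from the report above:

```python
from transformers import AutoTokenizer

model_id = "Qwen/Qwen-14B-Chat"  # assumption: hub id or local path
ctx_size = 2048                  # assumption: the context size passed to the script

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)

rag_prompt = "..."  # the ~654-token RAG prompt from the report

n_tokens = len(tokenizer(rag_prompt).input_ids)
print(f"prompt tokens: {n_tokens} / ctx_size: {ctx_size}")

# If three questions (plus their answers) accumulate in the KV cache,
# the history can overflow ctx_size right around the third turn.
if n_tokens * 3 > ctx_size:
    print("warning: multi-turn history may exceed the context window")
```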