Qwen-14B-Chat inference repeat #1144

Open
Storm0921 opened this issue Jan 16, 2024 · 5 comments
@Storm0921

When I use the python_api_example or streaming_llm Python scripts to run inference with Qwen-14B-Chat, the first two questions are answered normally, but from the third question onward the output keeps repeating. I find it strange and can reproduce the error reliably. It also looks as though the prompt is being repeated throughout.

My RAG prompt length is 654.

[screenshots of the repeated output]

@a32543254
Contributor

Repetition is fairly common in LLMs. Here are some possible solutions (a sketch follows the list):

  1. Try do_sample=True in the generate API.
  2. Change the woq_config args: compute dtype from int8 to bf16.
  3. Increase the repetition_penalty value.
  4. Increase the top_k value.
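
A minimal sketch of suggestions 1 and 2, assuming the intel-extension-for-transformers LLM Runtime API linked later in this thread (`WeightOnlyQuantConfig` and `compute_dtype` follow its weight-only-quantization docs; the model name, prompt, and values are placeholders, not tuned settings):

```python
from transformers import AutoTokenizer, TextStreamer
# Assumption: imports follow the intel-extension-for-transformers
# LLM Runtime examples referenced below in this thread.
from intel_extension_for_transformers.transformers import (
    AutoModelForCausalLM,
    WeightOnlyQuantConfig,
)

model_name = "Qwen/Qwen-14B-Chat"  # placeholder checkpoint
# Suggestion 2: compute dtype bf16 instead of int8.
woq_config = WeightOnlyQuantConfig(compute_dtype="bf16")

tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name, quantization_config=woq_config, trust_remote_code=True
)

inputs = tokenizer("your RAG prompt here", return_tensors="pt").input_ids
streamer = TextStreamer(tokenizer)
outputs = model.generate(
    inputs,
    streamer=streamer,
    max_new_tokens=300,
    do_sample=True,  # suggestion 1: sampling instead of greedy decoding
)
```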

@Storm0921
Author

> Repetition is fairly common in LLMs. Here are some possible solutions:
>
> 1. Try do_sample=True in the generate API.
> 2. Change the woq_config args: compute dtype from int8 to bf16.
> 3. Increase the repetition_penalty value.
> 4. Increase the top_k value.

I don't think the repetition is caused by Qwen's prompt template format; I did not add duplicate questions. By the way, how should Baichuan's prompt template be written? I tried BAICHUAN_PROMPT_FORMAT = "{prompt}", but it failed.

@Storm0921
Author

> Repetition is fairly common in LLMs. Here are some possible solutions:
>
> 1. Try do_sample=True in the generate API.
> 2. Change the woq_config args: compute dtype from int8 to bf16.
> 3. Increase the repetition_penalty value.
> 4. Increase the top_k value.

Can you help me solve this problem?

@Zhenzhong1
Collaborator

Zhenzhong1 commented Jan 22, 2024

> I don't think the repetition is caused by Qwen's prompt template format; I did not add duplicate questions. By the way, how should Baichuan's prompt template be written? I tried BAICHUAN_PROMPT_FORMAT = "{prompt}", but it failed.

@fengenbao Hi, Baichuan does not need extra prompt templates.
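
For contrast, Qwen-Chat checkpoints expect ChatML-style wrapping around the user prompt, whereas Baichuan can take the raw prompt. A hedged sketch of that wrapping (the template is reproduced from Qwen's published chat format; verify it against your checkpoint):

```python
# Assumption: ChatML template as published for Qwen-Chat models.
QWEN_PROMPT_FORMAT = (
    "<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n"
    "<|im_start|>user\n{prompt}<|im_end|>\n"
    "<|im_start|>assistant\n"
)
BAICHUAN_PROMPT_FORMAT = "{prompt}"  # no extra wrapping needed
```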


> Try do_sample=True in the generate API.
> Change the woq_config args: compute dtype from int8 to bf16.
> Increase the repetition_penalty value.
> Increase the top_k value.
>
> Can you help me solve this problem?

These are all input args that you can modify.

do_sample=True is an argument of the generate API. For example:
outputs = model.generate(inputs, streamer=streamer, max_new_tokens=30, do_sample=True)

Please check this README: https://github.com/intel/neural-speed/tree/main

For woq_config, please check: https://github.com/intel/intel-extension-for-transformers/blob/main/docs/weightonlyquant.md#llm-runtime-example-code

For repetition_penalty and top_k, please check: https://github.com/intel/neural-speed/blob/main/docs/advanced_usage.md
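
Building on that one-liner, a hedged example that also applies the repetition controls suggested above (the keyword names follow the standard Hugging Face generate API, which the LLM Runtime mirrors; the values are illustrative, not tuned):

```python
outputs = model.generate(
    inputs,
    streamer=streamer,
    max_new_tokens=300,
    do_sample=True,          # sample rather than always taking the argmax
    repetition_penalty=1.1,  # values > 1.0 penalize already-generated tokens
    top_k=40,                # restrict sampling to the 40 likeliest tokens
)
```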

@Storm8878

Storm8878 commented Jan 22, 2024


Thanks for your attention! I describe this question in detail in another issue, #1148; please help check whether the parameters are set correctly.
