Qwen-7B build failed on Windows with trtllm-0.9.0 #1571

Open
bigbigQI opened this issue May 10, 2024 · 2 comments
Labels
bug Something isn't working

Comments

@bigbigQI

System Info

  • Platform: Windows
  • version: trtllm-0.9.0

Who can help?

No response

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

I built the Qwen-7B engine successfully with trtllm-0.7.1. After upgrading trtllm to 0.9.0, the engine still builds successfully, but inference with it fails.
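For reference, the int4 weight-only engine came from the standard examples/qwen workflow; the commands below are a rough sketch based on the 0.9.0 examples/qwen README, not copied from my original run (the checkpoint directory name is illustrative; the engine path matches the run command below):

python convert_checkpoint.py --model_dir ./tmp/Qwen/7B/ --output_dir ./tllm_checkpoint_1gpu_int4 --dtype float16 --use_weight_only --weight_only_precision int4

trtllm-build --checkpoint_dir ./tllm_checkpoint_1gpu_int4 --output_dir ./trt_engines/weight_only_int4/ --gemm_plugin float16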

When I run this:

python ../run.py --input_text "你好,请问你叫什么?" --max_output_len=50 --tokenizer_dir ./tmp/Qwen/7B/ --engine_dir=./trt_engines/weight_only_int4/

Expected behavior

The error messages are as follows:

no entry found for key
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Traceback (most recent call last):
  File "D:\apps\trtllm_0.9\TensorRT-LLM\examples\run.py", line 564, in <module>
    main(args)
  File "D:\apps\trtllm_0.9\TensorRT-LLM\examples\run.py", line 484, in main
    print_output(tokenizer,
  File "D:\apps\trtllm_0.9\TensorRT-LLM\examples\run.py", line 278, in print_output
    output_text = tokenizer.decode(outputs)
  File "D:\anaconda\envs\trtllm_env_new\lib\site-packages\transformers\tokenization_utils_base.py", line 3782, in decode
    return self._decode(
  File "C:\Users\nv\.cache\huggingface\modules\transformers_modules\Qwen-7B-Chat\tokenization_qwen.py", line 276, in _decode
    return self.tokenizer.decode(token_ids, errors=errors or self.errors)
  File "D:\anaconda\envs\trtllm_env_new\lib\site-packages\tiktoken\core.py", line 258, in decode
    return self._core_bpe.decode_bytes(tokens).decode("utf-8", errors=errors)
pyo3_runtime.PanicException: no entry found for key
[TensorRT-LLM][ERROR] class tensorrt_llm::common::TllmException: [TensorRT-LLM][ERROR] CUDA runtime error in ::cudaFreeHost(ptr): driver shutting down (C:\Users\tejaswinp\workspace\tekit\cpp\tensorrt_llm/runtime/tllmBuffers.h:169)
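For context, "no entry found for key" is the panic message tiktoken's Rust core produces when decode is handed a token ID that has no entry in its decoder maps. With the tiktoken version shown in the traceback, a minimal sketch like the following should trigger the same PanicException (the encoding name and the oversized token ID here are illustrative assumptions, not values from this run):

import tiktoken

# Qwen's tokenization_qwen.py wraps a tiktoken Encoding; any encoding
# shows the same failure mode when given an out-of-vocabulary token ID.
enc = tiktoken.get_encoding("cl100k_base")

# 10**9 has no entry in the decoder maps, so the Rust side panics with
# "no entry found for key" instead of raising a normal Python exception.
enc.decode([10**9])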

actual behavior

I suspect the cause is that a token generated by the Qwen engine falls outside the Qwen tokenizer's vocabulary.
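One way to check this hypothesis is to inspect the raw output IDs just before tokenizer.decode is called in run.py. A rough diagnostic sketch, assuming the Qwen tokenizer reports its vocabulary size via len() and that output_ids holds the generated IDs (both names are illustrative):

# Hypothetical check, placed just before tokenizer.decode(...) in run.py.
vocab_size = len(tokenizer)

bad_ids = [t for t in output_ids if not 0 <= t < vocab_size]
if bad_ids:
    print(f"out-of-vocabulary token ids from the engine: {bad_ids}")

# Temporary workaround: drop the offending IDs before decoding. This only
# masks the symptom; a correct engine should never emit such IDs.
text = tokenizer.decode([t for t in output_ids if 0 <= t < vocab_size])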

additional notes

What can I do to solve this problem?

bigbigQI added the bug label May 10, 2024
@zhangyu68

It's probably a problem with the quantized model, or something already went wrong at the fine-tuning stage.

@bigbigQI (Author)

> It's probably a problem with the quantized model, or something already went wrong at the fine-tuning stage.

Does the fine-tuning here mean SFT? I converted directly from the model on Hugging Face; it was never fine-tuned.
