TllmXqaJit runtime error when build Yi-6B fp8 with TRTLLM-0.10.0.dev2024050700 #1586

Closed
2 of 4 tasks
kimbaol opened this issue May 13, 2024 · 4 comments

@kimbaol

kimbaol commented May 13, 2024

System Info

GPU: RTX 4090
OS: Docker (image produced with the TensorRT-LLM make build)
TensorRT-LLM version: 0.10.0.dev2024050700
Driver: 535.171.04
CUDA version: 12.4

Who can help?

@byshiue

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

  1. python quantize.py --model_dir Yi-6B/ --qformat fp8 --kv_cache_dtype fp8 --output_dir test_ckpt
  2. trtllm-build --checkpoint_dir test_ckpt --output_dir test_engine --strongly_typed

Expected behavior

The build succeeds.

Actual behavior

[05/13/2024-07:00:20] [TRT] [W] [RemoveDeadLayers] Input Tensor position_ids is unused or used only at compile-time, but is not being removed.
[05/13/2024-07:00:20] [TRT] [I] Global timing cache in use. Profiling results in this builder pass will be stored.
[05/13/2024-07:01:41] [TRT] [I] [GraphReduction] The approximate region cut reduction algorithm is called.
[05/13/2024-07:01:41] [TRT] [I] Detected 14 inputs and 1 output network tensors.
terminate called after throwing an instance of 'tensorrt_llm::common::TllmException'
what(): [TensorRT-LLM][ERROR] TllmXqaJit runtime error in tllmXqaJitCreateAndCompileProgram(&program, &context): NVRTC Internal Error (/src/tensorrt_llm/cpp/tensorrt_llm/kernels/decoderMaskedMultiheadAttention/decoderXQAImplJIT/compileEngine.cpp:65)
1 0x7fcba6c4e5b4 /usr/local/lib/python3.10/dist-packages/tensorrt_llm/libs/libtensorrt_llm.so(+0x6935b4) [0x7fcba6c4e5b4]
2 0x7fcba6d8ca59 tensorrt_llm::kernels::jit::CompileEngine::compile() const + 169
3 0x7fcba6d8e63b tensorrt_llm::kernels::jit::CubinObjRegistryTemplate<tensorrt_llm::kernels::XQAKernelFullHashKey, tensorrt_llm::kernels::XQAKernelFullHasher>::getCubin(tensorrt_llm::kernels::XQAKernelFullHashKey const&, tensorrt_llm::kernels::jit::CompileEngine*) + 267
4 0x7fcba6d8e077 tensorrt_llm::kernels::DecoderXQAImplJIT::prepare(tensorrt_llm::kernels::XQAParams const&) + 87
5 0x7fcb6aa94efb /usr/local/lib/python3.10/dist-packages/tensorrt_llm/libs/libnvinfer_plugin_tensorrt_llm.so(+0xbdefb) [0x7fcb6aa94efb]
6 0x7fcb6aab140d /usr/local/lib/python3.10/dist-packages/tensorrt_llm/libs/libnvinfer_plugin_tensorrt_llm.so(+0xda40d) [0x7fcb6aab140d]
7 0x7fcbc9cbbf38 /usr/local/tensorrt/lib/libnvinfer.so.10(+0xd87f38) [0x7fcbc9cbbf38]
8 0x7fcbc9cbc85c /usr/local/tensorrt/lib/libnvinfer.so.10(+0xd8885c) [0x7fcbc9cbc85c]
9 0x7fcbc9d35caf /usr/local/tensorrt/lib/libnvinfer.so.10(+0xe01caf) [0x7fcbc9d35caf]
10 0x7fcbc9d0e4e0 /usr/local/tensorrt/lib/libnvinfer.so.10(+0xdda4e0) [0x7fcbc9d0e4e0]
11 0x7fcbc9d1507c /usr/local/tensorrt/lib/libnvinfer.so.10(+0xde107c) [0x7fcbc9d1507c]
12 0x7fcbc9d17071 /usr/local/tensorrt/lib/libnvinfer.so.10(+0xde3071) [0x7fcbc9d17071]
13 0x7fcbc995c61c /usr/local/tensorrt/lib/libnvinfer.so.10(+0xa2861c) [0x7fcbc995c61c]
14 0x7fcbc9961837 /usr/local/tensorrt/lib/libnvinfer.so.10(+0xa2d837) [0x7fcbc9961837]
15 0x7fcbc99621af /usr/local/tensorrt/lib/libnvinfer.so.10(+0xa2e1af) [0x7fcbc99621af]
16 0x7fcbd78a6478 /usr/local/lib/python3.10/dist-packages/tensorrt/tensorrt.so(+0xa6478) [0x7fcbd78a6478]
17 0x7fcbd78457a3 /usr/local/lib/python3.10/dist-packages/tensorrt/tensorrt.so(+0x457a3) [0x7fcbd78457a3]

Additional notes

Yi-9B hits the same error.

@kimbaol kimbaol added the bug Something isn't working label May 13, 2024
@haichuan1221

same issue, did you solve it?

@kimbaol
Author

kimbaol commented May 13, 2024

> same issue, did you solve it?

I tried an older version, 20240305, and it works.

I also read through the recent commits. The error may have been introduced by the 20240507 commit, which changed XQA kernel compilation to JIT.

@haichuan1221

I checked out version 20240430 and the problem is solved. Thank you for your help.

@kimbaol
Author

kimbaol commented May 15, 2024

This bug is fixed in 0.11.0.dev2024051400.
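
For anyone checking whether an installed dev build falls in the range reported in this thread, here is a minimal sketch. It assumes that dev version strings end in a 10-digit build stamp (as in 0.10.0.dev2024050700) and that the affected window starts at the first build reported failing and ends at the build reported fixed. The helper names are made up for illustration and are not part of TensorRT-LLM, and the window is based only on the reports above, not on release notes.

```python
import re

# Build stamps taken from the reports in this thread (assumed format: YYYYMMDDNN).
FIRST_REPORTED_BAD = 2024050700    # 0.10.0.dev2024050700 fails with the TllmXqaJit error
FIRST_REPORTED_FIXED = 2024051400  # 0.11.0.dev2024051400 is reported as fixed


def dev_stamp(version):
    """Return the integer dev build stamp of a version string, or None if absent."""
    match = re.search(r"\.dev(\d{10})$", version)
    return int(match.group(1)) if match else None


def in_reported_window(version):
    """True if the dev build lies in the window reported as broken in this thread.

    Versions without a dev stamp (e.g. stable releases) are not covered by the
    reports here, so this returns False for them rather than guessing.
    """
    stamp = dev_stamp(version)
    if stamp is None:
        return False
    return FIRST_REPORTED_BAD <= stamp < FIRST_REPORTED_FIXED


if __name__ == "__main__":
    for v in ("0.10.0.dev2024050700", "0.11.0.dev2024051400"):
        print(v, "in reported window:", in_reported_window(v))
```

The version string to pass in can be read from the installed package, for example from the output of pip show. A False result only means the build is outside the range mentioned in this thread, not that the build is known to work.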

@kimbaol kimbaol closed this as completed May 15, 2024