System Info

tensorrt                10.0.1
tensorrt-cu12           10.0.1
tensorrt-cu12-bindings  10.0.1
tensorrt-cu12-libs      10.0.1
tensorrt-llm            0.10.0.dev2024050700
GPU: A100 * 4

Who can help?

@Tracin

Information

An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
Reproduction

The build succeeds with the following script on Mixtral 8x7B:

```shell
set -ex
export MODEL_DIR=/mnt/memory
export MODEL_NAME=Mixtral-8x7B-Instruct-v0.1
export LD_LIBRARY_PATH=/usr/local/tensorrt/lib:$LD_LIBRARY_PATH
export PATH=/usr/local/tensorrt/bin:$PATH
export PRECISION=W4A16
export DTYPE=bfloat16
export PYTHONPATH=/app/tensorrt-llm:$PYTHONPATH
export TP_SIZE=4
export CUDA_VISIBLE_DEVICES=0,1,2,3

python ../llama/convert_checkpoint.py \
    --model_dir $MODEL_DIR/${MODEL_NAME} \
    --output_dir $MODEL_DIR/tmp/trt_models/${MODEL_NAME}/$PRECISION/${TP_SIZE}-gpu-tp \
    --dtype $DTYPE \
    --use_weight_only \
    --tp_size ${TP_SIZE} \
    --weight_only_precision int4

trtllm-build \
    --checkpoint_dir $MODEL_DIR/tmp/trt_models/${MODEL_NAME}/$PRECISION/${TP_SIZE}-gpu-tp \
    --output_dir $MODEL_DIR/tmp/trt_engines/${MODEL_NAME}/$PRECISION/${TP_SIZE}-gpu-tp \
    --gemm_plugin $DTYPE \
    --gpt_attention_plugin $DTYPE \
    --max_batch_size 1 \
    --max_input_len 2048 \
    --max_output_len 1024 \
    --max_multimodal_len 576
```
Run with the following command:

```shell
mpirun --allow-run-as-root -n 4 python3 /app/tensorrt-llm/examples/run.py \
    --engine_dir /mnt/memory/tmp/trt_engines/Mixtral-8x7B-Instruct-v0.1/W4A16/4-gpu-tp \
    --tokenizer_dir /mnt/memory/Mixtral-8x7B-Instruct-v0.1 \
    --max_output_len 1024 \
    --input_text "I love french quiche" \
    --run_profiling
```
Expected behavior

The run succeeds.
Actual behavior

The run fails with the following error:
Output from one rank (the same warnings, error, and traceback are emitted by each of the four MPI ranks; verbatim duplicates are omitted):

```
[TensorRT-LLM] TensorRT-LLM version: 0.10.0.dev2024050700
[TensorRT-LLM][INFO] Engine version 0.10.0.dev2024050700 found in the config file, assuming engine(s) built by new builder API.
[TensorRT-LLM][WARNING] [json.exception.out_of_range.403] key 'cross_attention' not found
[TensorRT-LLM][WARNING] Optional value for parameter cross_attention will not be set.
[TensorRT-LLM][WARNING] Parameter layer_types cannot be read from json:
[TensorRT-LLM][WARNING] [json.exception.out_of_range.403] key 'layer_types' not found
[TensorRT-LLM][WARNING] [json.exception.type_error.302] type must be string, but is null
[TensorRT-LLM][WARNING] Optional value for parameter kv_cache_quant_algo will not be set.
[TensorRT-LLM][WARNING] [json.exception.out_of_range.403] key 'num_medusa_heads' not found
[TensorRT-LLM][WARNING] Optional value for parameter num_medusa_heads will not be set.
[TensorRT-LLM][WARNING] [json.exception.out_of_range.403] key 'max_draft_len' not found
[TensorRT-LLM][WARNING] Optional value for parameter max_draft_len will not be set.
[TensorRT-LLM][INFO] MPI size: 4, rank: 0
[TensorRT-LLM][INFO] Loaded engine size: 5878 MiB
[TensorRT-LLM][WARNING] Using an engine plan file across different models of devices is not recommended and is likely to affect performance or even cause errors.
[TensorRT-LLM][INFO] Allocated 244.15 MiB for execution context memory.
[TensorRT-LLM][INFO] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 0, GPU 5874 (MiB)
[TensorRT-LLM][INFO] Max KV cache pages per sequence: 9
[TensorRT-LLM][INFO] Max tokens in paged KV cache: 930560. Allocating 30492590080 bytes.
[TensorRT-LLM][WARNING] prompt_embedding_table: expected dim[1] = 4096, provided dim[1] = 1024
[TensorRT-LLM][ERROR] 3: [executionContext.cpp::setInputShape::2037] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::setInputShape::2037, condition: engineDims.d[i] == dims.d[i] Static dimension mismatch while setting input shape.)
Traceback (most recent call last):
  File "/app/tensorrt-llm/examples/run.py", line 595, in <module>
    main(args)
  File "/app/tensorrt-llm/examples/run.py", line 426, in main
    outputs = runner.generate(
  File "/app/tensorrt-llm/tensorrt_llm/runtime/model_runner_cpp.py", line 368, in generate
    self.session.generate(generation_output, generation_input,
RuntimeError: [TensorRT-LLM][ERROR] Assertion failed: Tensor 'prompt_embedding_table' has invalid shape (1, 1024), expected (-1, 4096) (/app/tensorrt-llm/cpp/tensorrt_llm/runtime/tllmRuntime.cpp:150)
1  0x7f5e3eccf65a tensorrt_llm::common::throwRuntimeError(char const*, int, std::string const&) + 102
2  0x7f5e407bf224 tensorrt_llm::runtime::TllmRuntime::setInputTensors(int, std::unordered_map<std::string, std::shared_ptr<tensorrt_llm::runtime::ITensor>, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::shared_ptr<tensorrt_llm::runtime::ITensor> > > > const&) + 1380
3  0x7f5e4076a1bb tensorrt_llm::runtime::GptSession::executeContextStep(std::vector<tensorrt_llm::runtime::GenerationInput, std::allocator<tensorrt_llm::runtime::GenerationInput> > const&, std::vector<int, std::allocator<int> > const&, tensorrt_llm::batch_manager::kv_cache_manager::KVCacheManager const*) + 891
4  0x7f5e4076b044 tensorrt_llm::runtime::GptSession::generateBatched(std::vector<tensorrt_llm::runtime::GenerationOutput, std::allocator<tensorrt_llm::runtime::GenerationOutput> >&, std::vector<tensorrt_llm::runtime::GenerationInput, std::allocator<tensorrt_llm::runtime::GenerationInput> > const&, tensorrt_llm::runtime::SamplingConfig const&, std::function<void (int, bool)> const&, std::shared_ptr<tensorrt_llm::runtime::GptSession::GenerationProfiler>) + 2148
5  0x7f5e4076ca35 tensorrt_llm::runtime::GptSession::generate(tensorrt_llm::runtime::GenerationOutput&, tensorrt_llm::runtime::GenerationInput const&, tensorrt_llm::runtime::SamplingConfig const&, std::shared_ptr<tensorrt_llm::runtime::GptSession::GenerationProfiler>) + 2261
6  0x7f5eb4375d58 /app/tensorrt-llm/tensorrt_llm/bindings.cpython-310-x86_64-linux-gnu.so(+0xb5d58) [0x7f5eb4375d58]
7  0x7f5eb43927ea /app/tensorrt-llm/tensorrt_llm/bindings.cpython-310-x86_64-linux-gnu.so(+0xd27ea) [0x7f5eb43927ea]
8  0x53bd79 python3() [0x53bd79]
9  0x629d24 _PyObject_MakeTpCall + 356
10 0x549c2e python3() [0x549c2e]
11 0x5ae603 _PyEval_EvalFrameDefault + 19699
12 0x548efa python3() [0x548efa]
13 0x62893c PyObject_Call + 172
14 0x5ac51b _PyEval_EvalFrameDefault + 11275
15 0x628d60 _PyFunction_Vectorcall + 592
16 0x5a9c1b _PyEval_EvalFrameDefault + 779
17 0x5a8bf1 python3() [0x5a8bf1]
18 0x6d77cf PyEval_EvalCode + 127
19 0x6bb91b python3() [0x6bb91b]
20 0x6bb9a4 python3() [0x6bb9a4]
21 0x6bbde6 python3() [0x6bbde6]
22 0x6c0c84 _PyRun_SimpleFileObject + 404
23 0x6c0d57 _PyRun_AnyFileObject + 71
24 0x7042dd Py_RunMain + 877
25 0x7044bd Py_BytesMain + 45
26 0x7f606c86e083 __libc_start_main + 243
27 0x62ff4e _start + 46
--------------------------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
mpirun detected that one or more processes exited with non-zero status,
thus causing the job to be terminated. The first process to do so was:

  Process name: [[57329,1],1]
  Exit code: 1
--------------------------------------------------------------------------
```
Additional notes

If built and run with a TP size of 8 instead, a similar error occurs:

```
RuntimeError: [TensorRT-LLM][ERROR] Assertion failed: Tensor 'prompt_embedding_table' has invalid shape (1, 512), expected (-1, 4096)
```
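A pattern worth noting (my observation, not confirmed by the TensorRT-LLM code): in both failures the "provided" dimension equals the hidden size (4096 for Mixtral 8x7B) divided by the TP size, i.e. the tensor-parallel shard of the hidden dimension, while the engine expects the full hidden size. A quick arithmetic check:

```shell
# Illustrative sanity check: the provided dim[1] in each failure matches
# hidden_size / tp_size (the per-rank shard of the hidden dimension).
hidden_size=4096
echo "tp_size=4 -> provided dim $((hidden_size / 4))"  # 1024, matches shape (1, 1024)
echo "tp_size=8 -> provided dim $((hidden_size / 8))"  # 512,  matches shape (1, 512)
```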
Removing the --max_multimodal_len 576 option resolves the problem.
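For reference, the rebuild under that workaround might look like the sketch below (same variables as the build script above). My understanding, to be confirmed against the trtllm-build docs, is that --max_multimodal_len reserves a prompt_embedding_table input for multimodal features, which a text-only Mixtral run never supplies correctly:

```shell
# Rebuild the engine without --max_multimodal_len; all other
# options are unchanged from the original build script.
trtllm-build \
    --checkpoint_dir $MODEL_DIR/tmp/trt_models/${MODEL_NAME}/$PRECISION/${TP_SIZE}-gpu-tp \
    --output_dir $MODEL_DIR/tmp/trt_engines/${MODEL_NAME}/$PRECISION/${TP_SIZE}-gpu-tp \
    --gemm_plugin $DTYPE \
    --gpt_attention_plugin $DTYPE \
    --max_batch_size 1 \
    --max_input_len 2048 \
    --max_output_len 1024
```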