How to test the benchmark of Llama3 and Vicuna2 of TensorRT-LLM by benchmark.py #1597

Ourspolaire1 opened this issue May 14, 2024 · 4 comments

@Ourspolaire1

I need to benchmark several different models, but they are not listed in allowed_configs.py. How can I do this? Thanks

@kaiyux
Member

kaiyux commented May 15, 2024

Hi @Ourspolaire1, the currently recommended way is to use the trtllm-build command to build the models you want to benchmark, and then use gptManagerBenchmark to benchmark them. Please see the documentation:

@Ourspolaire1
Author

> Hi @Ourspolaire1, the currently recommended way is to use the trtllm-build command to build the models you want to benchmark, and then use gptManagerBenchmark to benchmark them. Please see the documentation:

[TensorRT-LLM] TensorRT-LLM version: 0.10.0.dev2024042300
[iZf8ziv3jfzkf1sys9b2ikZ:418418] *** Process received signal ***
[iZf8ziv3jfzkf1sys9b2ikZ:418418] Signal: Segmentation fault (11)
[iZf8ziv3jfzkf1sys9b2ikZ:418418] Signal code: Address not mapped (1)
[iZf8ziv3jfzkf1sys9b2ikZ:418418] Failing at address: 0x18
[iZf8ziv3jfzkf1sys9b2ikZ:418418] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x3c050)[0x7f8c267bc050]
[iZf8ziv3jfzkf1sys9b2ikZ:418418] [ 1] /home/shared/trtllm/lib/python3.10/site-packages/tensorrt_llm/libs/libth_common.so(_ZN12tensorrt_llm4thop14TorchAllocator6mallocEmb+0x88)[0x7f8b71883af8]
[iZf8ziv3jfzkf1sys9b2ikZ:418418] [ 2] /home/shared/trtllm/lib/python3.10/site-packages/tensorrt_llm/libs/libtensorrt_llm.so(_ZN12tensorrt_llm6common10IAllocator8reMallocIiEEPT_S4_mb+0xb4)[0x7f8a1ca6cab4]
[iZf8ziv3jfzkf1sys9b2ikZ:418418] [ 3] /home/shared/trtllm/lib/python3.10/site-packages/tensorrt_llm/libs/libtensorrt_llm.so(_ZN12tensorrt_llm6layers18DynamicDecodeLayerIfE14allocateBufferEv+0x38)[0x7f8a1ca6d868]
[iZf8ziv3jfzkf1sys9b2ikZ:418418] [ 4] /home/shared/trtllm/lib/python3.10/site-packages/tensorrt_llm/libs/libtensorrt_llm.so(_ZN12tensorrt_llm6layers18DynamicDecodeLayerIfE10initializeEv+0x1c6)[0x7f8a1ca724d6]
[iZf8ziv3jfzkf1sys9b2ikZ:418418] [ 5] /home/shared/trtllm/lib/python3.10/site-packages/tensorrt_llm/libs/libtensorrt_llm.so(ZN12tensorrt_llm6layers18DynamicDecodeLayerIfEC1ERKNS_7runtime12DecodingModeEiiiiP11CUstream_stSt10shared_ptrINS_6common10IAllocatorEEP14cudaDevicePropSt8optionalIiESG+0x225)[0x7f8a1ca72975]
[iZf8ziv3jfzkf1sys9b2ikZ:418418] [ 6] /home/shared/trtllm/lib/python3.10/site-packages/tensorrt_llm/libs/libth_common.so(_ZN9torch_ext15FtDynamicDecodeIfEC2Emmmmii+0x2f8)[0x7f8b71861d18]
[iZf8ziv3jfzkf1sys9b2ikZ:418418] [ 7] /home/shared/trtllm/lib/python3.10/site-packages/tensorrt_llm/libs/libth_common.so(_ZN9torch_ext15DynamicDecodeOp14createInstanceEv+0x10f)[0x7f8b71846f6f]
[iZf8ziv3jfzkf1sys9b2ikZ:418418] [ 8] /home/shared/trtllm/lib/python3.10/site-packages/tensorrt_llm/libs/libth_common.so(_ZN9torch_ext15DynamicDecodeOpC1EllllllN3c1010ScalarTypeE+0x84)[0x7f8b71847034]
[iZf8ziv3jfzkf1sys9b2ikZ:418418] [ 9] /home/shared/trtllm/lib/python3.10/site-packages/tensorrt_llm/libs/libth_common.so(ZNSt17_Function_handlerIFvRSt6vectorIN3c106IValueESaIS2_EEEZN5torch6class_IN9torch_ext15DynamicDecodeOpEE12defineMethodIZNSB_3defIJllllllNS1_10ScalarTypeEEEERSB_NS7_6detail5typesIvJDpT_EEESsSt16initializer_listINS7_3argEEEUlNS1_14tagged_capsuleISA_EEllllllSE_E_EEPNS7_3jit8FunctionESsT_SsSN_EUlS5_E_E9_M_invokeERKSt9_Any_dataS5+0xf8)[0x7f8b71862588]
[iZf8ziv3jfzkf1sys9b2ikZ:418418] [10] /home/shared/trtllm/lib/python3.10/site-packages/torch/lib/libtorch_python.so(+0xa0f34e)[0x7f8c2440f34e]
[iZf8ziv3jfzkf1sys9b2ikZ:418418] [11] /home/shared/trtllm/lib/python3.10/site-packages/torch/lib/libtorch_python.so(+0xa0c8df)[0x7f8c2440c8df]
[iZf8ziv3jfzkf1sys9b2ikZ:418418] [12] /home/shared/trtllm/lib/python3.10/site-packages/torch/lib/libtorch_python.so(+0xa0e929)[0x7f8c2440e929]
[iZf8ziv3jfzkf1sys9b2ikZ:418418] [13] /home/shared/trtllm/lib/python3.10/site-packages/torch/lib/libtorch_python.so(+0x47de04)[0x7f8c23e7de04]
[iZf8ziv3jfzkf1sys9b2ikZ:418418] [14] python3(+0x1b2c86)[0x5639836adc86]
[iZf8ziv3jfzkf1sys9b2ikZ:418418] [15] python3(_PyObject_MakeTpCall+0x70)[0x56398361bb50]
[iZf8ziv3jfzkf1sys9b2ikZ:418418] [16] python3(+0xe1f19)[0x5639835dcf19]
[iZf8ziv3jfzkf1sys9b2ikZ:418418] [17] python3(+0x7511a)[0x56398357011a]
[iZf8ziv3jfzkf1sys9b2ikZ:418418] [18] python3(_PyObject_MakeTpCall+0x70)[0x56398361bb50]
[iZf8ziv3jfzkf1sys9b2ikZ:418418] [19] python3(_PyEval_EvalFrameDefault+0x53bf)[0x563983676daf]
[iZf8ziv3jfzkf1sys9b2ikZ:418418] [20] python3(+0x175c30)[0x563983670c30]
[iZf8ziv3jfzkf1sys9b2ikZ:418418] [21] python3(_PyObject_Call_Prepend+0x1ac)[0x56398361c8fc]
[iZf8ziv3jfzkf1sys9b2ikZ:418418] [22] python3(+0x153fd1)[0x56398364efd1]
[iZf8ziv3jfzkf1sys9b2ikZ:418418] [23] python3(+0x15140b)[0x56398364c40b]
[iZf8ziv3jfzkf1sys9b2ikZ:418418] [24] python3(_PyObject_MakeTpCall+0x1f7)[0x56398361bcd7]
[iZf8ziv3jfzkf1sys9b2ikZ:418418] [25] python3(_PyEval_EvalFrameDefault+0x562e)[0x56398367701e]
[iZf8ziv3jfzkf1sys9b2ikZ:418418] [26] python3(+0x175c30)[0x563983670c30]
[iZf8ziv3jfzkf1sys9b2ikZ:418418] [27] python3(_PyObject_Call_Prepend+0xd9)[0x56398361c829]
[iZf8ziv3jfzkf1sys9b2ikZ:418418] [28] python3(+0x153fd1)[0x56398364efd1]
[iZf8ziv3jfzkf1sys9b2ikZ:418418] [29] python3(+0x15140b)[0x56398364c40b]
[iZf8ziv3jfzkf1sys9b2ikZ:418418] *** End of error message ***
Segmentation fault

I do not know how to solve this. Is there any way around it? Thank you!

@raoofnaushad

@Ourspolaire1

The way I did it is essentially what @kaiyux suggested above.

Update these commands with your required batch_size and other settings. For example:

python3 convert_checkpoint.py \
    --meta_ckpt_dir /wkdir/Meta-Llama-3-8B \
    --output_dir ./tllm_checkpoint_2gpu_tp2 \
    --dtype float16 \
    --tp_size 2

trtllm-build \
    --checkpoint_dir ./tllm_checkpoint_2gpu_tp2 \
    --output_dir ./tmp/llama/8B/trt_engines/fp16/2-gpu/ \
    --gemm_plugin float16 \
    --max_batch_size 384 \
    --max_input_len 512 \
    --max_output_len 512 \
    --tp_size 2 \
    --profiling_verbosity detailed

Then run benchmark.py against the built engine. For example, with a single-GPU engine:

python3 benchmark.py \
    --engine_dir "/wkdir/TensorRT-LLM/examples/llama/tmp/llama/8B/trt_engines/fp16/1-gpu" \
    --mode plugin \
    --max_batch_size 384 \
    --max_input_len 128 \
    --max_output_len 128 \
    --batch_size 384 \
    --input_output_len "128,128" 

Or, for the two-GPU (tp_size 2) engine, launch through mpirun:

mpirun -n 2 \
python3 benchmark.py \
    --engine_dir "/wkdir/TensorRT-LLM/examples/llama/tmp/llama/8B/trt_engines/fp16/2-gpu" \
    --mode plugin \
    --max_batch_size 384 \
    --max_input_len 128 \
    --max_output_len 128 \
    --batch_size 384 \
    --input_output_len "128,128" 
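One thing worth noting: the `--batch_size` and `--input_output_len` values passed to benchmark.py must stay within the `--max_batch_size`, `--max_input_len`, and `--max_output_len` limits the engine was built with. A minimal pure-Python sketch of that pre-flight check (the helper name and the limits dict are illustrative, not part of TensorRT-LLM):

```python
# Hypothetical pre-flight check: benchmark settings must fit the limits
# that were baked into the engine at trtllm-build time.
def check_benchmark_args(build_limits, batch_size, input_len, output_len):
    """Return a list of human-readable violations (empty list means OK)."""
    problems = []
    if batch_size > build_limits["max_batch_size"]:
        problems.append(f"batch_size {batch_size} > max_batch_size "
                        f"{build_limits['max_batch_size']}")
    if input_len > build_limits["max_input_len"]:
        problems.append(f"input_len {input_len} > max_input_len "
                        f"{build_limits['max_input_len']}")
    if output_len > build_limits["max_output_len"]:
        problems.append(f"output_len {output_len} > max_output_len "
                        f"{build_limits['max_output_len']}")
    return problems

# Limits taken from the trtllm-build command above.
limits = {"max_batch_size": 384, "max_input_len": 512, "max_output_len": 512}

print(check_benchmark_args(limits, 384, 128, 128))  # fits: []
print(check_benchmark_args(limits, 512, 128, 128))  # batch too large
```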

@Ourspolaire1
Author

@raoofnaushad I hit a new error. benchmark.py works with batch_size=1, but as soon as I change to batch_size=2 (or any other value) the error below occurs. Do you know how to solve it? Thank you very much!
"
python3 benchmark.py --engine_dir "/home/shared/TensorRT-LLM-main/examples/llama/tmp/llama/8B/trt_engines/fp16/1-gpu"
--mode plugin
--batch_size "2"
--input_output_len "2,2"
"

[TensorRT-LLM] TensorRT-LLM version: 0.11.0.dev2024051400
Allocated 124.01 MiB for execution context memory.
/home/shared/trtl/lib/python3.10/site-packages/torch/nested/__init__.py:166: UserWarning: The PyTorch API of nested tensors is in prototype stage and will change in the near future. (Triggered internally at ../aten/src/ATen/NestedTensorImpl.cpp:177.)
return _nested.nested_tensor(
[05/22/2024-11:36:26] [TRT] [E] 3: [executionContext.cpp::setInputShape::2037] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::setInputShape::2037, condition: engineDims.d[i] == dims.d[i] Static dimension mismatch while setting input shape.)
[05/22/2024-11:36:26] [TRT] [E] 3: [executionContext.cpp::setInputShape::2037] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::setInputShape::2037, condition: engineDims.d[i] == dims.d[i] Static dimension mismatch while setting input shape.)
[05/22/2024-11:36:26] [TRT] [E] 3: [executionContext.cpp::setInputShape::2037] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::setInputShape::2037, condition: engineDims.d[i] == dims.d[i] Static dimension mismatch while setting input shape.)
[05/22/2024-11:36:26] [TRT] [E] 3: [executionContext.cpp::setInputShape::2037] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::setInputShape::2037, condition: engineDims.d[i] == dims.d[i] Static dimension mismatch while setting input shape.)
[05/22/2024-11:36:26] [TRT] [E] 3: [executionContext.cpp::setInputShape::2037] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::setInputShape::2037, condition: engineDims.d[i] == dims.d[i] Static dimension mismatch while setting input shape.)
[05/22/2024-11:36:26] [TRT] [E] 3: [executionContext.cpp::setInputShape::2037] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::setInputShape::2037, condition: engineDims.d[i] == dims.d[i] Static dimension mismatch while setting input shape.)
[05/22/2024-11:36:26] [TRT] [E] 3: [executionContext.cpp::setInputShape::2037] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::setInputShape::2037, condition: engineDims.d[i] == dims.d[i] Static dimension mismatch while setting input shape.)
[05/22/2024-11:36:26] [TRT] [E] 3: [executionContext.cpp::setInputShape::2037] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::setInputShape::2037, condition: engineDims.d[i] == dims.d[i] Static dimension mismatch while setting input shape.)
[05/22/2024-11:36:26] [TRT] [E] 3: [executionContext.cpp::setInputShape::2037] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::setInputShape::2037, condition: engineDims.d[i] == dims.d[i] Static dimension mismatch while setting input shape.)
[05/22/2024-11:36:26] [TRT] [E] 3: [executionContext.cpp::resolveSlots::2842] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::resolveSlots::2842, condition: allInputDimensionsSpecified(routine) )
Traceback (most recent call last):
  File "/home/shared/TensorRT-LLM-main/benchmarks/python/benchmark.py", line 419, in main
    benchmarker.run(inputs, config)
  File "/home/shared/TensorRT-LLM-main/benchmarks/python/gpt_benchmark.py", line 240, in run
    self.decoder.decode_batch(inputs[0],
  File "/home/shared/trtl/lib/python3.10/site-packages/tensorrt_llm/runtime/generation.py", line 3210, in decode_batch
    return self.decode(input_ids,
  File "/home/shared/trtl/lib/python3.10/site-packages/tensorrt_llm/runtime/generation.py", line 930, in wrapper
    ret = func(self, *args, **kwargs)
  File "/home/shared/trtl/lib/python3.10/site-packages/tensorrt_llm/runtime/generation.py", line 3431, in decode
    return self.decode_regular(
  File "/home/shared/trtl/lib/python3.10/site-packages/tensorrt_llm/runtime/generation.py", line 3045, in decode_regular
    should_stop, next_step_tensors, tasks, context_lengths, host_context_lengths, attention_mask, context_logits, generation_logits, encoder_input_lengths = self.handle_per_step(
  File "/home/shared/trtl/lib/python3.10/site-packages/tensorrt_llm/runtime/generation.py", line 2704, in handle_per_step
    raise RuntimeError(f"Executing TRT engine failed step={step}!")
RuntimeError: Executing TRT engine failed step=0!

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/shared/TensorRT-LLM-main/benchmarks/python/benchmark.py", line 518, in <module>
    main(args)
  File "/home/shared/TensorRT-LLM-main/benchmarks/python/benchmark.py", line 444, in main
    e.with_traceback())
TypeError: BaseException.with_traceback() takes exactly one argument (0 given)
[TensorRT-LLM] TensorRT-LLM version: 0.11.0.dev2024051400
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/usr/local/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main
    exitcode = _main(fd, parent_sentinel)
  File "/usr/local/lib/python3.10/multiprocessing/spawn.py", line 126, in _main
    self = reduction.pickle.load(from_parent)
  File "/usr/local/lib/python3.10/multiprocessing/synchronize.py", line 110, in __setstate__
    self._semlock = _multiprocessing.SemLock._rebuild(*state)
FileNotFoundError: [Errno 2] No such file or directory
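For what it's worth, the "Static dimension mismatch" message means TensorRT rejected the input shape at setInputShape time: every dimension that is not dynamic in the engine's input profile must be matched exactly by the runtime shape. If the engine at that path was built with a static batch dimension of 1, feeding batch_size 2 fails that check. A pure-Python illustration of the condition `engineDims.d[i] == dims.d[i]` from the log (not TensorRT code, just the idea):

```python
# Illustration of TensorRT's static-shape check (not actual TensorRT code).
# In an engine's input profile, -1 marks a dynamic dimension; every other
# dimension is static and must be matched exactly by the runtime shape.
DYNAMIC = -1

def shape_accepted(engine_dims, runtime_dims):
    """Mimic the condition 'engineDims.d[i] == dims.d[i]' from the log."""
    if len(engine_dims) != len(runtime_dims):
        return False
    return all(e == DYNAMIC or e == r
               for e, r in zip(engine_dims, runtime_dims))

# Engine built with a static batch dimension of 1:
static_engine = (1, 128)                         # (batch, seq_len)
print(shape_accepted(static_engine, (1, 128)))   # True: batch_size=1 works
print(shape_accepted(static_engine, (2, 128)))   # False: batch_size=2 rejected

# Engine built with a dynamic batch dimension accepts both:
dynamic_engine = (DYNAMIC, 128)
print(shape_accepted(dynamic_engine, (2, 128)))  # True
```

So the fix is to rebuild the engine with limits (max_batch_size and so on) that cover the batch sizes you intend to benchmark.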
