How to test the benchmark of Llama3 and Vicuna2 of TensorRT-LLM by benchmark.py #1597

Ourspolaire1 opened this issue May 14, 2024 · 4 comments

@Ourspolaire1

I need to benchmark several different models, but they are not listed in allowed_configs.py. How can I do this? Thanks

@kaiyux
Member

kaiyux commented May 15, 2024

Hi @Ourspolaire1, the currently recommended way is to use the trtllm-build command to build the models you want to benchmark, and then use gptManagerBenchmark to benchmark them. Please see the documentation:

@Ourspolaire1
Author

> Hi @Ourspolaire1, the currently recommended way is to use the trtllm-build command to build the models you want to benchmark, and then use gptManagerBenchmark to benchmark them. Please see the documentation:

[TensorRT-LLM] TensorRT-LLM version: 0.10.0.dev2024042300
[iZf8ziv3jfzkf1sys9b2ikZ:418418] *** Process received signal ***
[iZf8ziv3jfzkf1sys9b2ikZ:418418] Signal: Segmentation fault (11)
[iZf8ziv3jfzkf1sys9b2ikZ:418418] Signal code: Address not mapped (1)
[iZf8ziv3jfzkf1sys9b2ikZ:418418] Failing at address: 0x18
[iZf8ziv3jfzkf1sys9b2ikZ:418418] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x3c050)[0x7f8c267bc050]
[iZf8ziv3jfzkf1sys9b2ikZ:418418] [ 1] /home/shared/trtllm/lib/python3.10/site-packages/tensorrt_llm/libs/libth_common.so(_ZN12tensorrt_llm4thop14TorchAllocator6mallocEmb+0x88)[0x7f8b71883af8]
[iZf8ziv3jfzkf1sys9b2ikZ:418418] [ 2] /home/shared/trtllm/lib/python3.10/site-packages/tensorrt_llm/libs/libtensorrt_llm.so(_ZN12tensorrt_llm6common10IAllocator8reMallocIiEEPT_S4_mb+0xb4)[0x7f8a1ca6cab4]
[iZf8ziv3jfzkf1sys9b2ikZ:418418] [ 3] /home/shared/trtllm/lib/python3.10/site-packages/tensorrt_llm/libs/libtensorrt_llm.so(_ZN12tensorrt_llm6layers18DynamicDecodeLayerIfE14allocateBufferEv+0x38)[0x7f8a1ca6d868]
[iZf8ziv3jfzkf1sys9b2ikZ:418418] [ 4] /home/shared/trtllm/lib/python3.10/site-packages/tensorrt_llm/libs/libtensorrt_llm.so(_ZN12tensorrt_llm6layers18DynamicDecodeLayerIfE10initializeEv+0x1c6)[0x7f8a1ca724d6]
[iZf8ziv3jfzkf1sys9b2ikZ:418418] [ 5] /home/shared/trtllm/lib/python3.10/site-packages/tensorrt_llm/libs/libtensorrt_llm.so(ZN12tensorrt_llm6layers18DynamicDecodeLayerIfEC1ERKNS_7runtime12DecodingModeEiiiiP11CUstream_stSt10shared_ptrINS_6common10IAllocatorEEP14cudaDevicePropSt8optionalIiESG+0x225)[0x7f8a1ca72975]
[iZf8ziv3jfzkf1sys9b2ikZ:418418] [ 6] /home/shared/trtllm/lib/python3.10/site-packages/tensorrt_llm/libs/libth_common.so(_ZN9torch_ext15FtDynamicDecodeIfEC2Emmmmii+0x2f8)[0x7f8b71861d18]
[iZf8ziv3jfzkf1sys9b2ikZ:418418] [ 7] /home/shared/trtllm/lib/python3.10/site-packages/tensorrt_llm/libs/libth_common.so(_ZN9torch_ext15DynamicDecodeOp14createInstanceEv+0x10f)[0x7f8b71846f6f]
[iZf8ziv3jfzkf1sys9b2ikZ:418418] [ 8] /home/shared/trtllm/lib/python3.10/site-packages/tensorrt_llm/libs/libth_common.so(_ZN9torch_ext15DynamicDecodeOpC1EllllllN3c1010ScalarTypeE+0x84)[0x7f8b71847034]
[iZf8ziv3jfzkf1sys9b2ikZ:418418] [ 9] /home/shared/trtllm/lib/python3.10/site-packages/tensorrt_llm/libs/libth_common.so(ZNSt17_Function_handlerIFvRSt6vectorIN3c106IValueESaIS2_EEEZN5torch6class_IN9torch_ext15DynamicDecodeOpEE12defineMethodIZNSB_3defIJllllllNS1_10ScalarTypeEEEERSB_NS7_6detail5typesIvJDpT_EEESsSt16initializer_listINS7_3argEEEUlNS1_14tagged_capsuleISA_EEllllllSE_E_EEPNS7_3jit8FunctionESsT_SsSN_EUlS5_E_E9_M_invokeERKSt9_Any_dataS5+0xf8)[0x7f8b71862588]
[iZf8ziv3jfzkf1sys9b2ikZ:418418] [10] /home/shared/trtllm/lib/python3.10/site-packages/torch/lib/libtorch_python.so(+0xa0f34e)[0x7f8c2440f34e]
[iZf8ziv3jfzkf1sys9b2ikZ:418418] [11] /home/shared/trtllm/lib/python3.10/site-packages/torch/lib/libtorch_python.so(+0xa0c8df)[0x7f8c2440c8df]
[iZf8ziv3jfzkf1sys9b2ikZ:418418] [12] /home/shared/trtllm/lib/python3.10/site-packages/torch/lib/libtorch_python.so(+0xa0e929)[0x7f8c2440e929]
[iZf8ziv3jfzkf1sys9b2ikZ:418418] [13] /home/shared/trtllm/lib/python3.10/site-packages/torch/lib/libtorch_python.so(+0x47de04)[0x7f8c23e7de04]
[iZf8ziv3jfzkf1sys9b2ikZ:418418] [14] python3(+0x1b2c86)[0x5639836adc86]
[iZf8ziv3jfzkf1sys9b2ikZ:418418] [15] python3(_PyObject_MakeTpCall+0x70)[0x56398361bb50]
[iZf8ziv3jfzkf1sys9b2ikZ:418418] [16] python3(+0xe1f19)[0x5639835dcf19]
[iZf8ziv3jfzkf1sys9b2ikZ:418418] [17] python3(+0x7511a)[0x56398357011a]
[iZf8ziv3jfzkf1sys9b2ikZ:418418] [18] python3(_PyObject_MakeTpCall+0x70)[0x56398361bb50]
[iZf8ziv3jfzkf1sys9b2ikZ:418418] [19] python3(_PyEval_EvalFrameDefault+0x53bf)[0x563983676daf]
[iZf8ziv3jfzkf1sys9b2ikZ:418418] [20] python3(+0x175c30)[0x563983670c30]
[iZf8ziv3jfzkf1sys9b2ikZ:418418] [21] python3(_PyObject_Call_Prepend+0x1ac)[0x56398361c8fc]
[iZf8ziv3jfzkf1sys9b2ikZ:418418] [22] python3(+0x153fd1)[0x56398364efd1]
[iZf8ziv3jfzkf1sys9b2ikZ:418418] [23] python3(+0x15140b)[0x56398364c40b]
[iZf8ziv3jfzkf1sys9b2ikZ:418418] [24] python3(_PyObject_MakeTpCall+0x1f7)[0x56398361bcd7]
[iZf8ziv3jfzkf1sys9b2ikZ:418418] [25] python3(_PyEval_EvalFrameDefault+0x562e)[0x56398367701e]
[iZf8ziv3jfzkf1sys9b2ikZ:418418] [26] python3(+0x175c30)[0x563983670c30]
[iZf8ziv3jfzkf1sys9b2ikZ:418418] [27] python3(_PyObject_Call_Prepend+0xd9)[0x56398361c829]
[iZf8ziv3jfzkf1sys9b2ikZ:418418] [28] python3(+0x153fd1)[0x56398364efd1]
[iZf8ziv3jfzkf1sys9b2ikZ:418418] [29] python3(+0x15140b)[0x56398364c40b]
[iZf8ziv3jfzkf1sys9b2ikZ:418418] *** End of error message ***
Segmentation fault

I do not know how to solve this. Is there any way around it? Thank you!

@raoofnaushad

@Ourspolaire1

The way I did it is essentially what @kaiyux suggested above.

Update these commands with your required batch_size and other settings. For example:

python3 convert_checkpoint.py \
    --meta_ckpt_dir /wkdir/Meta-Llama-3-8B \
    --output_dir ./tllm_checkpoint_2gpu_tp2 \
    --dtype float16 \
    --tp_size 2

trtllm-build \
    --checkpoint_dir ./tllm_checkpoint_2gpu_tp2 \
    --output_dir ./tmp/llama/8B/trt_engines/fp16/2-gpu/ \
    --gemm_plugin float16 \
    --max_batch_size 384 \
    --max_input_len 512 \
    --max_output_len 512 \
    --tp_size 2 \
    --profiling_verbosity detailed

Then run benchmark.py against the built engine. For example, with a single-GPU engine:

python3 benchmark.py \
    --engine_dir "/wkdir/TensorRT-LLM/examples/llama/tmp/llama/8B/trt_engines/fp16/1-gpu" \
    --mode plugin \
    --max_batch_size 384 \
    --max_input_len 128 \
    --max_output_len 128 \
    --batch_size 384 \
    --input_output_len "128,128" 

Or, for the two-GPU (tp_size 2) engine, launch through mpirun:

mpirun -n 2 \
python3 benchmark.py \
    --engine_dir "/wkdir/TensorRT-LLM/examples/llama/tmp/llama/8B/trt_engines/fp16/2-gpu" \
    --mode plugin \
    --max_batch_size 384 \
    --max_input_len 128 \
    --max_output_len 128 \
    --batch_size 384 \
    --input_output_len "128,128" 
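One thing worth noting: the `--batch_size` and `--input_output_len` values passed to benchmark.py must stay within the `--max_batch_size`, `--max_input_len`, and `--max_output_len` limits the engine was built with. A minimal pure-Python sketch of that pre-flight check (the helper name and the limits dict are illustrative, not part of TensorRT-LLM):

```python
# Hypothetical pre-flight check: benchmark settings must fit the limits
# that were baked into the engine at trtllm-build time.
def check_benchmark_args(build_limits, batch_size, input_len, output_len):
    """Return a list of human-readable violations (empty list means OK)."""
    problems = []
    if batch_size > build_limits["max_batch_size"]:
        problems.append(f"batch_size {batch_size} > max_batch_size "
                        f"{build_limits['max_batch_size']}")
    if input_len > build_limits["max_input_len"]:
        problems.append(f"input_len {input_len} > max_input_len "
                        f"{build_limits['max_input_len']}")
    if output_len > build_limits["max_output_len"]:
        problems.append(f"output_len {output_len} > max_output_len "
                        f"{build_limits['max_output_len']}")
    return problems

# Limits taken from the trtllm-build command above.
limits = {"max_batch_size": 384, "max_input_len": 512, "max_output_len": 512}

print(check_benchmark_args(limits, 384, 128, 128))  # fits: []
print(check_benchmark_args(limits, 512, 128, 128))  # batch too large
```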

@Ourspolaire1
Author

@raoofnaushad I hit a new error. benchmark.py works with batch_size=1, but as soon as I change to batch_size=2 (or any other value) the error below occurs. Do you know how to solve it? Thank you very much!
"
python3 benchmark.py --engine_dir "/home/shared/TensorRT-LLM-main/examples/llama/tmp/llama/8B/trt_engines/fp16/1-gpu"
--mode plugin
--batch_size "2"
--input_output_len "2,2"
"

[TensorRT-LLM] TensorRT-LLM version: 0.11.0.dev2024051400
Allocated 124.01 MiB for execution context memory.
/home/shared/trtl/lib/python3.10/site-packages/torch/nested/__init__.py:166: UserWarning: The PyTorch API of nested tensors is in prototype stage and will change in the near future. (Triggered internally at ../aten/src/ATen/NestedTensorImpl.cpp:177.)
return _nested.nested_tensor(
[05/22/2024-11:36:26] [TRT] [E] 3: [executionContext.cpp::setInputShape::2037] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::setInputShape::2037, condition: engineDims.d[i] == dims.d[i] Static dimension mismatch while setting input shape.)
[05/22/2024-11:36:26] [TRT] [E] 3: [executionContext.cpp::setInputShape::2037] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::setInputShape::2037, condition: engineDims.d[i] == dims.d[i] Static dimension mismatch while setting input shape.)
[05/22/2024-11:36:26] [TRT] [E] 3: [executionContext.cpp::setInputShape::2037] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::setInputShape::2037, condition: engineDims.d[i] == dims.d[i] Static dimension mismatch while setting input shape.)
[05/22/2024-11:36:26] [TRT] [E] 3: [executionContext.cpp::setInputShape::2037] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::setInputShape::2037, condition: engineDims.d[i] == dims.d[i] Static dimension mismatch while setting input shape.)
[05/22/2024-11:36:26] [TRT] [E] 3: [executionContext.cpp::setInputShape::2037] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::setInputShape::2037, condition: engineDims.d[i] == dims.d[i] Static dimension mismatch while setting input shape.)
[05/22/2024-11:36:26] [TRT] [E] 3: [executionContext.cpp::setInputShape::2037] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::setInputShape::2037, condition: engineDims.d[i] == dims.d[i] Static dimension mismatch while setting input shape.)
[05/22/2024-11:36:26] [TRT] [E] 3: [executionContext.cpp::setInputShape::2037] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::setInputShape::2037, condition: engineDims.d[i] == dims.d[i] Static dimension mismatch while setting input shape.)
[05/22/2024-11:36:26] [TRT] [E] 3: [executionContext.cpp::setInputShape::2037] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::setInputShape::2037, condition: engineDims.d[i] == dims.d[i] Static dimension mismatch while setting input shape.)
[05/22/2024-11:36:26] [TRT] [E] 3: [executionContext.cpp::setInputShape::2037] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::setInputShape::2037, condition: engineDims.d[i] == dims.d[i] Static dimension mismatch while setting input shape.)
[05/22/2024-11:36:26] [TRT] [E] 3: [executionContext.cpp::resolveSlots::2842] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::resolveSlots::2842, condition: allInputDimensionsSpecified(routine) )
Traceback (most recent call last):
  File "/home/shared/TensorRT-LLM-main/benchmarks/python/benchmark.py", line 419, in main
    benchmarker.run(inputs, config)
  File "/home/shared/TensorRT-LLM-main/benchmarks/python/gpt_benchmark.py", line 240, in run
    self.decoder.decode_batch(inputs[0],
  File "/home/shared/trtl/lib/python3.10/site-packages/tensorrt_llm/runtime/generation.py", line 3210, in decode_batch
    return self.decode(input_ids,
  File "/home/shared/trtl/lib/python3.10/site-packages/tensorrt_llm/runtime/generation.py", line 930, in wrapper
    ret = func(self, *args, **kwargs)
  File "/home/shared/trtl/lib/python3.10/site-packages/tensorrt_llm/runtime/generation.py", line 3431, in decode
    return self.decode_regular(
  File "/home/shared/trtl/lib/python3.10/site-packages/tensorrt_llm/runtime/generation.py", line 3045, in decode_regular
    should_stop, next_step_tensors, tasks, context_lengths, host_context_lengths, attention_mask, context_logits, generation_logits, encoder_input_lengths = self.handle_per_step(
  File "/home/shared/trtl/lib/python3.10/site-packages/tensorrt_llm/runtime/generation.py", line 2704, in handle_per_step
    raise RuntimeError(f"Executing TRT engine failed step={step}!")
RuntimeError: Executing TRT engine failed step=0!

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/shared/TensorRT-LLM-main/benchmarks/python/benchmark.py", line 518, in <module>
    main(args)
  File "/home/shared/TensorRT-LLM-main/benchmarks/python/benchmark.py", line 444, in main
    e.with_traceback())
TypeError: BaseException.with_traceback() takes exactly one argument (0 given)
[TensorRT-LLM] TensorRT-LLM version: 0.11.0.dev2024051400
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/usr/local/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main
    exitcode = _main(fd, parent_sentinel)
  File "/usr/local/lib/python3.10/multiprocessing/spawn.py", line 126, in _main
    self = reduction.pickle.load(from_parent)
  File "/usr/local/lib/python3.10/multiprocessing/synchronize.py", line 110, in __setstate__
    self._semlock = _multiprocessing.SemLock._rebuild(*state)
FileNotFoundError: [Errno 2] No such file or directory
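For what it's worth, the "Static dimension mismatch" message means TensorRT rejected the input shape at setInputShape time: every dimension that is not dynamic in the engine's input profile must be matched exactly by the runtime shape. If the engine at that path was built with a static batch dimension of 1, feeding batch_size 2 fails that check. A pure-Python illustration of the condition `engineDims.d[i] == dims.d[i]` from the log (not TensorRT code, just the idea):

```python
# Illustration of TensorRT's static-shape check (not actual TensorRT code).
# In an engine's input profile, -1 marks a dynamic dimension; every other
# dimension is static and must be matched exactly by the runtime shape.
DYNAMIC = -1

def shape_accepted(engine_dims, runtime_dims):
    """Mimic the condition 'engineDims.d[i] == dims.d[i]' from the log."""
    if len(engine_dims) != len(runtime_dims):
        return False
    return all(e == DYNAMIC or e == r
               for e, r in zip(engine_dims, runtime_dims))

# Engine built with a static batch dimension of 1:
static_engine = (1, 128)                         # (batch, seq_len)
print(shape_accepted(static_engine, (1, 128)))   # True: batch_size=1 works
print(shape_accepted(static_engine, (2, 128)))   # False: batch_size=2 rejected

# Engine built with a dynamic batch dimension accepts both:
dynamic_engine = (DYNAMIC, 128)
print(shape_accepted(dynamic_engine, (2, 128)))  # True
```

So the fix is to rebuild the engine with limits (max_batch_size and so on) that cover the batch sizes you intend to benchmark.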
