Inference Qwen1.5-14B with 2x RTX4090D failed based main branch #1588

Closed
Fred-cell opened this issue May 13, 2024 · 3 comments
Assignees: byshiue
Labels: not a bug (Some known limitation, but not a bug.), triaged (Issue has been triaged by maintainers)


Fred-cell commented May 13, 2024

Inference with the main branch fails; the command and error log are below:
```
mpirun -n 2 --allow-run-as-root /app/tensorrt_llm/benchmarks/cpp/gptSessionBenchmark --engine_dir ./examples/qwen/trtModel/fp16 --warm_up 2 --batch_size 1 --duration 0 --num_runs 3 --input_output_len 32,1 --log_level info
```
[screenshot of the error log]

byshiue (Collaborator) commented May 15, 2024

Your environment does not support peer access, so you need to disable use_custom_all_reduce when building the engine.
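
A quick way to confirm this limitation is to query whether the GPUs can reach each other over CUDA peer-to-peer (P2P), which the custom all-reduce kernel depends on. A minimal sketch using plain PyTorch (this is not TensorRT-LLM code; any CUDA P2P query, or `nvidia-smi topo -m`, shows the same thing):

```python
import torch

# Minimal sketch: check CUDA peer-to-peer (P2P) support between every
# pair of visible GPUs. TensorRT-LLM's custom all-reduce kernel needs
# direct peer access; if any pair reports no, disable it at build time.
n = torch.cuda.device_count()
for i in range(n):
    for j in range(n):
        if i == j:
            continue
        ok = torch.cuda.can_device_access_peer(i, j)
        print(f"GPU {i} -> GPU {j}: peer access {'yes' if ok else 'no'}")
```

In TensorRT-LLM releases from around this time the switch was a build-time flag, e.g. `trtllm-build ... --use_custom_all_reduce disable`; the exact flag name and default vary by version, so check `trtllm-build --help` for your checkout.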

byshiue self-assigned this May 15, 2024
byshiue added the triaged and not a bug labels May 15, 2024
Fred-cell (Author) commented

Thanks, it works. What does use_custom_all_reduce do?

byshiue (Collaborator) commented May 23, 2024

It uses a customized all-reduce kernel instead of the NCCL all-reduce API.
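
For context, an all-reduce leaves every rank holding the element-wise sum of the tensors contributed by all ranks; tensor-parallel inference uses it to combine partial results across GPUs. The sketch below, plain PyTorch with the NCCL backend rather than TensorRT-LLM code, shows the generic NCCL path; the custom kernel computes the same reduction through direct peer-to-peer memory access, which is intended to reduce latency for small messages but requires P2P support between the GPUs.

```python
import torch
import torch.distributed as dist

# Conceptual sketch (not TensorRT-LLM code): what an all-reduce does.
# Each rank contributes a tensor; afterwards every rank holds the
# element-wise sum. NCCL exposes this as a generic collective; the
# custom kernel implements the same reduction via direct P2P access.
def main():
    dist.init_process_group("nccl")           # one GPU per process
    rank = dist.get_rank()
    torch.cuda.set_device(rank)
    t = torch.full((4,), float(rank + 1), device="cuda")
    dist.all_reduce(t, op=dist.ReduceOp.SUM)  # t now holds the sum over ranks
    print(f"rank {rank}: {t.tolist()}")
    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

Launched with `torchrun --nproc_per_node=2 allreduce_demo.py` (a hypothetical filename for the script above), both ranks print `[3.0, 3.0, 3.0, 3.0]`.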

byshiue closed this as completed May 23, 2024