Inference Qwen1.5-14B with 2x RTX4090D failed based main branch #1588

Closed
Fred-cell opened this issue May 13, 2024 · 3 comments
Assignees: byshiue
Labels: not a bug (Some known limitation, but not a bug.), triaged (Issue has been triaged by maintainers)


Fred-cell commented May 13, 2024

Inference with the main branch fails; the command and error log are below:
```
mpirun -n 2 --allow-run-as-root /app/tensorrt_llm/benchmarks/cpp/gptSessionBenchmark --engine_dir ./examples/qwen/trtModel/fp16 --warm_up 2 --batch_size 1 --duration 0 --num_runs 3 --input_output_len 32,1 --log_level info
```
[screenshot of the error log]

byshiue (Collaborator) commented May 15, 2024

Your environment does not support peer access, so you need to disable use_custom_all_reduce when building the engine.
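
A quick way to confirm this limitation is to query whether the GPUs can reach each other over CUDA peer-to-peer (P2P), which the custom all-reduce kernel depends on. A minimal sketch using plain PyTorch (this is not TensorRT-LLM code; any CUDA P2P query, or `nvidia-smi topo -m`, shows the same thing):

```python
import torch

# Minimal sketch: check CUDA peer-to-peer (P2P) support between every
# pair of visible GPUs. TensorRT-LLM's custom all-reduce kernel needs
# direct peer access; if any pair reports no, disable it at build time.
n = torch.cuda.device_count()
for i in range(n):
    for j in range(n):
        if i == j:
            continue
        ok = torch.cuda.can_device_access_peer(i, j)
        print(f"GPU {i} -> GPU {j}: peer access {'yes' if ok else 'no'}")
```

In TensorRT-LLM releases from around this time the switch was a build-time flag, e.g. `trtllm-build ... --use_custom_all_reduce disable`; the exact flag name and default vary by version, so check `trtllm-build --help` for your checkout.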

byshiue self-assigned this May 15, 2024
byshiue added the triaged and not a bug labels May 15, 2024
Fred-cell (Author) commented

Thanks, it works. What does use_custom_all_reduce do?

byshiue (Collaborator) commented May 23, 2024

It uses a customized all-reduce kernel instead of the NCCL all-reduce API.
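
For context, an all-reduce leaves every rank holding the element-wise sum of the tensors contributed by all ranks; tensor-parallel inference uses it to combine partial results across GPUs. The sketch below, plain PyTorch with the NCCL backend rather than TensorRT-LLM code, shows the generic NCCL path; the custom kernel computes the same reduction through direct peer-to-peer memory access, which is intended to reduce latency for small messages but requires P2P support between the GPUs.

```python
import torch
import torch.distributed as dist

# Conceptual sketch (not TensorRT-LLM code): what an all-reduce does.
# Each rank contributes a tensor; afterwards every rank holds the
# element-wise sum. NCCL exposes this as a generic collective; the
# custom kernel implements the same reduction via direct P2P access.
def main():
    dist.init_process_group("nccl")           # one GPU per process
    rank = dist.get_rank()
    torch.cuda.set_device(rank)
    t = torch.full((4,), float(rank + 1), device="cuda")
    dist.all_reduce(t, op=dist.ReduceOp.SUM)  # t now holds the sum over ranks
    print(f"rank {rank}: {t.tolist()}")
    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

Launched with `torchrun --nproc_per_node=2 allreduce_demo.py` (a hypothetical filename for the script above), both ranks print `[3.0, 3.0, 3.0, 3.0]`.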

byshiue closed this as completed May 23, 2024