
🐛 [Bug] error: backend='torch_tensorrt' raised: TypeError: pybind11::init(): factory function returned nullptr #2827

Open · geraldstanje opened this issue May 10, 2024 · 4 comments
Labels: bug (Something isn't working)

geraldstanje commented May 10, 2024

Bug Description

Hi, I see the following error. It looks like torch.compile worked fine, but when I invoke a prediction afterwards it errors out:

[INFO ] W-9001-model_1.0-stdout MODEL_LOG - [05/10/2024-[W] Unable to determine GPU memory usage
[INFO ] W-9001-model_1.0-stdout MODEL_LOG - [05/10/2024-[TRT] [W] Unable to determine GPU memory usage
[INFO ] W-9001-model_1.0-stdout MODEL_LOG - [05/10/2024-[TRT] [I] [MemUsageChange] Init CUDA: CPU +0, GPU +0, now: CPU 1104, GPU 0 (MiB)
[INFO ] W-9001-model_1.0-stdout MODEL_LOG - [05/10/2024-[TRT] [W] CUDA initialization failure with error: 35. Please check your CUDA installation: http://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html
predict_fn error: backend='torch_tensorrt' raised: TypeError: pybind11::init(): factory function returned nullptr

Does Torch-TensorRT work on a g4dn.xlarge? Why do I get this: CUDA initialization failure with error: 35?

Full log: tensorrt_torch_error.txt

To Reproduce

Steps to reproduce the behavior:

  1. Build a container with TensorRT:
# use sagemaker DLC
FROM 763104351884.dkr.ecr.us-east-1.amazonaws.com/pytorch-inference:2.1.0-gpu-py310-cu118-ubuntu20.04-sagemaker

# Install additional dependencies
RUN python -m pip install torch torch-tensorrt tensorrt --extra-index-url https://download.pytorch.org/whl/cu118
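
To confirm the installation before wiring it into SageMaker, a quick import check inside the built container can rule out packaging problems (a minimal sketch, not from the original report; it only assumes the packages installed above):

# run inside the built container
import torch
import torch_tensorrt
import tensorrt

print("torch:", torch.__version__)                   # expect 2.1.x+cu118 from this DLC
print("torch_tensorrt:", torch_tensorrt.__version__)
print("tensorrt:", tensorrt.__version__)
print("cuda available:", torch.cuda.is_available())  # needs a GPU instance and a new-enough driver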

  2. Compile the model:

model.model_body[0].auto_model = torch.compile(
    model.model_body[0].auto_model,
    backend="torch_tensorrt",
    dynamic=False,
    options={
        "truncate_long_and_double": True,
        "precision": torch.half,
        "debug": True,
        "min_block_size": 1,
        "optimization_level": 4,
        "use_python_runtime": False,
    },
)

To rule out that the issue is somewhere else, I tested with the following torch.compile call; this works fine:

model.model_body[0].auto_model = torch.compile(model.model_body[0].auto_model, mode="reduce-overhead")

Should I try some other settings for torch.compile(model.model_body[0].auto_model, backend="torch_tensorrt", ...)?
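
One variation that may be worth trying (a sketch, not a confirmed fix: use_python_runtime=True bypasses the C++ runtime whose pybind11 initializer is returning nullptr, and enabled_precisions is the documented spelling of the precision option in recent Torch-TensorRT releases):

model.model_body[0].auto_model = torch.compile(
    model.model_body[0].auto_model,
    backend="torch_tensorrt",
    dynamic=False,
    options={
        "truncate_long_and_double": True,
        "enabled_precisions": {torch.half},  # documented key in recent releases
        "min_block_size": 1,
        "use_python_runtime": True,          # avoid the C++ runtime's pybind11 init
    },
)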

Could the error be related to NVIDIA/TensorRT#308?

Expected behavior

No error.

Environment

Build information about Torch-TensorRT can be found by turning on debug messages

  • Torch-TensorRT Version (e.g. 1.0.0):
  • PyTorch Version (e.g. 1.0): 2.1
  • CPU Architecture: g4dn.xlarge
  • OS (e.g., Linux):
  • How you installed PyTorch (conda, pip, libtorch, source):
  • Build command you used (if compiling from source):
  • Are you using local sources or building from archives:
  • Python version:
  • CUDA version:
  • GPU models and configuration:
  • Any other relevant information:

Additional context

geraldstanje added the bug label May 10, 2024
@narendasan (Collaborator) commented:

Can you share something like the nvidia-smi printout, so we can see the driver version and status?

@geraldstanje (Author) commented:

@narendasan Sure. In the meantime, where can I check the compatibility of the CUDA driver, PyTorch version, Torch-TensorRT version, etc.?

@narendasan (Collaborator) commented:

For PyTorch vs. Torch-TensorRT compatibility, the versions are aligned, so PyTorch v2.2.0 <-> Torch-TensorRT v2.2.0 (prior to PyTorch 2.0 it would be something like PyTorch 1.13 <-> Torch-TensorRT 1.3.0). Driver compatibility is governed by CUDA: https://docs.nvidia.com/deploy/cuda-compatibility/index.html. If your PyTorch build targets CUDA 11.8, you need a driver >= 450.80.02; if you are using a CUDA 12.1 PyTorch, you need >= 525.60.13. nvidia-smi can help you determine whether your CUDA and CUDA driver are aligned.
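
These checks can also be scripted; a minimal sketch (not from the original thread; note that CUDA error 35 is cudaErrorInsufficientDriver, i.e. the installed driver is older than the CUDA runtime PyTorch was built against):

import subprocess
import torch

# CUDA runtime this PyTorch build targets, e.g. "11.8"
print("torch built for CUDA:", torch.version.cuda)

# installed driver version as reported by nvidia-smi
out = subprocess.run(
    ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
    capture_output=True, text=True,
)
print("driver:", out.stdout.strip())

# False when the driver is too old for the runtime (CUDA error 35)
print("cuda available:", torch.cuda.is_available())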

@geraldstanje (Author) commented:

@narendasan Looks like the CUDA driver package is cuda-11-4-11.4.0-1. Does that mean I cannot use TensorRT without upgrading the CUDA driver?
