🐛 [Bug] RuntimeError: [Error thrown at core/runtime/execute_engine.cpp:132] Expected inputs[i].is_cuda() to be true but got false Expected input tensors to have device cuda, found device cpu #2744

Open
airalcorn2 opened this issue Apr 11, 2024 · 4 comments
Labels: bug (Something isn't working)

airalcorn2 commented Apr 11, 2024

Bug Description

The code below produces the following error:

RuntimeError: The following operation failed in the TorchScript interpreter.
Traceback of TorchScript (most recent call last):
RuntimeError: [Error thrown at core/runtime/execute_engine.cpp:132] Expected inputs[i].is_cuda() to be true but got false
Expected input tensors to have device cuda, found device cpu

This same code works fine with Torch-TensorRT 1.4.0. When using the Dynamo backend, I get the following error:

Unsupported: dynamic shape operator: aten.masked_select.default

To Reproduce

import torch
import torch_tensorrt

from torch import nn

DEVICE = "cuda:0"


class Indexer(nn.Module):
    def __init__(self, side_cells):
        super().__init__()
        self.side_cells = side_cells

    def forward(self, pn_feats, pillar_pixels):
        (N, P, C) = pn_feats.shape
        pseudo_images = torch.zeros(N, self.side_cells, self.side_cells, C).to(pn_feats)
        batch_idxs = torch.arange(N).repeat_interleave(P).to(pillar_pixels)
        rows = pillar_pixels[..., 0].flatten()
        cols = pillar_pixels[..., 1].flatten()
        mask = rows != -1
        batch_idxs = torch.masked_select(batch_idxs, mask)
        rows = torch.masked_select(rows, mask)
        cols = torch.masked_select(cols, mask)
        pn_feats = torch.masked_select(pn_feats.reshape(-1, C), mask[:, None])
        pn_feats = pn_feats.reshape(len(rows), C)
        pseudo_images[batch_idxs, rows, cols] = pn_feats
        return pseudo_images


def main():
    side_cells = 200
    pn_feats = torch.rand((1, 12000, 64)).to(DEVICE)
    pillar_pixels = torch.randint(0, side_cells, (1, 12000, 2)).to(DEVICE)
    pillar_pixels[0, 800:] = -1

    model = Indexer(side_cells).to(DEVICE)
    model.eval()
    with torch.no_grad():
        pt_preds = model(pn_feats, pillar_pixels)

    inputs = [
        torch_tensorrt.Input(pn_feats.shape),
        torch_tensorrt.Input(pillar_pixels.shape, dtype=torch.int32),
    ]
    enabled_precisions = {torch.half, torch.float32}
    trt_model = torch_tensorrt.compile(
        model,
        inputs=inputs,
        enabled_precisions=enabled_precisions,
        truncate_long_and_double=True,
        ir="torchscript",
    )
    trt_preds = trt_model(pn_feats, pillar_pixels.int())


if __name__ == "__main__":
    main()

Expected behavior

Environment

Build information about Torch-TensorRT can be found by turning on debug messages

  • Torch-TensorRT Version (e.g. 1.0.0): 2.0.0dev and 2.2.0
  • PyTorch Version (e.g. 1.0): 2.2
  • CPU Architecture: i7-12800H
  • OS (e.g., Linux): Linux
  • How you installed PyTorch (conda, pip, libtorch, source): nvcr.io/nvidia/pytorch:24.01-py3
  • Build command you used (if compiling from source):
  • Are you using local sources or building from archives:
  • Python version: 3.10.12
  • CUDA version: 12.2
  • GPU models and configuration: GeForce RTX 3080 Ti
  • Any other relevant information:

Additional context

gs-olive (Collaborator) commented

Hi - thanks for the report - I am able to reproduce the issue. For a quick workaround, try one of the following replacements:

pseudo_images = torch.zeros(N, self.side_cells, self.side_cells, C).to(pn_feats)
batch_idxs = torch.arange(N).repeat_interleave(P).to(pillar_pixels)


##### Replace the above with:


pseudo_images = torch.zeros(N, self.side_cells, self.side_cells, C).cuda()
batch_idxs = torch.arange(N).repeat_interleave(P).cuda()

##### or

pseudo_images = torch.zeros(N, self.side_cells, self.side_cells, C, device=pn_feats.device)
batch_idxs = torch.arange(N, device=pillar_pixels.device).repeat_interleave(P)

##### or 

pseudo_images = torch.zeros(N, self.side_cells, self.side_cells, C).to(pn_feats.device)
batch_idxs = torch.arange(N).repeat_interleave(P).to(pillar_pixels.device)

It seems that .to() called with a tensor as its argument (rather than a device) is not being interpreted correctly here.
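
One detail worth keeping in mind when picking a workaround: in eager PyTorch, .to(other_tensor) matches both the dtype and the device of other_tensor, while .to(other_tensor.device) only moves the tensor. The sketch below illustrates that difference on CPU; the names ref and idx are illustrative stand-ins (for pillar_pixels and batch_idxs), not code from the model above.

```python
import torch

# Stand-in for an int32 input such as pillar_pixels (illustrative only).
ref = torch.zeros(3, dtype=torch.int32)

# torch.arange produces int64 by default.
idx = torch.arange(4)

# .to(tensor) adopts the dtype AND device of the reference tensor.
same_as_ref = idx.to(ref)

# .to(device) only changes the device; the dtype stays int64.
device_only = idx.to(ref.device)

print(same_as_ref.dtype, device_only.dtype)
```

So the device-only replacements are not strictly identical to the original line when the reference tensor has a different dtype. Whether that matters depends on how the result is used later in the model.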

With respect to the Dynamo path, I have added #2747 to add support for aten.masked_select.default. Does the model successfully compile with fallback (running aten.masked_select.default in Torch) when using ir="dynamo"?

airalcorn2 commented Apr 12, 2024

Thanks for the workarounds, @gs-olive! I went with:

pseudo_images = torch.zeros(N, self.side_cells, self.side_cells, C).to(pn_feats.device)
batch_idxs = torch.arange(N).repeat_interleave(P).to(pillar_pixels.device)

for consistency's sake and that worked for me.

For the Dynamo path, I tried enabling fallback with:

trt_model = torch_tensorrt.compile(
    model,
    inputs=inputs,
    enabled_precisions=enabled_precisions,
    truncate_long_and_double=True,
    torch_executed_ops=["aten::masked_select"],
)

but was getting the same error. However, I just noticed there's actually an earlier error raised before the second error:

DynamicOutputShapeException: aten.masked_select.default

The above exception was the direct cause of the following exception:
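
For context, here is a minimal eager-mode sketch (run on CPU, with made-up tensors) of why masked_select counts as an operator with a data-dependent output shape: two masks of the same shape produce outputs of different lengths.

```python
import torch

values = torch.arange(6)

# Two boolean masks with identical shapes but different contents.
few = torch.tensor([True, False, False, False, False, False])
many = torch.tensor([True, True, True, True, False, False])

# The output length equals the number of True entries in the mask,
# so it cannot be known from the input shapes alone.
out_few = torch.masked_select(values, few)
out_many = torch.masked_select(values, many)

print(out_few.shape)   # torch.Size([1])
print(out_many.shape)  # torch.Size([4])
```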

airalcorn2 commented Apr 12, 2024

That newly discovered error in the Dynamo path led me to this issue and this issue, which was fixed here according to the comments.

airalcorn2 commented Apr 12, 2024

Interestingly, this code, which just uses normal boolean indexing, seems to work with the Dynamo path:

import torch
import torch_tensorrt

from torch import nn

DEVICE = "cuda:0"


class Indexer(nn.Module):
    def __init__(self, side_cells):
        super().__init__()
        self.side_cells = side_cells

    def forward(self, pn_feats, pillar_pixels):
        (N, P, C) = pn_feats.shape
        pseudo_images = torch.zeros(N, self.side_cells, self.side_cells, C).to(
            pn_feats.device
        )
        batch_idxs = torch.arange(N).repeat_interleave(P).to(pillar_pixels.device)
        rows = pillar_pixels[..., 0].flatten()
        cols = pillar_pixels[..., 1].flatten()
        mask = rows != -1
        batch_idxs = batch_idxs[mask]
        rows = rows[mask]
        cols = cols[mask]
        pn_feats = pn_feats.reshape(-1, C)[mask]
        pseudo_images[batch_idxs, rows, cols] = pn_feats
        return pseudo_images


def main():
    side_cells = 200
    pn_feats = torch.rand((1, 12000, 64)).to(DEVICE)
    pillar_pixels = torch.randint(0, side_cells, (1, 12000, 2)).to(DEVICE)
    pillar_pixels[0, 800:] = -1

    model = Indexer(side_cells).to(DEVICE)
    model.eval()
    with torch.no_grad():
        pt_preds = model(pn_feats, pillar_pixels)

    inputs = [
        torch_tensorrt.Input(pn_feats.shape),
        torch_tensorrt.Input(pillar_pixels.shape, dtype=torch.int32),
    ]
    enabled_precisions = {torch.half, torch.float32}
    trt_model = torch_tensorrt.compile(
        model,
        inputs=inputs,
        enabled_precisions=enabled_precisions,
        truncate_long_and_double=True,
        min_block_size=1,
    )
    trt_preds = trt_model(pn_feats, pillar_pixels.int())
    print((pt_preds == trt_preds[0]).sum())


if __name__ == "__main__":
    main()
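
As a sanity check that the boolean-indexing version computes the same thing as the masked_select version, the following small CPU sketch compares the two on a toy tensor. The shapes and values are made up for illustration; only the two selection patterns are taken from the code above.

```python
import torch

C = 3
feats = torch.arange(12.0).reshape(4, C)  # toy stand-in for pn_feats.reshape(-1, C)
rows = torch.tensor([0, -1, 2, -1])       # -1 marks padded entries
mask = rows != -1

# Pattern from the original model: masked_select with a broadcast mask
# returns a flat 1-D tensor, which is then reshaped back to (num_kept, C).
via_select = torch.masked_select(feats, mask[:, None]).reshape(int(mask.sum()), C)

# Pattern from the rewritten model: boolean indexing keeps the row structure.
via_index = feats[mask]

print(torch.equal(via_select, via_index))  # True
```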
