
feat: support aten.atan2.out converter #2829

Open · wants to merge 2 commits into base: main
Conversation

@chohk88 (Collaborator) commented May 11, 2024

Description

The aten.atan2.out operation computes the element-wise arctangent of two tensors and stores the result in a specified output tensor, out. It does not alter the input tensors, so it is not an in-place operation in the traditional sense.
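For reference, the element-wise semantics (ignoring broadcasting and the out tensor) can be sketched in plain Python. This is only an illustration of the math, not the converter implementation:

```python
import math

def atan2_elementwise(ys, xs):
    """Element-wise arctangent of ys/xs, using the signs of both
    arguments to select the correct quadrant (like torch.atan2)."""
    return [math.atan2(y, x) for y, x in zip(ys, xs)]

# atan2(1, 1) = pi/4, atan2(-1, 1) = -pi/4, atan2(0, -1) = pi
angles = atan2_elementwise([1.0, -1.0, 0.0], [1.0, 1.0, -1.0])
```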

However, I've encountered two issues:

First, when the shape of the provided out tensor does not match the required output shape, PyTorch issues a warning but does not raise an error, and still produces the correct results. The output of running the atan2.out operation in PyTorch is attached below:

(screenshot: PyTorch atan2.out output)
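A minimal repro of this PyTorch behavior (assuming a recent PyTorch build; the exact warning text may vary by version):

```python
import torch

a = torch.randn(10)
b = torch.randn(10)
out = torch.empty(5)  # deliberately wrong shape; the required shape is [10]

# PyTorch emits a resize warning but does not raise: `out` is resized
# to [10] in place and receives the correct results.
result = torch.atan2(a, b, out=out)
print(result.shape)                            # torch.Size([10])
print(torch.allclose(result, torch.atan2(a, b)))  # True
```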

Second, although TensorRT emits an error message when the above warning occurs, the test case still passes. The output of the executed test case is attached below:

[W511 17:09:06.345646983 Resize.cpp:28] Warning: An output with one or more elements was resized since it had shape [5], which does not match the required output shape [10]. This behavior is deprecated, and in a future PyTorch release outputs will not be resized unless they have zero elements. You can explicitly reuse an out tensor t by resizing it, inplace, to zero elements with t.resize_(0). (function _resize_output_check)
INFO:torch_tensorrt [TensorRT Conversion Context]:[MemUsageChange] Init CUDA: CPU +1, GPU +0, now: CPU 145, GPU 1474 (MiB)
INFO:torch_tensorrt [TensorRT Conversion Context]:[MemUsageChange] Init builder kernel library: CPU +1750, GPU +317, now: CPU 2031, GPU 1791 (MiB)
INFO:torch_tensorrt.dynamo.conversion._TRTInterpreter:TRT INetwork construction elapsed time: 0:00:00.013850
WARNING:torch_tensorrt [TensorRT Conversion Context]:Unused Input: out
WARNING:torch_tensorrt [TensorRT Conversion Context]:[RemoveDeadLayers] Input Tensor out is unused or used only at compile-time, but is not being removed.
INFO:torch_tensorrt [TensorRT Conversion Context]:Global timing cache in use. Profiling results in this builder pass will be stored.
INFO:torch_tensorrt [TensorRT Conversion Context]:Detected 3 inputs and 1 output network tensors.
INFO:torch_tensorrt [TensorRT Conversion Context]:Total Host Persistent Memory: 32
INFO:torch_tensorrt [TensorRT Conversion Context]:Total Device Persistent Memory: 0
INFO:torch_tensorrt [TensorRT Conversion Context]:Total Scratch Memory: 0
INFO:torch_tensorrt [TensorRT Conversion Context]:Total Activation Memory: 0
INFO:torch_tensorrt [TensorRT Conversion Context]:Total Weights Memory: 296
INFO:torch_tensorrt [TensorRT Conversion Context]:Engine generation completed in 0.0770512 seconds.
INFO:torch_tensorrt [TensorRT Conversion Context]:[MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 0 MiB, GPU 1 MiB
INFO:torch_tensorrt [TensorRT Conversion Context]:[MemUsageStats] Peak memory usage during Engine building and serialization: CPU: 3581 MiB
INFO:torch_tensorrt [TensorRT Conversion Context]:Serialized 26 bytes of code generator cache.
INFO:torch_tensorrt [TensorRT Conversion Context]:Serialized 7750 bytes of compilation cache.
INFO:torch_tensorrt [TensorRT Conversion Context]:Serialized 0 timing cache entries
INFO:torch_tensorrt.dynamo.conversion._TRTInterpreter:Build TRT engine elapsed time: 0:00:00.081009
INFO:torch_tensorrt.dynamo.conversion._TRTInterpreter:TRT Engine uses: 13980 bytes of Memory
INFO:harness:Interpreter run time(s): 0.09525678399950266
INFO:torch_tensorrt [TensorRT Conversion Context]:Loaded engine size: 0 MiB
INFO:torch_tensorrt [TensorRT Conversion Context]:[MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 0, GPU 0 (MiB)
[W511 17:09:08.253573766 Resize.cpp:28] Warning: An output with one or more elements was resized since it had shape [5], which does not match the required output shape [10]. This behavior is deprecated, and in a future PyTorch release outputs will not be resized unless they have zero elements. You can explicitly reuse an out tensor t by resizing it, inplace, to zero elements with t.resize_(0). (function _resize_output_check)
ERROR:torch_tensorrt [TensorRT Conversion Context]:3: [executionContext.cpp::setInputShape::2037] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::setInputShape::2037, condition: engineDims.d[i] == dims.d[i] Static dimension mismatch while setting input shape.)
WARNING:torch_tensorrt [TensorRT Conversion Context]:Using default stream in enqueueV3() may lead to performance issues due to additional calls to cudaStreamSynchronize() by TensorRT to ensure correct synchronization. Please use non-default stream instead.
INFO:harness:TRT run time(s)= 0.010920960426330567
.INFO:torch_tensorrt [TensorRT Conversion Context]:[MemUsageChange] Init CUDA: CPU +0, GPU +0, now: CPU 286, GPU 1483 (MiB)
INFO:torch_tensorrt [TensorRT Conversion Context]:[MemUsageChange] Init builder kernel library: CPU +1748, GPU +310, now: CPU 2034, GPU 1793 (MiB)
INFO:torch_tensorrt.dynamo.conversion._TRTInterpreter:TRT INetwork construction elapsed time: 0:00:00.009873
WARNING:torch_tensorrt [TensorRT Conversion Context]:Unused Input: out
WARNING:torch_tensorrt [TensorRT Conversion Context]:[RemoveDeadLayers] Input Tensor out is unused or used only at compile-time, but is not being removed.
INFO:torch_tensorrt [TensorRT Conversion Context]:Global timing cache in use. Profiling results in this builder pass will be stored.
INFO:torch_tensorrt [TensorRT Conversion Context]:Detected 3 inputs and 1 output network tensors.
INFO:torch_tensorrt [TensorRT Conversion Context]:Total Host Persistent Memory: 32
INFO:torch_tensorrt [TensorRT Conversion Context]:Total Device Persistent Memory: 0
INFO:torch_tensorrt [TensorRT Conversion Context]:Total Scratch Memory: 0
INFO:torch_tensorrt [TensorRT Conversion Context]:Total Activation Memory: 0
INFO:torch_tensorrt [TensorRT Conversion Context]:Total Weights Memory: 296
INFO:torch_tensorrt [TensorRT Conversion Context]:Engine generation completed in 0.0711228 seconds.
INFO:torch_tensorrt [TensorRT Conversion Context]:[MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 0 MiB, GPU 1 MiB
INFO:torch_tensorrt [TensorRT Conversion Context]:[MemUsageStats] Peak memory usage during Engine building and serialization: CPU: 3659 MiB
INFO:torch_tensorrt [TensorRT Conversion Context]:Serialized 26 bytes of code generator cache.
INFO:torch_tensorrt [TensorRT Conversion Context]:Serialized 7750 bytes of compilation cache.
INFO:torch_tensorrt [TensorRT Conversion Context]:Serialized 0 timing cache entries
INFO:torch_tensorrt.dynamo.conversion._TRTInterpreter:Build TRT engine elapsed time: 0:00:00.074702
INFO:torch_tensorrt.dynamo.conversion._TRTInterpreter:TRT Engine uses: 13980 bytes of Memory
INFO:harness:Interpreter run time(s): 0.08486600499600172
INFO:torch_tensorrt [TensorRT Conversion Context]:Loaded engine size: 0 MiB
INFO:torch_tensorrt [TensorRT Conversion Context]:[MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 0, GPU 0 (MiB)
WARNING:torch_tensorrt [TensorRT Conversion Context]:Using default stream in enqueueV3() may lead to performance issues due to additional calls to cudaStreamSynchronize() by TensorRT to ensure correct synchronization. Please use non-default stream instead.
INFO:harness:TRT run time(s)= 0.00047513601183891296
.
----------------------------------------------------------------------
Ran 2 tests in 3.744s

OK

Fixes # (issue)

Type of change

Please delete options that are not relevant and/or add your own.

  • New feature (non-breaking change which adds functionality)

Checklist:

  • My code follows the style guidelines of this project (You can use the linters)
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas and hacks
  • I have made corresponding changes to the documentation
  • I have added tests to verify my fix or my feature
  • New and existing unit tests pass locally with my changes
  • I have added the relevant labels to my PR so that the relevant reviewers are notified

@chohk88 chohk88 added the component: converters Issues re: Specific op converters label May 11, 2024
@chohk88 chohk88 self-assigned this May 11, 2024
@github-actions github-actions bot added component: tests Issues re: Tests component: conversion Issues re: Conversion stage component: api [Python] Issues re: Python API component: dynamo Issues relating to the `torch.compile` or `torch._dynamo.export` paths and removed component: converters Issues re: Specific op converters labels May 11, 2024
@zewenli98 (Collaborator) left a comment

I think it's fine to have this warning; we just keep the same behavior as PyTorch. A minor comment below, the rest LGTM.

from .harness import DispatchTestCase


class TestAtan2OutConverter(DispatchTestCase):
Collaborator comment:
Can you move the test class into test_atan2_aten.py, considering these are just variants of atan2?

@github-actions github-actions bot left a comment

There are some changes that do not conform to Python style guidelines:

--- /home/runner/work/TensorRT/TensorRT/tests/py/dynamo/conversion/test_atan2_aten.py	2024-05-24 08:10:32.451612+00:00
+++ /home/runner/work/TensorRT/TensorRT/tests/py/dynamo/conversion/test_atan2_aten.py	2024-05-24 08:12:34.674439+00:00
@@ -125,10 +125,11 @@
        self.run_test(
            atan2(),
            inputs,
        )

+
class TestAtan2OutConverter(DispatchTestCase):
    @parameterized.expand(
        [
            ((10,), (5,), torch.float),
            ((10,), (10,), torch.float),
@@ -150,7 +151,8 @@
        self.run_test(
            atan2_out(),
            inputs,
        )

+
if __name__ == "__main__":
    run_tests()
