
feat: support aten.atan2.out converter #2829

Open · wants to merge 2 commits into base: main
Conversation

@chohk88 (Collaborator) commented May 11, 2024

Description

The aten.atan2.out operation computes the element-wise arctangent of two tensors and stores the result in a specified output tensor, out. It does not alter the input tensors, so it is not an in-place operation in the traditional sense.
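For reference, the element-wise semantics (ignoring broadcasting and the out tensor) can be sketched in plain Python. This is only an illustration of the math, not the converter implementation:

```python
import math

def atan2_elementwise(ys, xs):
    """Element-wise arctangent of ys/xs, using the signs of both
    arguments to select the correct quadrant (like torch.atan2)."""
    return [math.atan2(y, x) for y, x in zip(ys, xs)]

# atan2(1, 1) = pi/4, atan2(-1, 1) = -pi/4, atan2(0, -1) = pi
angles = atan2_elementwise([1.0, -1.0, 0.0], [1.0, 1.0, -1.0])
```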

However, I've encountered two issues:

First, when the shape of the provided out tensor does not match the required output shape, PyTorch issues a warning but does not raise an error, and still produces the correct results. The output of running the atan2.out operation in PyTorch is attached below:

(screenshot: PyTorch atan2.out output)
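A minimal repro of this PyTorch behavior (assuming a recent PyTorch build; the exact warning text may vary by version):

```python
import torch

a = torch.randn(10)
b = torch.randn(10)
out = torch.empty(5)  # deliberately wrong shape; the required shape is [10]

# PyTorch emits a resize warning but does not raise: `out` is resized
# to [10] in place and receives the correct results.
result = torch.atan2(a, b, out=out)
print(result.shape)                            # torch.Size([10])
print(torch.allclose(result, torch.atan2(a, b)))  # True
```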

Second, although TensorRT emits an error message when the above warning occurs, the test case still passes. The output of the executed test case is attached below:

[W511 17:09:06.345646983 Resize.cpp:28] Warning: An output with one or more elements was resized since it had shape [5], which does not match the required output shape [10]. This behavior is deprecated, and in a future PyTorch release outputs will not be resized unless they have zero elements. You can explicitly reuse an out tensor t by resizing it, inplace, to zero elements with t.resize_(0). (function _resize_output_check)
INFO:torch_tensorrt [TensorRT Conversion Context]:[MemUsageChange] Init CUDA: CPU +1, GPU +0, now: CPU 145, GPU 1474 (MiB)
INFO:torch_tensorrt [TensorRT Conversion Context]:[MemUsageChange] Init builder kernel library: CPU +1750, GPU +317, now: CPU 2031, GPU 1791 (MiB)
INFO:torch_tensorrt.dynamo.conversion._TRTInterpreter:TRT INetwork construction elapsed time: 0:00:00.013850
WARNING:torch_tensorrt [TensorRT Conversion Context]:Unused Input: out
WARNING:torch_tensorrt [TensorRT Conversion Context]:[RemoveDeadLayers] Input Tensor out is unused or used only at compile-time, but is not being removed.
INFO:torch_tensorrt [TensorRT Conversion Context]:Global timing cache in use. Profiling results in this builder pass will be stored.
INFO:torch_tensorrt [TensorRT Conversion Context]:Detected 3 inputs and 1 output network tensors.
INFO:torch_tensorrt [TensorRT Conversion Context]:Total Host Persistent Memory: 32
INFO:torch_tensorrt [TensorRT Conversion Context]:Total Device Persistent Memory: 0
INFO:torch_tensorrt [TensorRT Conversion Context]:Total Scratch Memory: 0
INFO:torch_tensorrt [TensorRT Conversion Context]:Total Activation Memory: 0
INFO:torch_tensorrt [TensorRT Conversion Context]:Total Weights Memory: 296
INFO:torch_tensorrt [TensorRT Conversion Context]:Engine generation completed in 0.0770512 seconds.
INFO:torch_tensorrt [TensorRT Conversion Context]:[MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 0 MiB, GPU 1 MiB
INFO:torch_tensorrt [TensorRT Conversion Context]:[MemUsageStats] Peak memory usage during Engine building and serialization: CPU: 3581 MiB
INFO:torch_tensorrt [TensorRT Conversion Context]:Serialized 26 bytes of code generator cache.
INFO:torch_tensorrt [TensorRT Conversion Context]:Serialized 7750 bytes of compilation cache.
INFO:torch_tensorrt [TensorRT Conversion Context]:Serialized 0 timing cache entries
INFO:torch_tensorrt.dynamo.conversion._TRTInterpreter:Build TRT engine elapsed time: 0:00:00.081009
INFO:torch_tensorrt.dynamo.conversion._TRTInterpreter:TRT Engine uses: 13980 bytes of Memory
INFO:harness:Interpreter run time(s): 0.09525678399950266
INFO:torch_tensorrt [TensorRT Conversion Context]:Loaded engine size: 0 MiB
INFO:torch_tensorrt [TensorRT Conversion Context]:[MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 0, GPU 0 (MiB)
[W511 17:09:08.253573766 Resize.cpp:28] Warning: An output with one or more elements was resized since it had shape [5], which does not match the required output shape [10]. This behavior is deprecated, and in a future PyTorch release outputs will not be resized unless they have zero elements. You can explicitly reuse an out tensor t by resizing it, inplace, to zero elements with t.resize_(0). (function _resize_output_check)
ERROR:torch_tensorrt [TensorRT Conversion Context]:3: [executionContext.cpp::setInputShape::2037] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::setInputShape::2037, condition: engineDims.d[i] == dims.d[i] Static dimension mismatch while setting input shape.)
WARNING:torch_tensorrt [TensorRT Conversion Context]:Using default stream in enqueueV3() may lead to performance issues due to additional calls to cudaStreamSynchronize() by TensorRT to ensure correct synchronization. Please use non-default stream instead.
INFO:harness:TRT run time(s)= 0.010920960426330567
.INFO:torch_tensorrt [TensorRT Conversion Context]:[MemUsageChange] Init CUDA: CPU +0, GPU +0, now: CPU 286, GPU 1483 (MiB)
INFO:torch_tensorrt [TensorRT Conversion Context]:[MemUsageChange] Init builder kernel library: CPU +1748, GPU +310, now: CPU 2034, GPU 1793 (MiB)
INFO:torch_tensorrt.dynamo.conversion._TRTInterpreter:TRT INetwork construction elapsed time: 0:00:00.009873
WARNING:torch_tensorrt [TensorRT Conversion Context]:Unused Input: out
WARNING:torch_tensorrt [TensorRT Conversion Context]:[RemoveDeadLayers] Input Tensor out is unused or used only at compile-time, but is not being removed.
INFO:torch_tensorrt [TensorRT Conversion Context]:Global timing cache in use. Profiling results in this builder pass will be stored.
INFO:torch_tensorrt [TensorRT Conversion Context]:Detected 3 inputs and 1 output network tensors.
INFO:torch_tensorrt [TensorRT Conversion Context]:Total Host Persistent Memory: 32
INFO:torch_tensorrt [TensorRT Conversion Context]:Total Device Persistent Memory: 0
INFO:torch_tensorrt [TensorRT Conversion Context]:Total Scratch Memory: 0
INFO:torch_tensorrt [TensorRT Conversion Context]:Total Activation Memory: 0
INFO:torch_tensorrt [TensorRT Conversion Context]:Total Weights Memory: 296
INFO:torch_tensorrt [TensorRT Conversion Context]:Engine generation completed in 0.0711228 seconds.
INFO:torch_tensorrt [TensorRT Conversion Context]:[MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 0 MiB, GPU 1 MiB
INFO:torch_tensorrt [TensorRT Conversion Context]:[MemUsageStats] Peak memory usage during Engine building and serialization: CPU: 3659 MiB
INFO:torch_tensorrt [TensorRT Conversion Context]:Serialized 26 bytes of code generator cache.
INFO:torch_tensorrt [TensorRT Conversion Context]:Serialized 7750 bytes of compilation cache.
INFO:torch_tensorrt [TensorRT Conversion Context]:Serialized 0 timing cache entries
INFO:torch_tensorrt.dynamo.conversion._TRTInterpreter:Build TRT engine elapsed time: 0:00:00.074702
INFO:torch_tensorrt.dynamo.conversion._TRTInterpreter:TRT Engine uses: 13980 bytes of Memory
INFO:harness:Interpreter run time(s): 0.08486600499600172
INFO:torch_tensorrt [TensorRT Conversion Context]:Loaded engine size: 0 MiB
INFO:torch_tensorrt [TensorRT Conversion Context]:[MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 0, GPU 0 (MiB)
WARNING:torch_tensorrt [TensorRT Conversion Context]:Using default stream in enqueueV3() may lead to performance issues due to additional calls to cudaStreamSynchronize() by TensorRT to ensure correct synchronization. Please use non-default stream instead.
INFO:harness:TRT run time(s)= 0.00047513601183891296
.
----------------------------------------------------------------------
Ran 2 tests in 3.744s

OK

Fixes # (issue)

Type of change

Please delete options that are not relevant and/or add your own.

  • New feature (non-breaking change which adds functionality)

Checklist:

  • My code follows the style guidelines of this project (You can use the linters)
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas and hacks
  • I have made corresponding changes to the documentation
  • I have added tests to verify my fix or my feature
  • New and existing unit tests pass locally with my changes
  • I have added the relevant labels to my PR so that the relevant reviewers are notified

@chohk88 chohk88 added the component: converters Issues re: Specific op converters label May 11, 2024
@chohk88 chohk88 self-assigned this May 11, 2024
@github-actions github-actions bot added component: tests Issues re: Tests component: conversion Issues re: Conversion stage component: api [Python] Issues re: Python API component: dynamo Issues relating to the `torch.compile` or `torch._dynamo.export` paths and removed component: converters Issues re: Specific op converters labels May 11, 2024
@zewenli98 (Collaborator) left a comment

I think it's fine to have this warning; we just keep the same behavior as PyTorch. A minor comment below, the rest LGTM.

from .harness import DispatchTestCase


class TestAtan2OutConverter(DispatchTestCase):
Collaborator comment:
Can you move the test class into test_atan2_aten.py, considering these are just variants of atan2?

@github-actions github-actions bot left a comment

There are some changes that do not conform to Python style guidelines:

--- /home/runner/work/TensorRT/TensorRT/tests/py/dynamo/conversion/test_atan2_aten.py	2024-05-24 08:10:32.451612+00:00
+++ /home/runner/work/TensorRT/TensorRT/tests/py/dynamo/conversion/test_atan2_aten.py	2024-05-24 08:12:34.674439+00:00
@@ -125,10 +125,11 @@
        self.run_test(
            atan2(),
            inputs,
        )

+
class TestAtan2OutConverter(DispatchTestCase):
    @parameterized.expand(
        [
            ((10,), (5,), torch.float),
            ((10,), (10,), torch.float),
@@ -150,7 +151,8 @@
        self.run_test(
            atan2_out(),
            inputs,
        )

+
if __name__ == "__main__":
    run_tests()
