Add propagate_real_tensors mode for unbacked #125115

ezyang · 2024-04-28T03:58:35Z

Stack from ghstack (oldest at bottom):

-> Add propagate_real_tensors mode for unbacked #125115

A common complaint when working with data-dependent code in PyTorch is that it's hard to tell how far you are from the finish line: every time a GuardOnDataDependentSymNode error is hit, you have to somehow fix or work around it to see the next one.

This PR adds a new mode torch._functorch.config.fake_tensor_propagate_real_tensors which modifies fake tensors to also propagate real tensors. This means that when we try to guard on a data-dependent SymNode, we can actually produce a real result. We also produce a warning which you should consult to figure out what the crux points are.
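As a rough illustration of the idea, here is a toy model in plain Python. It is not the actual FakeTensor or ShapeEnv code: `ToyUnbackedInt`, `ToyDataDependentError`, `toy_item`, and `guard_lt` are hypothetical names made up for this sketch. A symbolic value normally has no data behind it, so guarding on it must raise; if a real value rides along, the guard can be answered from it, and a warning records the decision.

```python
import logging
from dataclasses import dataclass
from typing import Optional

log = logging.getLogger("toy_symbolic_shapes")


class ToyDataDependentError(RuntimeError):
    """Stand-in for GuardOnDataDependentSymNode in this toy model."""


@dataclass
class ToyUnbackedInt:
    """A symbolic integer (e.g. u0) whose value is unknown at trace time."""

    name: str
    # Populated only when real values are propagated alongside fake ones.
    real_value: Optional[int] = None

    def guard_lt(self, bound: int) -> bool:
        """Try to decide `self < bound` while tracing."""
        if self.real_value is None:
            # Baseline behavior: nothing to consult, so tracing has to stop here.
            raise ToyDataDependentError(f"cannot guard on {self.name} < {bound}")
        # With a real value available, answer from it and record the decision.
        result = self.real_value < bound
        log.warning(
            "propagate_real_tensors evaluate_expr(%s < %d) -> %s",
            self.name, bound, result,
        )
        return result


def toy_item(real_data: list, propagate_real: bool) -> ToyUnbackedInt:
    """Mimic tensor.item(): the result is always an unbacked symbol."""
    return ToyUnbackedInt("u0", real_value=real_data[0] if propagate_real else None)
```

In this sketch, `toy_item([1], propagate_real=True).guard_lt(2)` returns `True` and logs a warning, while the same call with `propagate_real=False` raises `ToyDataDependentError`.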

I ran this on vision_maskrcnn. In the baseline (without this mode), the model has 27 graph breaks, resulting in 40 graphs. With this mode on, the model has only 11 graph breaks, resulting in 15 graphs (the remaining graph breaks are due to missing functionality for item() on float tensors and some other missing Dynamo features). You get a list of things that would have errored, like this:

WARNING:torch.fx.experimental.symbolic_shapes:propagate_real_tensors evaluate_expr(Max(1, u1) < 2) -> True                                                
WARNING:torch.fx.experimental.symbolic_shapes:propagate_real_tensors evaluate_expr(Eq(Max(1, u1), 1)) -> True                                             
WARNING:torch.fx.experimental.symbolic_shapes:propagate_real_tensors evaluate_expr(Eq(Max(1, u1), 1)) -> True                                             
WARNING:torch.fx.experimental.symbolic_shapes:propagate_real_tensors evaluate_expr(Ne(Max(1, u1), 1)) -> False                                            
WARNING:torch.fx.experimental.symbolic_shapes:propagate_real_tensors evaluate_expr(Max(1, u0) < 2) -> True                                                
WARNING:torch.fx.experimental.symbolic_shapes:propagate_real_tensors evaluate_expr(Eq(Max(1, u0), 1)) -> True                                             
WARNING:torch.fx.experimental.symbolic_shapes:propagate_real_tensors evaluate_expr(Eq(Max(1, u0), 1)) -> True                                             
WARNING:torch.fx.experimental.symbolic_shapes:propagate_real_tensors evaluate_expr(Ne(Max(1, u0), 1)) -> False                                            
WARNING:torch.fx.experimental.symbolic_shapes:propagate_real_tensors evaluate_expr(Max(1, u1) < 2) -> True                                                
WARNING:torch.fx.experimental.symbolic_shapes:propagate_real_tensors evaluate_expr(Eq(Max(1, u1), 1)) -> True                                             
WARNING:torch.fx.experimental.symbolic_shapes:propagate_real_tensors evaluate_expr(Eq(Max(1, u1), 1)) -> True                                             
WARNING:torch.fx.experimental.symbolic_shapes:propagate_real_tensors evaluate_expr(Ne(Max(1, u1), 1)) -> False
WARNING:torch.fx.experimental.symbolic_shapes:propagate_real_tensors evaluate_expr(Max(1, u0) < 2) -> True
WARNING:torch.fx.experimental.symbolic_shapes:propagate_real_tensors evaluate_expr(Eq(Max(1, u0), 1)) -> True
WARNING:torch.fx.experimental.symbolic_shapes:propagate_real_tensors evaluate_expr(Eq(Max(1, u0), 1)) -> True
WARNING:torch.fx.experimental.symbolic_shapes:propagate_real_tensors evaluate_expr(Ne(Max(1, u0), 1)) -> False
WARNING:torch.fx.experimental.symbolic_shapes:propagate_real_tensors evaluate_expr(Max(1, u1) < 2) -> False
WARNING:torch.fx.experimental.symbolic_shapes:propagate_real_tensors evaluate_expr(Eq(Max(1, u1), 1)) -> False
WARNING:torch.fx.experimental.symbolic_shapes:propagate_real_tensors evaluate_expr(Ne(Max(1, u1), 1)) -> True
WARNING:torch.fx.experimental.symbolic_shapes:propagate_real_tensors evaluate_expr(Max(1, u0) < 2) -> False
WARNING:torch.fx.experimental.symbolic_shapes:propagate_real_tensors evaluate_expr(Eq(Max(1, u0), 1)) -> False
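Since the same expressions repeat many times in this output, it can help to collapse the log before reading it. The following is a small sketch that assumes the line format shown above; `summarize` is a hypothetical helper written for this note, not something shipped with PyTorch.

```python
import re
from collections import Counter

# Matches lines such as:
#   WARNING:torch.fx.experimental.symbolic_shapes:propagate_real_tensors evaluate_expr(Max(1, u1) < 2) -> True
_LINE = re.compile(
    r"propagate_real_tensors evaluate_expr\((?P<expr>.*)\) -> (?P<result>True|False)\s*$"
)


def summarize(lines):
    """Count how often each (expression, result) pair was evaluated."""
    counts = Counter()
    for line in lines:
        match = _LINE.search(line)
        if match:  # skip anything that is not a propagate_real_tensors warning
            key = (match.group("expr"), match.group("result") == "True")
            counts[key] += 1
    return counts
```

Calling `summarize(open("compile.log"))` (the file name is only an example) yields a `Counter` keyed by expression and outcome, which makes it easier to spot expressions such as `Max(1, u1) < 2` that evaluate both ways across runs.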

Potential later follow ups:

* Improve the warning messages (in particular, they should provide user frames)
* GC real tensors when they are no longer needed by tracing. Right now, this will use A LOT of memory: as much as if your GC were broken and every intermediate tensor were kept live

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

cc @msaroufim @bdhirsh @anijain2305 @chauhang @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @chenyang78 @kadeng

pytorch-bot · 2024-04-28T03:58:39Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/125115

📄 Preview Python docs built from this PR
📄 Preview C++ docs built from this PR
❓ Need help or want to give feedback on the CI? Visit the bot commands wiki or our office hours

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (1 Unrelated Failure)

As of commit 8bb7f84 with merge base c1a3fcf:

FLAKY - The following job failed but was likely due to flakiness present on trunk:

pull / linux-focal-cuda12.1-py3.10-gcc9 / test (default, 5, 5, linux.4xlarge.nvidia.gpu) (gh) (similar failure)
test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_div_cuda_bool

This comment was automatically generated by Dr. CI and updates every 15 minutes.

IvanKobzarev · 2024-05-02T14:45:04Z

torch/_subclasses/fake_tensor.py

+            def go(t, real_t):
+                if isinstance(t, FakeTensor):
+                    # NB: unconditionally overwrite
+                    t.real_tensor = real_t


The constructor asserts that real_tensor is not a FakeTensor.
So does this break that invariant?

No, real_t is a real tensor :)
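To make the exchange concrete, here is a toy sketch of the invariant under discussion. The classes `ToyRealTensor` and `ToyFakeTensor` and the helper `attach_real` are hypothetical stand-ins, not the code in torch/_subclasses/fake_tensor.py, and the extra assertion inside `attach_real` is an addition for illustration; the quoted diff relies on the caller passing a real tensor.

```python
class ToyRealTensor:
    """Stand-in for an ordinary tensor that owns data."""

    def __init__(self, data):
        self.data = data


class ToyFakeTensor:
    """Stand-in for FakeTensor: metadata only, plus an optional real payload."""

    def __init__(self, shape, real_tensor=None):
        # Invariant from the review comment: the payload is never itself fake.
        assert not isinstance(real_tensor, ToyFakeTensor)
        self.shape = shape
        self.real_tensor = real_tensor


def attach_real(t, real_t):
    """Mirror of the quoted `go` helper: overwrite the payload on fake tensors only."""
    if isinstance(t, ToyFakeTensor):
        # Extra check added for illustration; the quoted diff does not have it.
        assert not isinstance(real_t, ToyFakeTensor)
        t.real_tensor = real_t  # NB: unconditionally overwrite
```

Because `real_t` is always a real tensor at the call site, the overwrite keeps the constructor's invariant intact even though it replaces whatever payload was there before.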

IvanKobzarev · 2024-05-02T15:18:12Z

I think it would also be good to have a separate dynamo.config.use_propagate_real_tensors for data-dependent solving.
That would separate propagation from data-dependent solving, so that real_tensor_propagation can be used independently for other functionality.

ezyang · 2024-05-02T15:26:07Z

What I wrote in chat:

The current design for real tensor prop doesn't really work for anything else.

I know you were thinking of real tensor prop as a solution for other problems you have, but I don't really understand how these can work

ezyang · 2024-05-02T15:26:16Z

@pytorchbot merge

pytorchmergebot · 2024-05-02T15:28:05Z

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging

Check the merge workflow status here

X-link: pytorch/pytorch#125115
Approved by: https://github.com/IvanKobzarev
Reviewed By: kit1980
Differential Revision: D56915030
Pulled By: ezyang
fbshipit-source-id: f04687107bbd3e35e7bdba45998f75be6388debf

Update

bb886a1

[ghstack-poisoned]

pytorch-bot bot added the ciflow/inductor, release notes: fx, and ci-td-distributed labels Apr 28, 2024

ezyang added a commit that referenced this pull request Apr 28, 2024

Add propagate_real_tensors mode for unbacked

8dc892e

Signed-off-by: Edward Z. Yang <ezyang@meta.com> ghstack-source-id: 5ac57facea69da09303b9708e0ed85ef567b5df5 Pull Request resolved: #125115

github-actions bot requested review from albanD, antoniojkim, bdhirsh, miladm and SherlockNoMad April 28, 2024 03:58

ezyang requested review from suo, avikchaudhuri, IvanKobzarev and eellison April 28, 2024 04:00

Update

7320c61

[ghstack-poisoned]

ezyang added a commit that referenced this pull request Apr 29, 2024

Add propagate_real_tensors mode for unbacked

2337dfd

Signed-off-by: Edward Z. Yang <ezyang@meta.com> ghstack-source-id: 8bb80b5522e80b619bdf436bd3087e8cc8514203 Pull Request resolved: #125115

Update

35fe305

[ghstack-poisoned]

ezyang added a commit that referenced this pull request Apr 29, 2024

Add propagate_real_tensors mode for unbacked

345f5a8

Signed-off-by: Edward Z. Yang <ezyang@meta.com> ghstack-source-id: 681ec87050dcac63dcdd001347f6bf2d9a55965e Pull Request resolved: #125115

Update

4cdaa37

[ghstack-poisoned]

ezyang added a commit that referenced this pull request Apr 29, 2024

Add propagate_real_tensors mode for unbacked

690ea75

Signed-off-by: Edward Z. Yang <ezyang@meta.com> ghstack-source-id: c3a8e2925ff4f6f3b72de884b5dc472d342e1b46 Pull Request resolved: #125115

Update

510b18d

[ghstack-poisoned]

ezyang added a commit that referenced this pull request Apr 29, 2024

Add propagate_real_tensors mode for unbacked

be86c38

Signed-off-by: Edward Z. Yang <ezyang@meta.com> ghstack-source-id: 5309d4133d565d2a0dde661c195ba52373d1e9d2 Pull Request resolved: #125115

albanD removed their request for review April 29, 2024 15:33

Update

dd025c1

[ghstack-poisoned]

pytorch-bot bot added the module: dynamo label Apr 29, 2024

ezyang added a commit that referenced this pull request Apr 29, 2024

Add propagate_real_tensors mode for unbacked

3c6821c

Signed-off-by: Edward Z. Yang <ezyang@meta.com> ghstack-source-id: 8e5355ff3d0f8f2807de3158fab14318ab8435db Pull Request resolved: #125115

pytorch-bot bot added the oncall: pt2 label Apr 29, 2024

Update

6d8ab49

[ghstack-poisoned]

ezyang added a commit that referenced this pull request Apr 29, 2024

Add propagate_real_tensors mode for unbacked

e08e813

Signed-off-by: Edward Z. Yang <ezyang@meta.com> ghstack-source-id: 1f97e037a55711b1db7ce2f805d6b24be3ef8f5c Pull Request resolved: #125115

Update

c30d478

[ghstack-poisoned]

ezyang added a commit that referenced this pull request Apr 29, 2024

Add propagate_real_tensors mode for unbacked

abe66fe

Signed-off-by: Edward Z. Yang <ezyang@meta.com> ghstack-source-id: 2eeb18f9e64aaa13b1178678907d1536290a10dc Pull Request resolved: #125115

Update

8d4d7f6

[ghstack-poisoned]

ezyang added a commit that referenced this pull request Apr 30, 2024

Add propagate_real_tensors mode for unbacked

9769ac4

Signed-off-by: Edward Z. Yang <ezyang@meta.com> ghstack-source-id: 286bdada4b86fe1273015a0e16ab4c630e2a07fd Pull Request resolved: #125115

Update

f34325a

[ghstack-poisoned]

ezyang added a commit that referenced this pull request Apr 30, 2024

Add propagate_real_tensors mode for unbacked

1dddb54

Signed-off-by: Edward Z. Yang <ezyang@meta.com> ghstack-source-id: 614a79b06a398a71666725dc3101b4331f081cab Pull Request resolved: #125115

ezyang mentioned this pull request Apr 30, 2024

torch.Library can easily cause segfault on loading/unloading #125234

Open

Update

9d2b0a6

[ghstack-poisoned]

ezyang added a commit that referenced this pull request Apr 30, 2024

Add propagate_real_tensors mode for unbacked

6474736

Signed-off-by: Edward Z. Yang <ezyang@meta.com> ghstack-source-id: 91e4a445db2aeb871b014b541aac6f322691370c Pull Request resolved: #125115

Update

8bb7f84

[ghstack-poisoned]

ezyang added a commit that referenced this pull request May 1, 2024

Add propagate_real_tensors mode for unbacked

12eb77b

Signed-off-by: Edward Z. Yang <ezyang@meta.com> ghstack-source-id: d90e89f0eabad52bb912c2a511b0e854e7809b9c Pull Request resolved: #125115

ezyang added the ciflow/trunk label May 1, 2024

IvanKobzarev reviewed May 2, 2024

IvanKobzarev approved these changes May 2, 2024

ezyang added the topic: new features and release notes: dynamo labels May 2, 2024

pytorchmergebot added the merging label May 2, 2024

pytorchmergebot added the Merged label May 2, 2024

pytorchmergebot closed this in e93b57a May 2, 2024

pytorchmergebot removed the merging label May 2, 2024
