-
Notifications
You must be signed in to change notification settings - Fork 21.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Dynamo] make bytecode of resume function resemble natural bytecode #126630
Conversation
🔗 Helpful Links🧪 See artifacts and rendered test results at hud.pytorch.org/pr/126630
Note: Links to docs will display an error until the docs builds have been completed. ✅ You can merge normally! (3 Unrelated Failures)As of commit b7ad5f3 with merge base a8195f2 (): FLAKY - The following jobs failed but were likely due to flakiness present on trunk:
UNSTABLE - The following job failed but was likely due to flakiness present on trunk and has been marked as unstable:
This comment was automatically generated by Dr. CI and updates every 15 minutes. |
When we deal with natural bytecode (Python bytecode that is generated by Python compiler, from source code), we can observe that def helper_outer_function():
data_ptr = None
last_dim_size = None
last_two_dims_size = None
shape = None
stride = None
qkv_format = None
def dummy1():
nonlocal data_ptr, last_dim_size, last_two_dims_size, shape, stride, qkv_format
# Access variables in the order they were declared
_ = data_ptr
_ = last_dim_size
_ = last_two_dims_size
_ = shape
_ = stride
_ = qkv_format
print(data_ptr)
print(last_dim_size)
print(last_two_dims_size)
print(shape)
print(stride)
print(qkv_format)
dummy1()
print(helper_outer_function.__code__.co_consts[1].co_freevars) The output is:
Note that our declaration order is If Dynamo generated bytecode does not obey this rule, it means we cannot generate source code that can compile the same bytecode as Dynamo, which makes it impossible to understand Dynamo bytecode by decompiling it into source code. |
Note: we require the new bytecode has exactly the same |
reference:
indexes of free and cell var storage are sorted. |
TODO: how to add test for this. We need to generate a resume function with freevars. |
Trying @williamwen42 as reviewer, but this looks pretty harmless, shout if it gets lost |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM - just add a test
@williamwen42 thanks for the reply. Can you give me some guidance on how to add tests? In particular, I don't know how to manually instruct Dynamo to generate a resume function as I want. |
You can compile a function with a graph break and then search import torch
def fn(x):
x = x + 1
torch._dynamo.graph_break()
x = torch.sin(x)
return x
opt_fn = torch.compile(fn, backend="eager")
opt_fn(torch.randn(10))
for k, v in list(globals().items()):
if k.startswith("__resume_at"):
print(k)
print(v) |
This kind of resume function does not have freevars:
|
You can use a different function that has freevars - compile a function with a closure? |
Still no freevars: import torch
def fn(x):
x = x + 1
@torch.compile(backend="eager")
def inner(x):
x = x + 1
torch._dynamo.graph_break()
x = x * 2
return x
y = inner(torch.sin(x))
return y
fn(torch.randn(10))
for k, v in list(globals().items()):
if k.startswith("__resume_at"):
print(k)
print(v) |
When will resume function contain free vars? |
Looks like resume functions with freevars are not exposed to the global scope, so we'll need to add something like # expose code object for debugging purposes
self.output.install_global_unsafe(name, new_code) before the Then a function like this should work: import torch
def create():
cl = 1
def fn(x):
x = x + 1
torch._dynamo.graph_break()
x = x + cl
return x
return fn
fn = create()
opt_fn = torch.compile(fn, backend="eager")
print(opt_fn(torch.randn(10)))
breakpoint()
for k, v in list(globals().items()):
if k.startswith("__resume_at"):
print(k)
print(v)
print(v.co_freevars)
print(v.co_cellvars) |
@williamwen42 thanks for the guidance! do you know if the test failures in the commit are related with the changes in this PR? |
They don't look related, but we'll see what CI shows. |
@williamwen42 can you take a look at whether ci test failures are related? |
They don't look related - we're having some issues with CI atm. |
@pytorchbot merge |
Merge startedYour change will be merged once all checks pass (ETA 0-4 Hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team |
@williamwen42 then how can we merge this? do we need to wait until the ci team fixes the issue? |
@pytorchbot merge -r |
@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here |
Successfully rebased |
50903e5
to
b7ad5f3
Compare
Merge startedYour change will be merged once all checks pass (ETA 0-4 Hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team |
cc @ezyang @msaroufim @bdhirsh @anijain2305 @chauhang @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @chenyang78 @kadeng