DISABLED test_issue106555 (__main__.TestCompiledAutograd) #125228

Closed
pytorch-bot bot opened this issue Apr 30, 2024 · 3 comments

Labels
  • module: flaky-tests (Problem is a flaky test in CI)
  • module: inductor
  • oncall: pt2
  • skipped (Denotes a (flaky) test currently skipped in CI.)
  • triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)

Comments

pytorch-bot bot commented Apr 30, 2024

Platforms: linux, rocm

This test was disabled because it is failing in CI. See recent examples and the most recent trunk workflow logs.

Over the past 3 hours, it has been determined flaky in 3 workflow(s) with 9 failures and 3 successes.

Debugging instructions (after clicking on the recent samples link):
DO NOT ASSUME THINGS ARE OKAY IF THE CI IS GREEN. Flaky tests are now shielded from developers, so CI will be green, but the logs will be harder to parse.
To find relevant log snippets:

  1. Click on the workflow logs linked above
  2. Click on the Test step of the job so that it is expanded. Otherwise, the grepping will not work.
  3. Grep for test_issue106555
  4. There should be several instances run (as flaky tests are rerun in CI) from which you can study the logs.
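The grep step above can also be done offline on a downloaded log. The following is a minimal sketch, assuming the workflow log has been saved as plain text; the helper name `find_test_log_lines` and the sample log lines are hypothetical and are not taken from the actual CI output.

```python
import re


def find_test_log_lines(log_text, test_name):
    """Return (line_number, line) pairs that mention test_name as a whole word."""
    pattern = re.compile(r"\b" + re.escape(test_name) + r"\b")
    return [
        (number, line)
        for number, line in enumerate(log_text.splitlines(), start=1)
        if pattern.search(line)
    ]


# Hypothetical log excerpt: a flaky test is rerun, so it appears more than once.
sample_log = "\n".join(
    [
        "inductor/test_compiled_autograd.py::TestCompiledAutograd::test_issue106555 FAILED",
        "inductor/test_compiled_autograd.py::TestCompiledAutograd::test_other PASSED",
        "inductor/test_compiled_autograd.py::TestCompiledAutograd::test_issue106555 PASSED",
    ]
)

matches = find_test_log_lines(sample_log, "test_issue106555")
for number, line in matches:
    print(number, line)
```

Because flaky tests are rerun, several matches for the same test name are expected; comparing the failing and passing instances is usually the quickest way to spot what differs.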
Sample error message
Traceback (most recent call last):
  File "inductor/test_compiled_autograd.py", line 541, in test_issue106555
    output_tensor = model(
  File "/opt/conda/envs/py_3.8/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/opt/conda/envs/py_3.8/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "inductor/test_compiled_autograd.py", line 512, in forward
    y = torch.utils.checkpoint.checkpoint(
  File "/opt/conda/envs/py_3.8/lib/python3.8/site-packages/torch/_compile.py", line 24, in inner
    return torch._dynamo.disable(fn, recursive)(*args, **kwargs)
  File "/opt/conda/envs/py_3.8/lib/python3.8/site-packages/torch/_dynamo/eval_frame.py", line 403, in _fn
    return fn(*args, **kwargs)
  File "/opt/conda/envs/py_3.8/lib/python3.8/site-packages/torch/_dynamo/external_utils.py", line 36, in inner
    return fn(*args, **kwargs)
  File "/opt/conda/envs/py_3.8/lib/python3.8/site-packages/torch/utils/checkpoint.py", line 481, in checkpoint
    return CheckpointFunction.apply(function, preserve, *args)
  File "/opt/conda/envs/py_3.8/lib/python3.8/site-packages/torch/autograd/function.py", line 571, in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
  File "/opt/conda/envs/py_3.8/lib/python3.8/site-packages/torch/utils/checkpoint.py", line 254, in forward
    outputs = run_function(*args)
  File "inductor/test_compiled_autograd.py", line 520, in _forward
    x = x + self.module_with_jit_1(x)
  File "/opt/conda/envs/py_3.8/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/opt/conda/envs/py_3.8/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "inductor/test_compiled_autograd.py", line 501, in forward
    output = bias_sigmoid_mul_jit(x1, x2, self.linear_2_bias)
  File "/opt/conda/envs/py_3.8/lib/python3.8/site-packages/torch/_dynamo/eval_frame.py", line 403, in _fn
    return fn(*args, **kwargs)
  File "inductor/test_compiled_autograd.py", line 484, in bias_sigmoid_mul
    def bias_sigmoid_mul(x1, x2, bias):
  File "/opt/conda/envs/py_3.8/lib/python3.8/site-packages/torch/_dynamo/eval_frame.py", line 403, in _fn
    return fn(*args, **kwargs)
  File "/opt/conda/envs/py_3.8/lib/python3.8/site-packages/torch/_dynamo/external_utils.py", line 36, in inner
    return fn(*args, **kwargs)
  File "/opt/conda/envs/py_3.8/lib/python3.8/site-packages/torch/_functorch/aot_autograd.py", line 991, in forward
    return compiled_fn(full_args)
  File "/opt/conda/envs/py_3.8/lib/python3.8/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 130, in runtime_wrapper
    all_outs = call_func_at_runtime_with_args(
  File "/opt/conda/envs/py_3.8/lib/python3.8/site-packages/torch/_functorch/_aot_autograd/utils.py", line 118, in call_func_at_runtime_with_args
    out = normalize_as_list(f(args))
  File "/opt/conda/envs/py_3.8/lib/python3.8/site-packages/torch/_functorch/_aot_autograd/jit_compile_runtime_wrappers.py", line 188, in rng_functionalization_wrapper
    return compiled_fw(args)
  File "/opt/conda/envs/py_3.8/lib/python3.8/site-packages/torch/_inductor/codecache.py", line 968, in __call__
    return self.current_callable(inputs)
  File "/opt/conda/envs/py_3.8/lib/python3.8/site-packages/torch/_inductor/compile_fx.py", line 946, in run
    return compiled_fn(new_inputs)
  File "/opt/conda/envs/py_3.8/lib/python3.8/site-packages/torch/_inductor/cudagraph_trees.py", line 368, in deferred_cudagraphify
    fn, out = cudagraphify(model, inputs, new_static_input_idxs, *args, **kwargs)
  File "/opt/conda/envs/py_3.8/lib/python3.8/site-packages/torch/_inductor/cudagraph_trees.py", line 390, in cudagraphify
    manager = get_container(device_index).get_tree_manager()
  File "/opt/conda/envs/py_3.8/lib/python3.8/site-packages/torch/_inductor/cudagraph_trees.py", line 325, in get_container
    lock = get_obj(local, "tree_manager_locks")[device_index]
TypeError: 'builtin_function_or_method' object is not subscriptable

To execute this test, run the following from the base repo dir:
    PYTORCH_TEST_WITH_ROCM=1 python test_compiled_autograd.py -k test_issue106555

This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0

Test file path: inductor/test_compiled_autograd.py
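The `TypeError` at the end of the sample traceback means that the value returned by `get_obj(local, "tree_manager_locks")` was a built-in function or method rather than an indexable container. The sketch below only reproduces that class of error in plain Python; it is not a diagnosis of `cudagraph_trees.py`, and the helper `get_lock` and the placeholder lock value are hypothetical.

```python
def get_lock(locks, device_index):
    """Index into a per-device container, as the failing line does."""
    return locks[device_index]


# Expected case: the container is a mapping from device index to a lock.
healthy_locks = {0: "lock-for-device-0"}  # placeholder value, not a real lock
healthy_result = get_lock(healthy_locks, 0)

# Failing case: the "container" is actually a built-in function, so
# subscripting it raises the same kind of TypeError as in the traceback.
try:
    get_lock(len, 0)
except TypeError as exc:
    error_message = str(exc)

print(error_message)
```

Since the failure is intermittent, whatever hands back the wrong object is presumably timing- or state-dependent, which this deterministic sketch does not attempt to model.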

cc @clee2000 @ezyang @msaroufim @bdhirsh @anijain2305 @chauhang @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @peterbell10 @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire

pytorch-bot bot added the module: flaky-tests, module: inductor, skipped, and triaged labels Apr 30, 2024

pytorch-bot bot commented Apr 30, 2024

Hello there! From the DISABLED prefix in this issue title, it looks like you are attempting to disable a test in PyTorch CI. The information I have parsed is below:
  • Test name: test_issue106555 (__main__.TestCompiledAutograd)
  • Platforms for which to skip the test: linux, rocm
  • Disabled by pytorch-bot[bot]

Within ~15 minutes, test_issue106555 (__main__.TestCompiledAutograd) will be disabled in PyTorch CI for these platforms: linux, rocm. Please verify that your test name looks correct, e.g., test_cuda_assert_async (__main__.TestCuda).
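The title format the bot describes, `DISABLED <test name> (<module>.<class>)`, can be checked before filing an issue. This is a minimal sketch under the assumption that the title follows exactly that shape; the regular expression and the function `parse_disabled_title` are illustrative and are not the bot's actual parser.

```python
import re

# Assumed shape: "DISABLED test_name (__main__.TestClass)"
TITLE_PATTERN = re.compile(r"^DISABLED\s+(?P<test>\w+)\s+\((?P<suite>[\w.]+)\)$")


def parse_disabled_title(title):
    """Return (test_name, suite) for a well-formed title, or None otherwise."""
    match = TITLE_PATTERN.match(title.strip())
    if match is None:
        return None
    return match.group("test"), match.group("suite")


parsed = parse_disabled_title(
    "DISABLED test_issue106555 (__main__.TestCompiledAutograd)"
)
print(parsed)
```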

To modify the platforms list, please include a line in the issue body, like below. The default action will disable the test for all platforms if no platforms list is specified.

Platforms: case-insensitive, list, of, platforms

We currently support the following platforms: asan, dynamo, inductor, linux, mac, macos, rocm, slow, win, windows.
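The `Platforms:` line rules above (case-insensitive, comma-separated, all platforms when the line is absent) can be sketched as follows. The supported names are copied from the list above; the function `parse_platforms` and its error handling are assumptions for illustration, not the bot's implementation.

```python
SUPPORTED_PLATFORMS = frozenset(
    {"asan", "dynamo", "inductor", "linux", "mac", "macos", "rocm", "slow", "win", "windows"}
)


def parse_platforms(issue_body):
    """Return the lowercase platform names from the first 'Platforms:' line.

    If the body has no such line, every supported platform is returned,
    mirroring the documented default of disabling the test everywhere.
    """
    for line in issue_body.splitlines():
        stripped = line.strip()
        if stripped.lower().startswith("platforms:"):
            raw = stripped.split(":", 1)[1]
            names = [part.strip().lower() for part in raw.split(",") if part.strip()]
            unknown = [name for name in names if name not in SUPPORTED_PLATFORMS]
            if unknown:
                # Assumed behavior: reject names outside the supported list.
                raise ValueError("unsupported platforms: " + ", ".join(unknown))
            return names
    return sorted(SUPPORTED_PLATFORMS)


selected = parse_platforms("Platforms: Linux, ROCm\n\nThis test was disabled because ...")
print(selected)
```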


pytorch-bot bot commented Apr 30, 2024

Another case of trunk flakiness has been found here. The list of platforms [linux, rocm] appears to contain all the recently affected platforms [linux, rocm]. Either the change didn't propagate fast enough or the disable bot might be broken.


pytorch-bot bot commented May 17, 2024

Resolving the issue because the test is no longer flaky: it passed 1000 reruns without any failures, and the issue hasn't been updated in 14 days. Please reopen the issue to re-disable the test if you think this is a false positive.

pytorch-bot bot closed this as completed May 17, 2024