
[autocast] Use the new autocast API with the device type name. #125225

Open

Shan19900305 wants to merge 1 commit into pytorch:main from Shan19900305:main_using_autocast_with_device_string
Conversation

@Shan19900305 (Contributor) commented Apr 30, 2024

Switch call sites to the new autocast APIs that take a device type name.

cc @mcarilli @ptrblck @leslie-fang-intel @jgong5
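
For context, a minimal sketch of the device-generic API this PR migrates to, assuming a PyTorch build where `torch.is_autocast_enabled` accepts a device type string (as introduced alongside this work):

```python
import torch

# One context manager and one query, both keyed by the device type name,
# instead of per-device variants such as torch.is_autocast_cpu_enabled().
with torch.autocast(device_type="cpu", dtype=torch.bfloat16):
    print(torch.is_autocast_enabled("cpu"))   # True inside this region
    print(torch.is_autocast_enabled("cuda"))  # False unless separately enabled
```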


pytorch-bot bot commented Apr 30, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/125225

Note: Links to docs will display an error until the docs builds have been completed.

❌ 41 New Failures, 3 Unrelated Failures

As of commit e751f58 with merge base 4693837:

NEW FAILURES - The following jobs have failed:

BROKEN TRUNK - The following jobs failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@cpuhrsch added the `triaged` (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module), `module: amp (automated mixed precision)`, and `autocast` labels on Apr 30, 2024
@albanD (Collaborator) left a comment

Sounds good to me
FYI @drisspg @mikaylagawarecki since this is a non-trivial change in mha

@@ -375,7 +375,7 @@ def forward(
             why_not_sparsity_fast_path = "NestedTensor input is not supported"
         elif mask is not None:
             why_not_sparsity_fast_path = "src_key_padding_mask and mask were both supplied"
-        elif torch.is_autocast_enabled():
+        elif torch.is_autocast_enabled(src.device.type):
Collaborator

FYI @drisspg the subtle difference here is that the old code checked only whether autocast was enabled on CUDA, while the updated code checks whether autocast is enabled on the device of the input.

Contributor

I think that should be fine, since there aren't any mixed input devices
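
A small sketch of the behavioral difference being discussed, assuming a build from around this PR where the no-argument form still reported only the CUDA autocast state:

```python
import torch

src = torch.randn(2, 4, 8)  # a CPU input

with torch.autocast(device_type="cpu", dtype=torch.bfloat16):
    # Old guard: reflected only the CUDA autocast state, so CPU autocast
    # was invisible to the fast-path check.
    legacy = torch.is_autocast_enabled()                       # False here
    # New guard: keyed on the input's device type, so it is seen.
    per_device = torch.is_autocast_enabled(src.device.type)    # True here
```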

@huydhn added the `ciflow/trunk` (Trigger trunk jobs on your pull request) label on May 1, 2024
huydhn added a commit to pytorch/test-infra that referenced this pull request on May 2, 2024 (#5160):

The context is in #5151. This
reland PR adds 2 more fixes:

* Do a left join from `workflow_job` to `push`, so that Dr.CI can always
find all the jobs from the PR even when the commit SHA is not found on
`push` in the case of forked PRs. The `head_sha_timestamp` field will be
empty then.
* When the `head_sha_timestamp` is empty, call `fetchCommitTimestamp` to
get the timestamp directly from GitHub. This is done once per commit.

Note that if the GitHub query fails and `head_sha_timestamp` is still
empty, Dr. CI won't apply the similar-failure flaky search, to avoid false
positives; otherwise the search query would expand up to the current date.
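
In rough pseudocode, the described fallback looks like this (a Python sketch with hypothetical names, not Dr. CI's actual TypeScript):

```python
def resolve_head_sha_timestamp(job_row: dict, fetch_commit_timestamp) -> str | None:
    # The left join from workflow_job to push can leave this field empty
    # for forked PRs whose commit SHA never appears on push.
    ts = job_row.get("head_sha_timestamp")
    if not ts:
        # Fall back to asking GitHub directly, once per commit.
        ts = fetch_commit_timestamp(job_row["head_sha"])
    # If ts is still empty here, the similar/flaky search is skipped to
    # avoid false positives from an unbounded time window.
    return ts or None
```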

### Testing

```
curl --request POST \
--url "http://localhost:3000/api/drci/drci?prNumber=PR_NUMBER" \
--header "Authorization: TOKEN" \
--data 'repo=pytorch'
```

1. pytorch/pytorch#125271, new forked PR, no
ciflow. `head_sha_timestamp` from Rockset is empty and
`fetchCommitTimestamp` is invoked. Dr.CI continues to work.

<details open><summary><b>NEW FAILURES</b> - The following jobs have
failed:</summary><p>

* [Lint / lintrunner-clang /
linux-job](https://hud.pytorch.org/pr/pytorch/pytorch/125271#24449212917)
([gh](https://github.com/pytorch/pytorch/actions/runs/8902585059/job/24449212917))
    `>>> Lint for torch/csrc/utils/tensor_memoryformats.cpp:`
* [pull / linux-focal-cuda12.1-py3.10-gcc9 / test (default, 2, 5,
linux.4xlarge.nvidia.gpu)](https://hud.pytorch.org/pr/pytorch/pytorch/125271#24449643728)
([gh](https://github.com/pytorch/pytorch/actions/runs/8902585046/job/24449643728))
    `test_autograd.py::TestAutograd::test_type_conversions`
* [pull / linux-focal-cuda12.1-py3.10-gcc9-sm86 / test (default, 2, 5,
linux.g5.4xlarge.nvidia.gpu)](https://hud.pytorch.org/pr/pytorch/pytorch/125271#24450124622)
([gh](https://github.com/pytorch/pytorch/actions/runs/8902585046/job/24450124622))
    `test_autograd.py::TestAutograd::test_type_conversions`
* [pull / linux-focal-py3.11-clang10 / test (crossref, 2, 2,
linux.2xlarge)](https://hud.pytorch.org/pr/pytorch/pytorch/125271#24449335282)
([gh](https://github.com/pytorch/pytorch/actions/runs/8902585046/job/24449335282))
    `test_autograd.py::TestAutograd::test_type_conversions`
* [pull / linux-focal-py3.11-clang10 / test (default, 1, 3,
linux.2xlarge)](https://hud.pytorch.org/pr/pytorch/pytorch/125271#24449334520)
([gh](https://github.com/pytorch/pytorch/actions/runs/8902585046/job/24449334520))

`test_tensor_creation_ops.py::TestTensorCreationCPU::test_constructor_dtypes_cpu`
* [pull / linux-focal-py3.11-clang10 / test (default, 2, 3,
linux.2xlarge)](https://hud.pytorch.org/pr/pytorch/pytorch/125271#24449334757)
([gh](https://github.com/pytorch/pytorch/actions/runs/8902585046/job/24449334757))
    `test_autograd.py::TestAutograd::test_type_conversions`
* [pull / linux-focal-py3.11-clang10 / test (dynamo, 3, 3,
linux.2xlarge)](https://hud.pytorch.org/pr/pytorch/pytorch/125271#24449335837)
([gh](https://github.com/pytorch/pytorch/actions/runs/8902585046/job/24449335837))
    `test_autograd.py::TestAutograd::test_type_conversions`
* [pull / linux-focal-py3.12-clang10 / test (default, 1, 3,
linux.2xlarge)](https://hud.pytorch.org/pr/pytorch/pytorch/125271#24449281229)
([gh](https://github.com/pytorch/pytorch/actions/runs/8902585046/job/24449281229))

`test_tensor_creation_ops.py::TestTensorCreationCPU::test_constructor_dtypes_cpu`
* [pull / linux-focal-py3.12-clang10 / test (default, 2, 3,
linux.2xlarge)](https://hud.pytorch.org/pr/pytorch/pytorch/125271#24449281368)
([gh](https://github.com/pytorch/pytorch/actions/runs/8902585046/job/24449281368))
    `test_autograd.py::TestAutograd::test_type_conversions`
* [pull / linux-focal-py3.12-clang10 / test (dynamo, 3, 3,
linux.2xlarge)](https://hud.pytorch.org/pr/pytorch/pytorch/125271#24449282003)
([gh](https://github.com/pytorch/pytorch/actions/runs/8902585046/job/24449282003))
    `test_autograd.py::TestAutograd::test_type_conversions`
* [pull / linux-focal-py3.8-clang10 / test (crossref, 2, 2,
linux.2xlarge)](https://hud.pytorch.org/pr/pytorch/pytorch/125271#24449309061)
([gh](https://github.com/pytorch/pytorch/actions/runs/8902585046/job/24449309061))
    `test_autograd.py::TestAutograd::test_type_conversions`
* [pull / linux-focal-py3.8-clang10 / test (default, 1, 3,
linux.2xlarge)](https://hud.pytorch.org/pr/pytorch/pytorch/125271#24449308208)
([gh](https://github.com/pytorch/pytorch/actions/runs/8902585046/job/24449308208))

`test_tensor_creation_ops.py::TestTensorCreationCPU::test_constructor_dtypes_cpu`
* [pull / linux-focal-py3.8-clang10 / test (default, 2, 3,
linux.2xlarge)](https://hud.pytorch.org/pr/pytorch/pytorch/125271#24449308391)
([gh](https://github.com/pytorch/pytorch/actions/runs/8902585046/job/24449308391))
    `test_autograd.py::TestAutograd::test_type_conversions`
* [pull / linux-focal-py3.8-clang10 / test (dynamo, 3, 3,
linux.2xlarge)](https://hud.pytorch.org/pr/pytorch/pytorch/125271#24449309632)
([gh](https://github.com/pytorch/pytorch/actions/runs/8902585046/job/24449309632))
    `test_autograd.py::TestAutograd::test_type_conversions`
* [pull / linux-jammy-py3.10-clang15-asan / test (default, 2, 6,
linux.4xlarge)](https://hud.pytorch.org/pr/pytorch/pytorch/125271#24449403443)
([gh](https://github.com/pytorch/pytorch/actions/runs/8902585046/job/24449403443))
    `test_autograd.py::TestAutograd::test_type_conversions`
* [pull / linux-jammy-py3.8-gcc11 / test (default, 1, 3,
linux.2xlarge)](https://hud.pytorch.org/pr/pytorch/pytorch/125271#24449357342)
([gh](https://github.com/pytorch/pytorch/actions/runs/8902585046/job/24449357342))

`test_tensor_creation_ops.py::TestTensorCreationCPU::test_constructor_dtypes_cpu`
* [pull / linux-jammy-py3.8-gcc11 / test (default, 2, 3,
linux.2xlarge)](https://hud.pytorch.org/pr/pytorch/pytorch/125271#24449357569)
([gh](https://github.com/pytorch/pytorch/actions/runs/8902585046/job/24449357569))
    `test_autograd.py::TestAutograd::test_type_conversions`
</p></details>

2. pytorch/pytorch#125225, another forked PR,
with `ciflow/trunk`. `head_sha_timestamp` is now available from Rockset
and `fetchCommitTimestamp` is not needed.

<details open><summary><b>NEW FAILURES</b> - The following jobs have
failed:</summary><p>

* [pull / linux-focal-cuda12.1-py3.10-gcc9 / test (default, 1, 5,
linux.4xlarge.nvidia.gpu)](https://hud.pytorch.org/pr/pytorch/pytorch/125225#24445851668)
([gh](https://github.com/pytorch/pytorch/actions/runs/8893561548/job/24445851668))
    `test_fx.py::TestVisionTracing::test_torchvision_models_vit_b_16`
* [pull / linux-focal-cuda12.1-py3.10-gcc9 / test (default, 2, 5,
linux.4xlarge.nvidia.gpu)](https://hud.pytorch.org/pr/pytorch/pytorch/125225#24445852045)
([gh](https://github.com/pytorch/pytorch/actions/runs/8893561548/job/24445852045))

`test_transformers.py::TestTransformersCUDA::test_script_encoder_subclass_cuda`
* [pull / linux-focal-cuda12.1-py3.10-gcc9 / test (default, 3, 5,
linux.4xlarge.nvidia.gpu)](https://hud.pytorch.org/pr/pytorch/pytorch/125225#24445852311)
([gh](https://github.com/pytorch/pytorch/actions/runs/8893561548/job/24445852311))

`dynamo/test_autograd_function.py::AutogradFunctionTests::test_amp_custom_fwd_bwd`
* [pull / linux-focal-cuda12.1-py3.10-gcc9 / test (default, 4, 5,
linux.4xlarge.nvidia.gpu)](https://hud.pytorch.org/pr/pytorch/pytorch/125225#24445852638)
([gh](https://github.com/pytorch/pytorch/actions/runs/8893561548/job/24445852638))
`test_jit.py::TestScript::test_torchscript_multi_head_attn_fast_path`
* [pull / linux-focal-cuda12.1-py3.10-gcc9-sm86 / test (default, 1, 5,
linux.g5.4xlarge.nvidia.gpu)](https://hud.pytorch.org/pr/pytorch/pytorch/125225#24446408907)
([gh](https://github.com/pytorch/pytorch/actions/runs/8893561548/job/24446408907))
    `test_fx.py::TestVisionTracing::test_torchvision_models_vit_b_16`
* [pull / linux-focal-cuda12.1-py3.10-gcc9-sm86 / test (default, 2, 5,
linux.g5.4xlarge.nvidia.gpu)](https://hud.pytorch.org/pr/pytorch/pytorch/125225#24446409189)
([gh](https://github.com/pytorch/pytorch/actions/runs/8893561548/job/24446409189))
`test_jit.py::TestScript::test_torchscript_multi_head_attn_fast_path`
* [pull / linux-focal-cuda12.1-py3.10-gcc9-sm86 / test (default, 3, 5,
linux.g5.4xlarge.nvidia.gpu)](https://hud.pytorch.org/pr/pytorch/pytorch/125225#24446409446)
([gh](https://github.com/pytorch/pytorch/actions/runs/8893561548/job/24446409446))

`test_transformers.py::TestTransformersCUDA::test_script_encoder_subclass_cuda`
* [pull / linux-focal-cuda12.1-py3.10-gcc9-sm86 / test (default, 4, 5,
linux.g5.4xlarge.nvidia.gpu)](https://hud.pytorch.org/pr/pytorch/pytorch/125225#24446409676)
([gh](https://github.com/pytorch/pytorch/actions/runs/8893561548/job/24446409676))

`test_transformers.py::TestTransformersCUDA::test_transformerencoderlayer_subclass_cuda`
* [pull / linux-focal-py3.11-clang10 / test (crossref, 1, 2,
linux.2xlarge)](https://hud.pytorch.org/pr/pytorch/pytorch/125225#24445471589)
([gh](https://github.com/pytorch/pytorch/actions/runs/8893561548/job/24445471589))
    `test_fx.py::TestVisionTracing::test_torchvision_models_vit_b_16`
* [pull / linux-focal-py3.11-clang10 / test (crossref, 2, 2,
linux.2xlarge)](https://hud.pytorch.org/pr/pytorch/pytorch/125225#24445471884)
([gh](https://github.com/pytorch/pytorch/actions/runs/8893561548/job/24445471884))

`test_transformers.py::TestTransformersCPU::test_script_encoder_subclass_cpu`
* [pull / linux-focal-py3.11-clang10 / test (default, 1, 3,
linux.2xlarge)](https://hud.pytorch.org/pr/pytorch/pytorch/125225#24445470929)
([gh](https://github.com/pytorch/pytorch/actions/runs/8893561548/job/24445470929))
    `test_fx.py::TestVisionTracing::test_torchvision_models_vit_b_16`
* [pull / linux-focal-py3.11-clang10 / test (default, 2, 3,
linux.2xlarge)](https://hud.pytorch.org/pr/pytorch/pytorch/125225#24445471168)
([gh](https://github.com/pytorch/pytorch/actions/runs/8893561548/job/24445471168))

`test_transformers.py::TestTransformersCPU::test_script_encoder_subclass_cpu`
* [pull / linux-focal-py3.11-clang10 / test (default, 3, 3,
linux.2xlarge)](https://hud.pytorch.org/pr/pytorch/pytorch/125225#24445471397)
([gh](https://github.com/pytorch/pytorch/actions/runs/8893561548/job/24445471397))
`test_jit.py::TestScript::test_torchscript_multi_head_attn_fast_path`
* [pull / linux-focal-py3.11-clang10 / test (dynamo, 3, 3,
linux.2xlarge)](https://hud.pytorch.org/pr/pytorch/pytorch/125225#24445472530)
([gh](https://github.com/pytorch/pytorch/actions/runs/8893561548/job/24445472530))

`test_transformers.py::TestTransformersCPU::test_script_encoder_subclass_cpu`
* [pull / linux-focal-py3.12-clang10 / test (default, 1, 3,
linux.2xlarge)](https://hud.pytorch.org/pr/pytorch/pytorch/125225#24445428834)
([gh](https://github.com/pytorch/pytorch/actions/runs/8893561548/job/24445428834))
    `test_fx.py::TestVisionTracing::test_torchvision_models_vit_b_16`
* [pull / linux-focal-py3.12-clang10 / test (default, 2, 3,
linux.2xlarge)](https://hud.pytorch.org/pr/pytorch/pytorch/125225#24445429085)
([gh](https://github.com/pytorch/pytorch/actions/runs/8893561548/job/24445429085))

`test_transformers.py::TestTransformersCPU::test_script_encoder_subclass_cpu`
* [pull / linux-focal-py3.12-clang10 / test (dynamo, 3, 3,
linux.2xlarge)](https://hud.pytorch.org/pr/pytorch/pytorch/125225#24445429974)
([gh](https://github.com/pytorch/pytorch/actions/runs/8893561548/job/24445429974))

`test_transformers.py::TestTransformersCPU::test_script_encoder_subclass_cpu`
* [pull / linux-focal-py3.8-clang10 / test (crossref, 1, 2,
linux.2xlarge)](https://hud.pytorch.org/pr/pytorch/pytorch/125225#24445479567)
([gh](https://github.com/pytorch/pytorch/actions/runs/8893561548/job/24445479567))
    `test_fx.py::TestVisionTracing::test_torchvision_models_vit_b_16`
* [pull / linux-focal-py3.8-clang10 / test (crossref, 2, 2,
linux.2xlarge)](https://hud.pytorch.org/pr/pytorch/pytorch/125225#24445479782)
([gh](https://github.com/pytorch/pytorch/actions/runs/8893561548/job/24445479782))

`test_transformers.py::TestTransformersCPU::test_script_encoder_subclass_cpu`
* [pull / linux-focal-py3.8-clang10 / test (default, 1, 3,
linux.2xlarge)](https://hud.pytorch.org/pr/pytorch/pytorch/125225#24445478904)
([gh](https://github.com/pytorch/pytorch/actions/runs/8893561548/job/24445478904))
    `test_fx.py::TestVisionTracing::test_torchvision_models_vit_b_16`
* [pull / linux-focal-py3.8-clang10 / test (default, 2, 3,
linux.2xlarge)](https://hud.pytorch.org/pr/pytorch/pytorch/125225#24445479120)
([gh](https://github.com/pytorch/pytorch/actions/runs/8893561548/job/24445479120))

`test_transformers.py::TestTransformersCPU::test_script_encoder_subclass_cpu`
* [pull / linux-focal-py3.8-clang10 / test (dynamo, 3, 3,
linux.2xlarge)](https://hud.pytorch.org/pr/pytorch/pytorch/125225#24445480497)
([gh](https://github.com/pytorch/pytorch/actions/runs/8893561548/job/24445480497))

`test_transformers.py::TestTransformersCPU::test_script_encoder_subclass_cpu`
* [pull / linux-jammy-py3.10-clang15-asan / test (default, 1, 6,
linux.4xlarge)](https://hud.pytorch.org/pr/pytorch/pytorch/125225#24445500236)
([gh](https://github.com/pytorch/pytorch/actions/runs/8893561548/job/24445500236))
    `test_fx.py::TestVisionTracing::test_torchvision_models_vit_b_16`
* [pull / linux-jammy-py3.10-clang15-asan / test (default, 3, 6,
linux.4xlarge)](https://hud.pytorch.org/pr/pytorch/pytorch/125225#24445500673)
([gh](https://github.com/pytorch/pytorch/actions/runs/8893561548/job/24445500673))

`test_transformers.py::TestTransformersCPU::test_transformerencoderlayer_subclass_model_cpu`
* [pull / linux-jammy-py3.10-clang15-asan / test (default, 4, 6,
linux.4xlarge)](https://hud.pytorch.org/pr/pytorch/pytorch/125225#24445500892)
([gh](https://github.com/pytorch/pytorch/actions/runs/8893561548/job/24445500892))

`test_transformers.py::TestTransformersCPU::test_script_encoder_subclass_cpu`
* [pull / linux-jammy-py3.10-clang15-asan / test (default, 5, 6,
linux.4xlarge)](https://hud.pytorch.org/pr/pytorch/pytorch/125225#24445501108)
([gh](https://github.com/pytorch/pytorch/actions/runs/8893561548/job/24445501108))
`test_jit.py::TestScript::test_torchscript_multi_head_attn_fast_path`
* [pull / linux-jammy-py3.8-gcc11 / test (default, 1, 3,
linux.2xlarge)](https://hud.pytorch.org/pr/pytorch/pytorch/125225#24445495672)
([gh](https://github.com/pytorch/pytorch/actions/runs/8893561548/job/24445495672))
    `test_fx.py::TestVisionTracing::test_torchvision_models_vit_b_16`
* [pull / linux-jammy-py3.8-gcc11 / test (default, 2, 3,
linux.2xlarge)](https://hud.pytorch.org/pr/pytorch/pytorch/125225#24445495930)
([gh](https://github.com/pytorch/pytorch/actions/runs/8893561548/job/24445495930))

`test_transformers.py::TestTransformersCPU::test_script_encoder_subclass_cpu`
* [pull / linux-jammy-py3.8-gcc11 / test (default, 3, 3,
linux.2xlarge)](https://hud.pytorch.org/pr/pytorch/pytorch/125225#24445496144)
([gh](https://github.com/pytorch/pytorch/actions/runs/8893561548/job/24445496144))
`test_jit.py::TestScript::test_torchscript_multi_head_attn_fast_path`
* [pull / linux-jammy-py3.8-gcc11 / test (jit_legacy, 1, 1,
linux.2xlarge)](https://hud.pytorch.org/pr/pytorch/pytorch/125225#24445496582)
([gh](https://github.com/pytorch/pytorch/actions/runs/8893561548/job/24445496582))

`test_jit_legacy.py::TestScript::test_torchscript_multi_head_attn_fast_path`
</p></details>

3. pytorch/executorch#3353, non-ghstack,
non-forked PR.

`{"3353":{"FAILED":[],"FLAKY":[],"BROKEN_TRUNK":[],"UNSTABLE":[]}}`

4. pytorch/pytorch#125292, ghstack, non-forked
PR.

<details open><summary><b>NEW FAILURE</b> - The following job has
failed:</summary><p>

* [inductor / cuda12.1-py3.10-gcc9-sm86 / test
(dynamic_inductor_torchbench, 2, 2,
linux.g5.4xlarge.nvidia.gpu)](https://hud.pytorch.org/pr/pytorch/pytorch/125292#24455309482)
([gh](https://github.com/pytorch/pytorch/actions/runs/8904802497/job/24455309482))
    `resnet18`
</p></details>
@Shan19900305 force-pushed the main_using_autocast_with_device_string branch from 34e2afd to e751f58 on May 8, 2024 09:30
@Shan19900305 requested a review from eqy as a code owner on May 8, 2024 09:30
@Shan19900305 (Contributor, Author)

I could not reproduce those failed test cases locally, so I rebased onto the main branch and pushed again.

@albanD (Collaborator) commented May 14, 2024

cc @drisspg how carefully should we look into the TorchScript failures, versus just skipping and ignoring them for MHA?

@drisspg (Contributor) commented May 14, 2024

I think we should investigate these:

2024-05-08T14:01:45.4716454Z FAILED [0.2009s] test_transformers.py::TestTransformersCPU::test_script_encoder_subclass_cpu - RuntimeError: 
2024-05-08T14:01:45.4717122Z 
2024-05-08T14:01:45.4717291Z aten::is_autocast_enabled() -> bool:
2024-05-08T14:01:45.4717830Z Expected at most 0 arguments but found 1 positional arguments.
2024-05-08T14:01:45.4718339Z :
2024-05-08T14:01:45.4719028Z   File "/opt/conda/envs/py_3.11/lib/python3.11/site-packages/torch/nn/modules/transformer.py", line 378
2024-05-08T14:01:45.4719810Z         elif mask is not None:
2024-05-08T14:01:45.4720386Z             why_not_sparsity_fast_path = "src_key_padding_mask and mask were both supplied"
2024-05-08T14:01:45.4721080Z         elif torch.is_autocast_enabled(src.device.type):
2024-05-08T14:01:45.4721629Z              ~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
2024-05-08T14:01:45.4722127Z             why_not_sparsity_fast_path = "autocast is enabled"
2024-05-08T14:01:45.4722572Z     

It might be something as simple as a mistyped annotation: the error above shows the registered TorchScript schema is still `aten::is_autocast_enabled() -> bool`, which takes no arguments, so the scripted call that passes a device type fails.
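
A hypothetical minimal repro of that failure mode, assuming a build where the eager API accepts a device type but the registered TorchScript schema is still the zero-argument one:

```python
import torch

def guard(src: torch.Tensor) -> bool:
    # Eager mode accepts the device-type argument...
    return torch.is_autocast_enabled(src.device.type)

print(guard(torch.randn(2)))  # fine in eager

try:
    # ...but scripting fails at compile time when the registered schema
    # still takes no arguments, matching the RuntimeError in the log.
    torch.jit.script(guard)
except RuntimeError as e:
    print(e)
```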

Labels

* `ciflow/trunk`: Trigger trunk jobs on your pull request
* `module: amp (automated mixed precision)`
* `autocast`
* `open source`
* `triaged`: This issue has been looked at by a team member, and triaged and prioritized into an appropriate module
Projects

None yet

Development

Successfully merging this pull request may close these issues: none yet.

6 participants