DO NOT MERGE YET Replacing nightly CUDA11.0 builds with 11.1 #47938
@janeyx99 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
💊 CI failures summary and remediations
As of commit bb1c18e (more details on the Dr. CI page):
🕵️ 4 new failures recognized by patterns
The following CI failures do not appear to be due to upstream breakages:
binary_linux_libtorch_3_7m_cu111_gcc5_4_cxx11-abi_nightly_static-with-deps_build (1/4)
Step: "Build" (full log | diagnosis details | 🔁 rerun)
Job | Step | Action |
---|---|---|
binary_windows_libtorch_3_7_cu111_release_nightly_test | Test | 🔁 rerun |
binary_linux_conda_3_9_cu111_devtoolset7_nightly_test | Run in docker | 🔁 rerun |
binary_linux_conda_3_7_cu111_devtoolset7_nightly_test | Run in docker | 🔁 rerun |
binary_linux_conda_3_8_cu111_devtoolset7_nightly_test | Run in docker | 🔁 rerun |
binary_linux_conda_3_6_cu111_devtoolset7_nightly_test | Run in docker | 🔁 rerun |
❄️ 1 failure tentatively classified as flaky
but reruns have not yet been triggered to confirm:
binary_windows_libtorch_3_7_cu111_debug_nightly_build (1/1)
Step: "Persisting to workspace" (full log | diagnosis details | 🔁 rerun) ❄️
Error archiving workspace files: Error archiving files to tarball C:\Users\circleci\AppData\Local\Temp\workspace-layer-365ff348-af85-4679-b593-f8e215809d88531392954 : stdout: No space left on device
gzip: /c/Program Files/Git/usr/bin/tar: C:\Users\circleci\AppData\Local\Temp\workspace-layer-365ff348-af85-4679-b593-f8e215809d88531392954: Cannot write: Broken pipe /c/Program Files/Git/usr/bin/tar: Child returned status 1 /c/Program Files/Git/usr/bin/tar: Error is not recoverable: exiting now : exit status 2
Creating workspace archive...
Error archiving workspace files: Error archiving files to tarball C:\Users\circleci\AppData\Local\Temp\workspace-layer-365ff348-af85-4679-b593-f8e215809d88531392954 : stdout: No space left on device
gzip: /c/Program Files/Git/usr/bin/tar: C\:\\Users\\circleci\\AppData\\Local\\Temp\\workspace-layer-365ff348-af85-4679-b593-f8e215809d88531392954: Cannot write: Broken pipe /c/Program Files/Git/usr/bin/tar: Child returned status 1 /c/Program Files/Git/usr/bin/tar: Error is not recoverable: exiting now : exit status 2
This comment was automatically generated by Dr. CI (expand for details).
Follow this link to opt out of these comments for your Pull Requests. Please report bugs/suggestions to the (internal) Dr. CI Users group.
017d06c to 782f631
8e966ff to 929e976
Could you please convert this to a draft if it is intended for testing purposes?
a7b7962 to 8ab61b0
updating driver link
testing only 11.1 binaries
prepping for torch_cuda split
8ab61b0 to 9a70348
f0b4f82 to d2acea9
As for the Windows jobs, the problems are listed below.
Hey @peterjc123 thanks so much for looking into the Windows side!
Just curious; how and where is this normally done?
I thought #574 in builder (pytorch/builder#574) updated the driver here...do we need an even newer version?
I just checked conda/pytorch-nightly/bld.bat and noticed it's been (partially?) updated for CUDA 11.1. I will look into this more, but if you have any ideas on what needs to be updated, let me know!
91a37f0 to 099c6ee
099c6ee to bb1c18e
It should be
We could log on to the CircleCI machine and use something like SpaceSniffer to find those directories.
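If installing a GUI tool on the CI machine is inconvenient, a small script can give a rough version of the same answer. The following is only a minimal sketch, not something used in this PR: the helper names `dir_sizes` and `largest_dirs` are hypothetical, and it assumes Python is available on the machine. It sums the file sizes under each immediate subdirectory of a root and lists the largest ones.

```python
import os


def dir_sizes(root):
    """Map each immediate subdirectory of `root` to the total bytes of files under it."""
    sizes = {}
    with os.scandir(root) as entries:
        for entry in entries:
            # Skip plain files and symlinked directories to avoid double counting.
            if not entry.is_dir(follow_symlinks=False):
                continue
            total = 0
            # onerror swallows unreadable directories instead of aborting the walk.
            for dirpath, _dirnames, filenames in os.walk(entry.path, onerror=lambda err: None):
                for name in filenames:
                    try:
                        total += os.lstat(os.path.join(dirpath, name)).st_size
                    except OSError:
                        pass  # file vanished or is unreadable; ignore it
            sizes[entry.path] = total
    return sizes


def largest_dirs(root, top=10):
    """Return up to `top` (path, bytes) pairs, largest first."""
    ranked = sorted(dir_sizes(root).items(), key=lambda item: item[1], reverse=True)
    return ranked[:top]
```

For example, `largest_dirs(r"C:\Users\circleci")` would rank the subdirectories of the user profile that appears in the log above. Descending into the biggest entry and repeating narrows down what is filling the disk.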
Will be addressed in #51405.
This will not be used, as we have 11.2.
Based on #43366
Testing the CUDA 11.1 build with split torch_cuda (#49050). Previously, linking failed because the binary was too large.
Using builder PR #627 to integrate the changes.