-
Notifications
You must be signed in to change notification settings - Fork 255
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
mingw(clang32) github runner is broken #780
Comments
Correct. LLVM was updated from version 17 to version 18 recently in MSYS2. If I understand correctly some of the tests for LAGraph are failing since that update. Is that correct? Or is there another issue? Maybe, the LLVM update broke their compiler. Ideally, we could report that upstream with some context how to reproduce the error. |
Some background: MSYS2 is in the process of dropping support for 32-bit platforms: But iiuc, they didn't plan on dropping support for the compiler already. |
Yes, there are 4 tests that fail in LAGraph. I thought at first it was because of some of my changes in GraphBLAS (9.0.3 to 9.1.0). But then I tried the stable branch and it failed in the identical manner. The errors are strange but are repeatable. One method fails with "OMP: out of heap memory" which makes no sense. I'm guessing it's a bug in the update to OpenMP. Perhaps CLANG32 with no OpenMP would work. The code in the stable branch passed the CI about a week ago. It also passed here: When the CLANG32 CI failed on dev2, I tried running it manually on the same d4dad6c version in the stable branch, but it failed: Between these 2 CI runs of the stable branch, on d4dad6c, no code of my changed. The only thing changed was the github runner used. I diff'd the logs and saw that these 2 runs use different github runners. I had to process the logs to strip the leading text on each line first. Here is the good output from 3 weeks ago, with the time stamp stripped from each line: Here is the bad output from just yesterday: and the diff: Here is a trimmed diff with just the pertinent problems: In the summary.txt file, the 4 failed tests are the same that fail when using the latest GraphBLAS 9.1.0 with LAgraph 1.1.3, in the dev2 and now dev branches. So it's not my code that's broken. Something broke in github. |
I agree that it is pretty unlikely that this is an error in the SuiteSparse sources that only shows up in that build configuration. Do the four failing tests have anything in common? Like do they use the same omp pragma or something similar? Or do they test the same functions in GraphBLAS or LAGraph that might get miscompiled? LLVM 18.1.2 has been released recently. MSYS2 will probably update to that version soon. Maybe, they've already fixed it? |
I haven't figured out why those 4 tests fail. They seem to have nothing in common. They likely do, I just don't know what it is. It's hard to track down since I'm not even sure which calls to GraphBLAS are failing, since each of these failed LAgraph methods makes lots of calls to GraphBLAS. It's probably a bug in OpenMP that is causing GraphBLAS to fail in some weird way. Yes, when I say "the github runner is broken" I meant something in github or in the packages it uses is broken. I'm guessing it's either the 32-bit clang compiler or its openmp library that's broken. The first few lines of the summary.txt shows the 2 github runner versions:
and later on, you see the llvm and openmp differences:
and this one:
Those packages are the only things that differ between the two runs. My code is the same. The one with I haven't tried CLANG32 with OpenMP disabled. If that works then the bug is in The simplest thing for now is to just disable MINGW(CLANG32) entirely. I can renable it sometime in the future, once github switches to a fixed MSYS2 distribution for this case. |
To preserve this error, I will make a copy of the stable branch in SuiteSparse, and archive it: |
I'm able to reproduce the errors locally in a CLANG32 build environment.
|
I'm struggling to read the output of the failing tests. |
No, they just show that the test failed. I would need to add more printf's to do that. I did add some to the test_BF. It showed that the expected values for some d were finite, like 1 or 2, while the computed result was +infinity, which in this case means it was missing in the result (the result vector d was supposed to be full but it was returned sparse). That's very strange, and I didn't dig any deeper once I saw that the stable code also failed in the same way. |
On the off-chance that this would make a difference, I tried again after MSYS2 updated to LLVM 18.1.2: Still the same failing tests in the CLANG32 environment with that version. |
Thanks for checking it. It would be a difficult issue for me to track down to find the specific place in GraphBLAS where the compiler is failing, since I don't have a simple way to replicate this problem on my side. Even if I did, it would be very slow for me since I don't use Windows at all. Hopefully a future version of LLVM will not have this problem. |
Github broke its mingw(clang32) runner.
The stable branch CI worked fine on March 2, 2024. The same CI fails on March 21, 2024. Nothing changed in the meantime, SuiteSparse and its .github/workflow files were unchanged. What did change was the github runner. Github switched the mingw(clang32) runner, and changed clang and openmp from 17.* to 18.*. Something broke, and it's not SuiteSparse.
See this update, which disables the mingw(clang32) tests:
b1bd9cc
The latest dev2 code breaks in the same way as the stable branch now breaks, with 4 test failures in LAGraph. One is an "OMP: out of heap memory" error, which is very strange since the problems being solved are very small.
Once the github runner is fixed, the above change to the SuiteSparse/.github/workflow/build.yaml file can be restored to its original state.
The text was updated successfully, but these errors were encountered: