BUG: linalg.eigh segfault on Windows with OpenBLAS 0.3.16 #19469
Comments
Okay looks like the first build with the I'll go ahead and close and comment over in #16913 that the fix might be problematic now, but feel free to reopen if something actually seems useful to do at the NumPy end! |
We are using ILP64 BLAS with the latest pre-wheels, so that might also lead to some issues. |
Thanks for testing against the nightly builds. Using |
There have been reports of LAPACK testsuite segfaults on x86_64 with some operating systems (namely OSX), which may be linked to PR 3250 (adding a shortcut in SGEMV/DGEMV for small cases that "should not" need buffer allocation) and AVX512 targets (which is what Azure runs on AFAIK). |
This is troubling for 1.21.2, I'd like to backport this for the arm64 fixes, but I'd also like it to work for Prescott. EDIT: ARM64 wheels don't build without 0.3.16, so that forces my hand. |
0.3.17 released now with the fixes |
Let me know if NumPy rolls out a wheel with 0.3.17 and I'm happy to put |
@martin-frbg Great. I note that 64 bit OpenBLAS on arm64 hangs when testing the dot product.
See https://travis-ci.com/github/MacPython/numpy-wheels/jobs/523962484 |
Hm. I don't think anybody hurt dot in recent releases... anything else that test is exercising ? |
Could it be that the test just bails out because it is extremely slow due to swapping? I am not sure how reliable our |
It may be a segfault and not a hang |
Hm. Passes all the simple tests including xianyi's BLAS-Tester (ATLAS testsuite) on the MACmini in the gcc compile farm. I do not think I want to try building python there though - can you just restart the travis job to see if it could have been some unrelated fault ? |
@martin It happened consistently: three tests/push, many pushes.
Hmm, could be, the travis machine may incorrectly report memory. I don't expect any of the test machines to actually run that test on account of too little memory. In fact |
@martin Looks like a test problem, the testing process is oom killed. Default travis-ci memory ranges 2-4 GB, so the test should not normally run. |
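To illustrate the memory point above, here is a minimal sketch of a best-effort "is there enough memory?" guard for a large test. It is not NumPy's own check; the helper names (`parse_mem_available`, `available_memory_bytes`, `should_run`) are hypothetical, and it assumes a Linux-style `/proc/meminfo`. As noted in the thread, a CI machine may report memory incorrectly, so a guard like this can still let a test run and get OOM-killed.

```python
def parse_mem_available(meminfo_text):
    """Return MemAvailable in bytes from /proc/meminfo-style text, or None."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemAvailable:"):
            # The line looks like "MemAvailable:  1234567 kB".
            return int(line.split()[1]) * 1024
    return None


def available_memory_bytes(path="/proc/meminfo"):
    """Best-effort probe; returns None when the file is missing (e.g. non-Linux)."""
    try:
        with open(path) as f:
            return parse_mem_available(f.read())
    except OSError:
        return None


def should_run(required_bytes, available):
    """Skip only when we positively know there is too little memory.

    An unknown value (None) lets the test run, which is exactly the case
    where a misreporting machine can end up OOM-killing the process.
    """
    return available is None or available >= required_bytes


# Example: would a test that needs about 6 GiB run on this machine?
needed = 6 * 1024**3
print(should_run(needed, available_memory_bytes()))
```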
Over in MNE we test against the nightly NumPy builds, and as of the last few hours it looks like we're hitting an eigh segfault on Windows Azure on 1.22.0.dev0+442.g89c80ba60 on code that worked on other builds (e.g., 1.22.0.dev0+405.g8eaceff8a): https://dev.azure.com/mne-tools/mne-python/_build/results?buildId=14294&view=logs&j=a017e066-62ca-5289-ad0b-8f57c84a089f&t=de70cabd-1dad-599c-0751-4f1f50c17e0f
I assume it's due to #19462 / OpenBLAS 0.3.16. Tomorrow I can try to reproduce locally on my Windows machine and dump the offending array to make a minimal example assuming it reproduces there. If I can't do that, I'll figure out a way to dump it on Azure and get it as a binary blob. But in the meantime I figured I'd open this in case others hit the same issue...
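One way the "dump the offending array" step could look is sketched below. The wrapper name `eigh_with_dump` and the file name are hypothetical, and the random symmetric matrix is only a stand-in for the real input. The sketch writes the array to disk before calling `np.linalg.eigh`, so the file survives even if the call crashes the process.

```python
import os
import tempfile

import numpy as np


def eigh_with_dump(a, dump_path):
    """Save `a` to `dump_path`, then call np.linalg.eigh on it."""
    # Write the input first: if eigh segfaults, the file is already on disk.
    np.save(dump_path, a)
    return np.linalg.eigh(a)


# Stand-in symmetric matrix; the real input would come from the failing code.
rng = np.random.default_rng(0)
m = rng.standard_normal((4, 4))
sym = (m + m.T) / 2

path = os.path.join(tempfile.mkdtemp(), "eigh_input.npy")
w, v = eigh_with_dump(sym, path)

# Minimal reproducer: reload the dumped array and call eigh on it again.
reloaded = np.load(path)
w2, _ = np.linalg.eigh(reloaded)
```

The resulting `.npy` file can then be attached to the issue as the binary blob, and the last two lines form the minimal example.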
We also have the env var OPENBLAS_CORETYPE=Prescott in that build (from #16913), I'll first try removing that to see if it makes things work.
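A rough sketch of how the with/without comparison could be scripted: run the same snippet in a fresh Python process with `OPENBLAS_CORETYPE` unset and then set to `Prescott`, and compare return codes. The helper `run_snippet` is hypothetical, and the snippet here is a placeholder that only echoes the variable; the real one would import NumPy and call `eigh` on the failing input. Running in a child process means a segfault shows up as a nonzero return code instead of taking down the calling script.

```python
import os
import subprocess
import sys


def run_snippet(code, coretype=None):
    """Run `code` in a fresh Python process, optionally forcing OPENBLAS_CORETYPE."""
    env = dict(os.environ)
    # Drop any inherited value so the "unset" case really is unset.
    env.pop("OPENBLAS_CORETYPE", None)
    if coretype is not None:
        env["OPENBLAS_CORETYPE"] = coretype
    return subprocess.run(
        [sys.executable, "-c", code],
        env=env,
        capture_output=True,
        text=True,
    )


# Placeholder snippet; swap in the real eigh call to test the actual crash.
snippet = "import os; print(os.environ.get('OPENBLAS_CORETYPE'))"
for coretype in (None, "Prescott"):
    result = run_snippet(snippet, coretype)
    print(coretype, result.returncode, result.stdout.strip())
```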