Reset the fpu stack during OpenBLAS initialization when on Windows #2889
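The failure this PR targets can be reproduced in isolation. The sketch below assumes GCC or Clang inline assembly on x86/x86-64. It uses `FNINIT` as the reset instruction, which is an illustrative assumption and not necessarily what the merged patch does, and the function name is hypothetical. The x87 register stack has only eight slots. The sketch fills all of them, optionally resets, and then loads 1.0. Without the reset, the ninth push overflows the stack. Because exceptions are masked, the FPU then substitutes its "indefinite" NaN for the value.

```c
#include <math.h> /* isnan, for checking the result */

#if !(defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__)))
#error "This sketch needs GCC/Clang inline assembly on x86 or x86-64."
#endif

/* Hypothetical demo: fill all eight x87 registers, optionally run FNINIT,
   then load 1.0 and return whatever ends up in ST(0). */
__attribute__((noinline))
static double x87_load_one_after_full_stack(int reset_first) {
    double result = 0.0;
    __asm__ __volatile__(
        "fninit\n\t"                       /* start from a clean FPU */
        "fld1\n\tfld1\n\tfld1\n\tfld1\n\t"
        "fld1\n\tfld1\n\tfld1\n\tfld1\n\t" /* all 8 registers now in use */
        "testl %1, %1\n\t"
        "jz 1f\n\t"
        "fninit\n\t"                       /* reset: every register tagged empty */
        "1:\n\t"
        "fld1\n\t"                         /* ninth push overflows unless reset */
        "fstpl %0\n\t"                     /* store ST(0) as a double and pop */
        "fninit\n\t"                       /* hand an empty stack back to C code */
        : "=m"(result)
        : "r"(reset_first)
        : "cc", "memory");
    return result;
}
```

With `reset_first = 0`, the result should be a NaN. With `reset_first = 1`, it should be exactly 1.0. The function starts and ends with `FNINIT`, so the surrounding C code never sees a non-empty x87 stack.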
This does not solve the issue; the same "illegal value" error messages appear as before this PR.
Wouldn't the fix need to be in every exported function to be effective? I think this only resets the registers once, when OpenBLAS starts up, and not on every call, but I am probably wrong.
You are probably right, and unfortunately I do not see at the moment how to add it to every invocation without massive changes. Then again, might your approach of adding it to some of the BLAS kernels also work only for the specific sequence of calls used by the NumPy tests?
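The difference between an init-time reset and a per-call reset can be shown with a toy model. This is not FPU code. The hypothetical `toy_x87` type only counts how many of the eight x87 slots are occupied. A push onto a full stack returns NaN, which mimics the masked stack-overflow response. The `toy_foreign_call_leaks` routine stands in for outside code that returns with the stack full. All names are illustrative.

```c
#include <math.h>     /* NAN */
#include <stdbool.h>

/* Toy model, not real FPU code: only tracks how many of the eight
   x87 register slots are occupied. */
typedef struct {
    int depth; /* occupied slots, 0..8 */
} toy_x87;

/* Models a full reset (FNINIT/EMMS): every slot becomes empty. */
static void toy_reset(toy_x87 *fpu) { fpu->depth = 0; }

/* Models a load: on a full stack the value is lost and NaN is produced,
   mimicking the masked stack-overflow response of the real FPU. */
static double toy_push(toy_x87 *fpu, double v) {
    if (fpu->depth == 8)
        return NAN;
    fpu->depth++;
    return v;
}

static void toy_pop(toy_x87 *fpu) {
    if (fpu->depth > 0)
        fpu->depth--;
}

/* Hypothetical foreign routine that returns with all eight slots in use. */
static void toy_foreign_call_leaks(toy_x87 *fpu) { fpu->depth = 8; }

/* Hypothetical kernel: loads one operand, doubles it, pops it again.
   reset_on_entry selects per-call resetting instead of init-only. */
static double toy_kernel(toy_x87 *fpu, double x, bool reset_on_entry) {
    if (reset_on_entry)
        toy_reset(fpu);
    double r = toy_push(fpu, x);
    toy_pop(fpu);
    return r * 2.0;
}
```

A single `toy_reset` at "library init" protects the first call. After `toy_foreign_call_leaks`, only a kernel that resets on entry still returns 4.0 for an input of 2.0. This is the coverage trade-off raised above. A per-call reset protects every call, but it has to be added to every entry point.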
I preface this with a big caveat: I am not really an x86 specialist programmer, so feel free to ignore all this. I think the assembler kernels I prefaced with
I did find this quote in this document from Intel; note the mention of register aliasing:
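The aliasing works like this. The eight MMX registers share storage with the x87 data registers. Every MMX instruction other than `EMMS` sets the x87 stack top to 0 and tags all eight registers as valid. As a result, x87 code that runs afterwards without an `EMMS` sees a full register stack, and its first load overflows. The sketch below assumes GCC or Clang inline assembly on x86/x86-64 (MMX instructions remain valid in 64-bit mode). The function name is hypothetical.

```c
#include <math.h> /* isnan, for checking the result */

#if !(defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__)))
#error "This sketch needs GCC/Clang inline assembly on x86 or x86-64."
#endif

/* Hypothetical demo: execute one MMX instruction, optionally EMMS,
   then load 1.0 on the x87 stack and return what lands in ST(0). */
__attribute__((noinline))
static double x87_load_one_after_mmx(int use_emms) {
    double result = 0.0;
    __asm__ __volatile__(
        "fninit\n\t"               /* clean x87 state */
        "pxor %%mm0, %%mm0\n\t"    /* any MMX op: TOP = 0, all tags valid */
        "testl %1, %1\n\t"
        "jz 1f\n\t"
        "emms\n\t"                 /* tag all x87/MMX registers empty again */
        "1:\n\t"
        "fld1\n\t"                 /* overflows if EMMS was skipped */
        "fstpl %0\n\t"             /* store ST(0) as a double and pop */
        "fninit\n\t"               /* leave an empty stack for C code */
        : "=m"(result)
        : "r"(use_emms)
        : "mm0", "cc", "memory");
    return result;
}
```

Skipping `EMMS` should produce a NaN, and executing it should produce 1.0. This is why the Intel guidance is to issue `EMMS` at the end of MMX code, before any x87 code runs.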
That is from 1996. The last sentence suggests NumPy should be protecting OpenBLAS from the faulty
The hand-coded FPU assembly is probably all more or less directly from GotoBLAS, written 10 to 15 years ago. I have merged your patch now (though the SSUM and DSUM kernels are about to be superseded by Qiyu8's new intrinsics code); thanks for that and for your patience.
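For contrast, the following sketch implements a DSUM-style reduction with SSE2 intrinsics. It is not Qiyu8's code. It assumes that DSUM simply adds the elements (like DASUM without the absolute values), uses unit stride, and targets x86-64, where SSE2 is always available. The function name `dsum_sse2` is hypothetical. Because the code runs entirely in XMM registers, a dirty x87 stack cannot corrupt it.

```c
#include <emmintrin.h> /* SSE2 intrinsics */
#include <stddef.h>

/* Hypothetical DSUM-style sum of n contiguous doubles using SSE2.
   Uses XMM registers only; the x87 stack is never touched. */
static double dsum_sse2(size_t n, const double *x) {
    __m128d acc0 = _mm_setzero_pd();
    __m128d acc1 = _mm_setzero_pd();
    size_t i = 0;
    /* Main loop: four doubles per iteration in two independent accumulators. */
    for (; i + 4 <= n; i += 4) {
        acc0 = _mm_add_pd(acc0, _mm_loadu_pd(x + i));
        acc1 = _mm_add_pd(acc1, _mm_loadu_pd(x + i + 2));
    }
    acc0 = _mm_add_pd(acc0, acc1);
    /* Horizontal add: lane 0 + lane 1. */
    __m128d high = _mm_unpackhi_pd(acc0, acc0);
    double sum = _mm_cvtsd_f64(_mm_add_sd(acc0, high));
    /* Scalar tail for the remaining 0..3 elements (SSE2 on x86-64). */
    for (; i < n; ++i)
        sum += x[i];
    return sum;
}
```

The two accumulators keep consecutive additions from all depending on one register. As with any reordered summation, the rounding can differ slightly from a strictly sequential loop.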
🤷 Thanks for taking this on in OpenBLAS. I hope my compilations and test runs were all correct and that gh-2881 will help.
A comment on the closed PR and on #2889 (comment): FPU code is still used under the hood on 64-bit Windows. The mingw-w64 math libraries, OpenLIBM, and the math DLL from the MSVC UCRT redistributables all use FPU instructions. The FPU provides extended precision, which is sometimes needed for intermediate steps when calculating trigonometric functions. The only open-source math library I'm aware of that does not use the FPU is SLEEF. In that case, higher intermediate precision is obtained by composing pairs of doubles or floats.
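The technique of composing pairs of doubles can be sketched with two classic error-free transformations: Knuth's TwoSum and Dekker's TwoProduct. The TwoProduct here uses Veltkamp splitting, so no FMA is required. Each function returns the rounded result together with its exact rounding error as a second double. The sketch assumes IEEE double arithmetic with `FLT_EVAL_METHOD == 0` (for example, SSE2 on x86-64), no `-ffast-math`, FP contraction disabled, and no overflow. Evaluating it in x87 extended precision can break the exactness guarantees through double rounding. The type and function names are illustrative.

```c
/* A value represented as the unevaluated sum hi + lo. */
typedef struct {
    double hi;
    double lo;
} dd;

/* Knuth's TwoSum: hi = fl(a + b), lo = exact rounding error. */
static dd two_sum(double a, double b) {
    double s = a + b;
    double bb = s - a;
    double err = (a - (s - bb)) + (b - bb);
    return (dd){s, err};
}

/* Veltkamp splitting: a = hi + lo, each with at most 26 significant bits,
   so products of the halves are exact in double precision. */
static void split(double a, double *hi, double *lo) {
    const double factor = 134217729.0; /* 2^27 + 1 */
    double c = factor * a;
    *hi = c - (c - a);
    *lo = a - *hi;
}

/* Dekker's TwoProduct: hi = fl(a * b), lo = exact rounding error. */
static dd two_prod(double a, double b) {
    double p = a * b;
    double ah, al, bh, bl;
    split(a, &ah, &al);
    split(b, &bh, &bl);
    double err = ((ah * bh - p) + ah * bl + al * bh) + al * bl;
    return (dd){p, err};
}
```

For example, squaring `1 + 0x1p-30` gives `hi = 1 + 0x1p-29` and `lo = 0x1p-60`. The second term recovers the part of the exact product that a single double cannot hold.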
Yes, we cannot control use of the FPU outside our own source. I linked Fog's paper in the NumPy issue earlier, but I see nothing in it suggesting that Win10 build 19041 is working correctly in this regard.
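OpenBLAS cannot control x87 use in other libraries. One hypothetical defensive option is to check the x87 state on entry instead of resetting it blindly. The sketch below assumes GCC or Clang inline assembly on x86/x86-64. It reads the tag word with `FNSTENV` and immediately restores the environment with `FLDENV`, because `FNSTENV` masks all exceptions as a side effect. It then counts the registers that are not tagged empty (binary `11`). Both function names are hypothetical and are not part of the OpenBLAS API.

```c
#include <stdint.h>

#if !(defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__)))
#error "This sketch needs GCC/Clang inline assembly on x86 or x86-64."
#endif

/* 28-byte FNSTENV image in the 32-bit protected-mode layout. */
typedef struct {
    uint16_t control, reserved0;
    uint16_t status, reserved1;
    uint16_t tag, reserved2;
    uint32_t rest[4]; /* instruction and operand pointers */
} x87_env;

/* Hypothetical helper: read the x87 tag word without changing FPU state. */
static uint16_t x87_tag_word(void) {
    x87_env env;
    __asm__ __volatile__(
        "fnstenv %0\n\t"
        "fldenv %0" /* FNSTENV masks all exceptions; reload to undo that */
        : "=m"(env));
    return env.tag;
}

/* Hypothetical helper: number of x87 registers currently tagged in use. */
static int x87_registers_in_use(void) {
    uint16_t tw = x87_tag_word();
    int used = 0;
    for (int r = 0; r < 8; ++r)
        if (((tw >> (2 * r)) & 3) != 3) /* 11b means empty */
            ++used;
    return used;
}
```

The x86 calling conventions generally expect the x87 stack to be empty at an ordinary call boundary. A nonzero count on entry would therefore suggest that earlier code, such as a faulty math routine, left values behind.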
A more global attempt to fix #2709, as discussed in #2881.