Reset the fpu stack during OpenBLAS initialization when on Windows #2889

martin-frbg · 2020-10-12T17:05:53Z

a more global attempt to fix #2709 as discussed in #2881

rebase

mattip · 2020-10-12T19:42:41Z

This does not solve the issue, the same "illegal value" error messages appear as before this PR.

mattip · 2020-10-12T19:46:35Z

Wouldn't the fix need to be on every exported function to be effective? I think this only resets the registers once when OpenBLAS starts up, not on every call, but I probably am wrong.

martin-frbg · 2020-10-12T20:14:34Z

You are probably right, and unfortunately I do not see at the moment how to add it to every invocation without massive changes. Actually your concept of adding it to some of the BLAS kernels may also be working only in the specific sequence of calls used by the numpy tests ?

mattip · 2020-10-12T21:40:10Z

I preface this with a big caveat: I am not really a x86 specialist programmer, so feel free to ignore all this.

I think the assembler kernels I prefaced with emms are a bit exceptional for modern x86_64 code. They use the older fpu registers and none of the xmm registers. Playing a bit with the compiler explorer and plugging in the code from arm/nrm2.c, I could not get it to compile to using the fpu registers I see in the kernels, modern compilers seem to use the xmms registers. So one theory could be

none of the code compiled by flang/msvc/mingw/gcc is using the ST FPU registers
one of the assembly kernels besides SUM and NRM2 is being used by some code path NumPy calls
PR add fninit to reset fpu registers before assembler routines #2881 sucessfully protects the vulnerable assembly registers, used only in these legacy-type kernels, from the invalid values left there by fmod.

I did find this quote in this document from Intel, note the mention of register aliasing

Since the MMXTM registers are aliased on the floating-point registers, it is very important to clear the
MMXTM register before issuing a floating-point instruction. Use the EMMS instruction to clear the
MMXTM register and set the value of the floating-point tag word (TW) to empty (that is, all ones). This
instruction should be inserted at the end of all MMXTM code segments to avoid an overflow exception
in the floating-point stack when a floating-point instruction is executed.

That is from 1996. The last sentence suggests NumPy should be protecting OpenBLAS from the faulty fmod function.

martin-frbg · 2020-10-12T22:02:49Z

The hand-coded fpu assembly is probably all more or less directly from GotoBLAS, written 10 to 15 years ago
(1) possible that no C or Fortran source compiles to fpu, though
(2) I tried by compressing all the suspect .S files to see which were actually required for compilation and tests (including gensymbol/linktest) and do not remember hitting anything except NRM2 and SUM. Should probably try that again.

I have merged your patch now (though the SSUM and DSUM are about to be superseded by Qiyu8's new intrinsics code) - thanks for that and your patience.

mattip · 2020-10-12T22:39:57Z

🤷 Thanks for taking this on OpenBLAS. I hope my compilations/test runs were all correct and that gh-2881 will help.

carlkl · 2020-10-13T11:51:59Z

Comment to the closed PR and #2889 (comment): FPU code is still used under the hood in Windows 64bit: mingw-w64 math libraries, OpenLIBM and the math dll from the MSVC UCRT redistributables are using FPU instructions. The FPU provides extended precision which is sometimes needed for intermediate steps in calculating trigonometric functions. The only open sourced math library I'm aware of without using FPU is sleef. Higher intermediate precision is performed with composed doubles/floats pairs in this case.
Concerning FPU and Windows 64bit: I found some more insight in this document from Agner Fog: calling conventions p.13

martin-frbg · 2020-10-13T12:27:58Z

Yes, we cannot control use of the fpu outside our own source. I linked Fog's paper in the numpy issue earlier, but to me there is nothing in it that suggests Win10 build 19041 is working correctly in this regard ?

martin-frbg added 2 commits October 11, 2020 23:34

Merge pull request #96 from xianyi/develop

f088374

rebase

Reset the FPU stack on Windows to work around a bug in Windows10.19041

fac9afe

martin-frbg mentioned this pull request Oct 12, 2020

add fninit to reset fpu registers before assembler routines #2881

Merged

martin-frbg closed this Oct 12, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Reset the fpu stack during OpenBLAS initialization when on Windows #2889

Reset the fpu stack during OpenBLAS initialization when on Windows #2889

martin-frbg commented Oct 12, 2020

mattip commented Oct 12, 2020

mattip commented Oct 12, 2020

martin-frbg commented Oct 12, 2020

mattip commented Oct 12, 2020

martin-frbg commented Oct 12, 2020

mattip commented Oct 12, 2020

carlkl commented Oct 13, 2020

martin-frbg commented Oct 13, 2020

Reset the fpu stack during OpenBLAS initialization when on Windows #2889

Reset the fpu stack during OpenBLAS initialization when on Windows #2889

Conversation

martin-frbg commented Oct 12, 2020

mattip commented Oct 12, 2020

mattip commented Oct 12, 2020

martin-frbg commented Oct 12, 2020

mattip commented Oct 12, 2020

martin-frbg commented Oct 12, 2020

mattip commented Oct 12, 2020

carlkl commented Oct 13, 2020

martin-frbg commented Oct 13, 2020