Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reset the fpu stack during OpenBLAS initialization when on Windows #2889

Closed
wants to merge 2 commits into from
Closed

Conversation

martin-frbg
Copy link
Collaborator

a more global attempt to fix #2709 as discussed in #2881

@mattip
Copy link
Contributor

mattip commented Oct 12, 2020

This does not solve the issue, the same "illegal value" error messages appear as before this PR.

@mattip
Copy link
Contributor

mattip commented Oct 12, 2020

Wouldn't the fix need to be on every exported function to be effective? I think this only resets the registers once when OpenBLAS starts up, not on every call, but I probably am wrong.

@martin-frbg
Copy link
Collaborator Author

You are probably right, and unfortunately I do not see at the moment how to add it to every invocation without massive changes. Actually your concept of adding it to some of the BLAS kernels may also be working only in the specific sequence of calls used by the numpy tests ?

@mattip
Copy link
Contributor

mattip commented Oct 12, 2020

I preface this with a big caveat: I am not really a x86 specialist programmer, so feel free to ignore all this.

I think the assembler kernels I prefaced with emms are a bit exceptional for modern x86_64 code. They use the older fpu registers and none of the xmm registers. Playing a bit with the compiler explorer and plugging in the code from arm/nrm2.c, I could not get it to compile to using the fpu registers I see in the kernels, modern compilers seem to use the xmms registers. So one theory could be

  • none of the code compiled by flang/msvc/mingw/gcc is using the ST FPU registers
  • one of the assembly kernels besides SUM and NRM2 is being used by some code path NumPy calls
  • PR add fninit to reset fpu registers before assembler routines #2881 sucessfully protects the vulnerable assembly registers, used only in these legacy-type kernels, from the invalid values left there by fmod.

I did find this quote in this document from Intel, note the mention of register aliasing

Since the MMXTM registers are aliased on the floating-point registers, it is very important to clear the
MMXTM register before issuing a floating-point instruction. Use the EMMS instruction to clear the
MMXTM register and set the value of the floating-point tag word (TW) to empty (that is, all ones). This
instruction should be inserted at the end of all MMXTM code segments to avoid an overflow exception
in the floating-point stack when a floating-point instruction is executed.

That is from 1996. The last sentence suggests NumPy should be protecting OpenBLAS from the faulty fmod function.

@martin-frbg
Copy link
Collaborator Author

The hand-coded fpu assembly is probably all more or less directly from GotoBLAS, written 10 to 15 years ago
(1) possible that no C or Fortran source compiles to fpu, though
(2) I tried by compressing all the suspect .S files to see which were actually required for compilation and tests (including gensymbol/linktest) and do not remember hitting anything except NRM2 and SUM. Should probably try that again.

I have merged your patch now (though the SSUM and DSUM are about to be superseded by Qiyu8's new intrinsics code) - thanks for that and your patience.

@mattip
Copy link
Contributor

mattip commented Oct 12, 2020

🤷 Thanks for taking this on OpenBLAS. I hope my compilations/test runs were all correct and that gh-2881 will help.

@carlkl
Copy link

carlkl commented Oct 13, 2020

Comment to the closed PR and #2889 (comment): FPU code is still used under the hood in Windows 64bit: mingw-w64 math libraries, OpenLIBM and the math dll from the MSVC UCRT redistributables are using FPU instructions. The FPU provides extended precision which is sometimes needed for intermediate steps in calculating trigonometric functions. The only open sourced math library I'm aware of without using FPU is sleef. Higher intermediate precision is performed with composed doubles/floats pairs in this case.
Concerning FPU and Windows 64bit: I found some more insight in this document from Agner Fog: calling conventions p.13

@martin-frbg
Copy link
Collaborator Author

Yes, we cannot control use of the fpu outside our own source. I linked Fog's paper in the numpy issue earlier, but to me there is nothing in it that suggests Win10 build 19041 is working correctly in this regard ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Errors when using OpenBLAS through NumPy on Windows 10 2004
3 participants