Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH, BUG: call emms after fmod on windows #17547

Closed
wants to merge 1 commit into from
Closed

Conversation

mattip
Copy link
Member

@mattip mattip commented Oct 13, 2020

fixes gh-16744, also see these two issues on the visualstudio forums.

A call to emms clears the FPU registers messed up by a call to fmod in windows as discussed extensively in the linked issue. Unfortunately there is no native assembler handling in distutils: it depends on the compiler "doing the right thing" with the file. So I added a small obj file for windows that is the result of using MASM to compile the code in the asm file. You can see the actual assembly by using objdump -d call_emms.obj.

On my updated windows system this PR ran the tests to completion. I would be happy if someone else could confirm that (don't forget to link with OpenBLAS as downloaded with python tools/openblas_support.py).

Copy link
Member Author

@mattip mattip left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure whether to label this a WIP since it does not handle debug builds with MSVC and perhaps also fails for mingw-not-msvc builds. I guess CI might smoke out some of the problems?

npymath_sources.append(join('src', 'npymath', 'call_emms.obj'))
elif platform.platform().startswith('Windows'):
npymath_sources.append(join('src', 'npymath', 'call_emms.asm'))

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is asm the correct suffix to get mingw to compile assembly code? Also: I added a call_emms_d.obj for debug builds but it is not used here yet

# c_sources collects all the default sources, including pre-compiled
# object files
objects = [c_sources.pop(c_sources.index(c)) for c in c_sources[:]
if c.endswith(obj_ext)]
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yuk. sorry for this, if there is a better way please suggest it. The c_sources is from

c_sources, cxx_sources, f_sources, fmodule_sources \
            = filter_sources(sources)

above, and is a bit misnamed since it collects anything that is not cxx_sources, f_source or fmodule_sources. Also see line 264 where dispatch_sources does something similar

Copy link
Member

@pv pv Oct 13, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this be done instead by passing in a .lib file with libraries + library_dirs from setup.py, so distutils hacking is not needed?

Copy link
Member

@pv pv Oct 13, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, can the .lib file be compiled from sources with MASM in the setup.py file, so that the binary blob doesn't need to be shipped with Numpy? I guess the only info you'd need from distutils for that is the location of the msvc executables --- the MASM command line can be hardcoded as it's anyway platform specific.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good idea, thanks

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this will require more hacking than the current solution. setup.py is not really the place to be running compilation, that is the job of config.add_library and friends. Is shipping a 1k binary object file that is easily dissasembled a deal breaker?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there an example around of using MASM from python in a setup.py ? Maybe I am not seeing something obvious.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh, got it. There is an extra_objects arg to Extensions

Copy link
Member

@pv pv Oct 14, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was hoping it would be simple to do with running via subprocess.run the commands that you would do manually, i.e., minimize interfacing with distutils at all. And passing the generated file to the Extension. But maybe it's not so simple.

I'd guess shipping binary blobs is not a dealbreaker, given that hopefully this hack can be dropped at some point in future, after the CRT bug is fixed.

Copy link
Member

@carlkl carlkl Oct 14, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe pycca can be used for that assembler snippet, but it would be another dependency.

@mattip mattip changed the title ENH: call emms after fmod on windows ENH, BUG: call emms after fmod on windows Oct 13, 2020
@eric-wieser
Copy link
Member

Any reason for not using inline assembly within the C code? I know it's not standardised across compilers, but we only need this for msvc anyway, right?

@mattip
Copy link
Member Author

mattip commented Oct 13, 2020

Any reason for not using inline assembly within the C code?

That would have been much easier, but MSVC does not support it for x86_64 ("Inline assembly is not supported on the ARM and x64 processors"). They used to support it for 32-bit but dropped the support. They do supply MASM, but distutils does not support it (AFAICT). We could switch to using mingw across the board for x86_64, but I was looking for a minimal change since this is a platform bug that the platform should fix.

@eric-wieser
Copy link
Member

I had no idea it wasn't supported at all - thanks for setting me straight.

@eric-wieser
Copy link
Member

eric-wieser commented Oct 13, 2020

Can you use _mm_empty from <mmintrin.h> instead, which appears to be the same thing? https://docs.microsoft.com/en-us/previous-versions/visualstudio/visual-studio-2010/ays9ef83(v=vs.100)?redirectedfrom=MSDN

@mattip
Copy link
Member Author

mattip commented Oct 13, 2020

This all would have been so much easier if MSVC did not drop support for MMX and assembler on 64-bit compiles. _mm_empty is available only for 32-bit builds, it is part of a if defined(_M_IX86) clause in mmintrinsics.h. That is why it took so long to put this PR together (with a lot of help from @carlkl)

@charris charris added the 09 - Backport-Candidate PRs tagged should be backported label Oct 13, 2020
@bashtage
Copy link
Contributor

Let's move MS to clang-cl!

@bashtage
Copy link
Contributor

I suspect that this will get fixed in ucrt eventually and so this will be able to be reverted someday. But just a guess.

@bashtage
Copy link
Contributor

bashtage commented Oct 13, 2020

@mattip Does the OpenBLAS-only patch fix the problem without needing this patch?

IMO this is not a NumPy bug technically. This is why NumPy/MKL continues to pass despite the FPU being left in a bad state.

@mattip
Copy link
Member Author

mattip commented Oct 13, 2020

Does the OpenBLAS-only patch fix the problem

No, it only avoids it. MKL and other BLAS implementations would still not pass NumPy tests if we do not reset the registers here.

@bashtage
Copy link
Contributor

MKL and other BLAS implementations would still not pass NumPy tests if we do not reset the registers here.

MKL already passes the full test suite. Either they completely avoid the FPU (most likely) to they know to clean it up themselves (also possible).

@charris
Copy link
Member

charris commented Oct 13, 2020

MKL already passes the full test suite

One of the reports on the Microsoft site was about the bug affecting MKL as well. We may just be lucky.

@mattip
Copy link
Member Author

mattip commented Oct 13, 2020

Simplified and squashed to one commit. There is no more hacking in build_clibs.py, only adding the precompiled assembly to the extension module.

@mattip
Copy link
Member Author

mattip commented Oct 14, 2020

Nope, no good. Adding the new code as a separate library means that any downstream user of npymath.lib must now add the new library as well. I guess there might be ways to merge the two libraries together, but I think that would be more hacks than the original code. @pv any other ideas or demo code I could adapt? In he mean time I will revert this to the original PR.

@charris
Copy link
Member

charris commented Oct 19, 2020

1.19.3 will go out as soon as there is azure support. My impression is that this problem will not be fixed by then and we will need #17553. I'm tempted to only apply this for 1.19 and hope things are fixed by the time 1.12 comes out. Thoughts?

@bashtage
Copy link
Contributor

I think one can wait for 1.20 until the last minute to decide whether to forward port the 1.19 backport of this. There will always be come machined running windows version that are going to be buggy even if MS fixes ucrt IMO.

The big fix is really going to be the OpenBLAS. It is pretty hard (not impossible) to run into this, and a fixed OpenBLAS goes a long way.

@mattip
Copy link
Member Author

mattip commented Oct 20, 2020

I'm tempted to only apply this for 1.19

Yes, that sounds like a good plan to me. If this PR goes in, gh-17553 should not be needed. The OpenBLAS fixes are on the way in MacPython/openblas-libs#43. Once that is merged and the artifacts are available, we can update numpy/numpy to use them. But I think this PR might be needed to protect interactions with other libraries (anyone using numpy who calls fmod could hit this bug) until Microsoft fixes the problem.

@mattip mattip added the triage review Issue/PR to be discussed at the next triage meeting label Oct 21, 2020
@mattip
Copy link
Member Author

mattip commented Oct 22, 2020

Closing. As was pointed out by @charris in the triage meeting, this does nothing for other uses of fmod in code, such as math.fmod from Python.

@bashtage
Copy link
Contributor

bashtage commented Nov 5, 2020

@mattip What do you think of releasing this as a stand-alone Windows-only package on PyPI. (I'm not suggesting you do the work, closer to volunteering). Is seems like I could do something like

x = np.arange(18.) % 3%
emms()
np.linalg.svd(x)

without issues. This is a worst case workaround for someone who (a) must use pip for NumPy and SciPy and (b) Windows 2004/H2 and (c) Has access to inject emms() between the fmod and the linear algebra call.

@mattip
Copy link
Member Author

mattip commented Nov 5, 2020

The code is open source after all, feel free to go ahead. Another option is to create a package that will hook the divmod ufunc with PyUFunc_ReplaceLoopBySignature at import, and replace the original loop with a call to the original loop and then emms(). Then importig the package could be sufficient to monkey patch NumPy.

@bashtage
Copy link
Contributor

bashtage commented Nov 5, 2020

That second approach, while clearly superior, is beyond my paygrade.

@bashtage
Copy link
Contributor

bashtage commented Nov 5, 2020

The lazier approach of using set_numeric_ops might be doable for me.

@bstee615 bstee615 mentioned this pull request Nov 24, 2020
@IamKraZ
Copy link

IamKraZ commented Dec 7, 2020

I get this traceback and have no clue why or what it actually does.

Traceback (most recent call last):
File "C:/Users/iamkr/Desktop/Python Games/Tile_Based/cavemapgenerator.py", line 2, in
import numpy as np
File "C:\Users\iamkr\AppData\Local\Programs\Python\Python38\lib\site-packages\numpy_init_.py", line 305, in
win_os_check()
File "C:\Users\iamkr\AppData\Local\Programs\Python\Python38\lib\site-packages\numpy_init
.py", line 302, in _win_os_check
raise RuntimeError(msg.format(file)) from None
RuntimeError: The current Numpy installation ('C:\Users\iamkr\AppData\Local\Programs\Python\Python38\lib\site-packages\numpy\init.py') fails to pass a sanity check due to a bug in the windows runtime. See this issue for more information: https://tinyurl.com/y3dm3h86

When I put the link in the browser it points me to the fmod error in visual studio. I don't use Visual Studio. I use the regular IDLE that comes with Python. I pip installed NumPy and when to run a py file that imports NumPy and poof this error.

Any help would be greatly welcomed. As I'm no programming guru (still novice level here). I would like to find out if this is fixable or what I would need to do to get NumPy to work (provided a novice user can do it).
thanks in advance for your time and help in this matter.

PS I am on Windows 10.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
00 - Bug 01 - Enhancement triage review Issue/PR to be discussed at the next triage meeting
Projects
None yet
Development

Successfully merging this pull request may close these issues.

polyfit and eig regression tests fail after Windows 10 update to 2004
7 participants