New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENH, BUG: call emms after fmod on windows #17547
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure whether to label this a WIP since it does not handle debug builds with MSVC and perhaps also fails for mingw-not-msvc builds. I guess CI might smoke out some of the problems?
numpy/core/setup.py
Outdated
npymath_sources.append(join('src', 'npymath', 'call_emms.obj')) | ||
elif platform.platform().startswith('Windows'): | ||
npymath_sources.append(join('src', 'npymath', 'call_emms.asm')) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is asm
the correct suffix to get mingw to compile assembly code? Also: I added a call_emms_d.obj
for debug builds but it is not used here yet
# c_sources collects all the default sources, including pre-compiled | ||
# object files | ||
objects = [c_sources.pop(c_sources.index(c)) for c in c_sources[:] | ||
if c.endswith(obj_ext)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yuk. sorry for this, if there is a better way please suggest it. The c_sources
is from
c_sources, cxx_sources, f_sources, fmodule_sources \
= filter_sources(sources)
above, and is a bit misnamed since it collects anything that is not cxx_sources
, f_source
or fmodule_sources
. Also see line 264 where dispatch_sources
does something similar
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this be done instead by passing in a .lib
file with libraries + library_dirs
from setup.py
, so distutils hacking is not needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, can the .lib
file be compiled from sources with MASM in the setup.py
file, so that the binary blob doesn't need to be shipped with Numpy? I guess the only info you'd need from distutils for that is the location of the msvc executables --- the MASM command line can be hardcoded as it's anyway platform specific.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good idea, thanks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this will require more hacking than the current solution. setup.py
is not really the place to be running compilation, that is the job of config.add_library
and friends. Is shipping a 1k binary object file that is easily dissasembled a deal breaker?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there an example around of using MASM from python in a setup.py ? Maybe I am not seeing something obvious.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ahh, got it. There is an extra_objects
arg to Extensions
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was hoping it would be simple to do with running via subprocess.run
the commands that you would do manually, i.e., minimize interfacing with distutils at all. And passing the generated file to the Extension. But maybe it's not so simple.
I'd guess shipping binary blobs is not a dealbreaker, given that hopefully this hack can be dropped at some point in future, after the CRT bug is fixed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe pycca can be used for that assembler snippet, but it would be another dependency.
Any reason for not using inline assembly within the C code? I know it's not standardised across compilers, but we only need this for msvc anyway, right? |
That would have been much easier, but MSVC does not support it for x86_64 ("Inline assembly is not supported on the ARM and x64 processors"). They used to support it for 32-bit but dropped the support. They do supply MASM, but distutils does not support it (AFAICT). We could switch to using mingw across the board for x86_64, but I was looking for a minimal change since this is a platform bug that the platform should fix. |
I had no idea it wasn't supported at all - thanks for setting me straight. |
Can you use |
This all would have been so much easier if MSVC did not drop support for MMX and assembler on 64-bit compiles. |
Let's move MS to clang-cl! |
I suspect that this will get fixed in ucrt eventually and so this will be able to be reverted someday. But just a guess. |
@mattip Does the OpenBLAS-only patch fix the problem without needing this patch? IMO this is not a NumPy bug technically. This is why NumPy/MKL continues to pass despite the FPU being left in a bad state. |
No, it only avoids it. MKL and other BLAS implementations would still not pass NumPy tests if we do not reset the registers here. |
MKL already passes the full test suite. Either they completely avoid the FPU (most likely) to they know to clean it up themselves (also possible). |
One of the reports on the Microsoft site was about the bug affecting MKL as well. We may just be lucky. |
Simplified and squashed to one commit. There is no more hacking in |
Nope, no good. Adding the new code as a separate library means that any downstream user of |
1.19.3 will go out as soon as there is azure support. My impression is that this problem will not be fixed by then and we will need #17553. I'm tempted to only apply this for 1.19 and hope things are fixed by the time 1.12 comes out. Thoughts? |
I think one can wait for 1.20 until the last minute to decide whether to forward port the 1.19 backport of this. There will always be come machined running windows version that are going to be buggy even if MS fixes ucrt IMO. The big fix is really going to be the OpenBLAS. It is pretty hard (not impossible) to run into this, and a fixed OpenBLAS goes a long way. |
Yes, that sounds like a good plan to me. If this PR goes in, gh-17553 should not be needed. The OpenBLAS fixes are on the way in MacPython/openblas-libs#43. Once that is merged and the artifacts are available, we can update numpy/numpy to use them. But I think this PR might be needed to protect interactions with other libraries (anyone using numpy who calls |
Closing. As was pointed out by @charris in the triage meeting, this does nothing for other uses of |
@mattip What do you think of releasing this as a stand-alone Windows-only package on PyPI. (I'm not suggesting you do the work, closer to volunteering). Is seems like I could do something like
without issues. This is a worst case workaround for someone who (a) must use pip for NumPy and SciPy and (b) Windows 2004/H2 and (c) Has access to inject emms() between the fmod and the linear algebra call. |
The code is open source after all, feel free to go ahead. Another option is to create a package that will hook the divmod ufunc with |
That second approach, while clearly superior, is beyond my paygrade. |
The lazier approach of using |
I get this traceback and have no clue why or what it actually does. Traceback (most recent call last): When I put the link in the browser it points me to the fmod error in visual studio. I don't use Visual Studio. I use the regular IDLE that comes with Python. I pip installed NumPy and when to run a py file that imports NumPy and poof this error. Any help would be greatly welcomed. As I'm no programming guru (still novice level here). I would like to find out if this is fixable or what I would need to do to get NumPy to work (provided a novice user can do it). PS I am on Windows 10. |
fixes gh-16744, also see these two issues on the visualstudio forums.
A call to
emms
clears the FPU registers messed up by a call to fmod in windows as discussed extensively in the linked issue. Unfortunately there is no native assembler handling in distutils: it depends on the compiler "doing the right thing" with the file. So I added a smallobj
file for windows that is the result of using MASM to compile the code in theasm
file. You can see the actual assembly by usingobjdump -d call_emms.obj
.On my updated windows system this PR ran the tests to completion. I would be happy if someone else could confirm that (don't forget to link with OpenBLAS as downloaded with
python tools/openblas_support.py
).