meaningless RuntimeWarning by clang-compiled np.float32.__mul__ #9007
Comments
same picture with |
This type of error is typical of arrays that contain |
@eric-wieser : it returns |
Does |
|
What about |
Also, if you could run |
|
Seems I forgot how |
Does this also fail? |
And, by the way, setting
|
|
Hmm, sage did something I didn't expect there. Same behaviour with |
Ok, so here's what I think is happening:
|
|
Worth noting that |
why doesn't |
Because it knows that it can handle it as
If you look closely, you'll find that |
so it looks as if
This is weird. A compiler bug, perhaps (as we never see it on gcc), but at which place? |
The FPU flag is for some reason being set on clang but not gcc within the sage code, it would seem. Numpy is to blame for making noise about it, but I highly doubt it is to blame for setting it in the first place. Unfortunately, |
I assume this causes the same warning (needs numpy 1.12 to do so, I think)
|
not quite the same, but close:
|
OK, good, this seems to indicate that there is nothing specific with |
I don't know that the compiler authors would consider it a bug. The way that warning works is, there are some magic status flags that the processor keeps track of, which automatically get set whenever the corresponding event occurs. Numpy clears them before starting the computation, and then checks them again at the end. So somewhere in between those points, the assembly generated by clang is doing some calculation that involves a NaN. But it's hard to track down (since the actual flag setting is done entirely in hardware), and most of the time people don't worry about how their code affects the FPU flags. (Libm implementations are also notoriously inconsistent about whether they set these flags.) And the exact results depend a lot on the exact asm being generated, so it's not surprising that you only see it in specific configurations and not others. |
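A small sketch of that clear-then-check behaviour, using numpy's own warning machinery (the sqrt call here is just a convenient way to set the hardware "invalid" flag; it stands in for whatever clang's generated code does):

```python
import warnings
import numpy as np

# numpy clears the hardware status flags before a computation and checks
# them afterwards; anything in between that produces a NaN gets reported.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    np.sqrt(np.float32(-1.0))  # sets the "invalid" status flag in hardware

invalid_reported = any("invalid" in str(w.message) for w in caught)
print(invalid_reported)  # True: the flag check found FE_INVALID set
```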
Yep, that confirms my suspicions, and provides you with a way to debug. This code
Applied to a Python function, it allows you to isolate the warnings to within that chunk of code. |
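The snippet itself isn't preserved above, but an equivalent sketch uses `np.errstate`, which also works as a function decorator (numpy >= 1.17), so flag checks turn into exceptions only inside the decorated function:

```python
import numpy as np

# np.errstate as a decorator: flags found set inside this function raise
# FloatingPointError, letting you bisect down to the offending chunk.
@np.errstate(invalid="raise")
def suspect_chunk():
    return np.sqrt(np.float32(-1.0))  # sets the invalid flag -> raises here

caught_fpe = False
try:
    suspect_chunk()
except FloatingPointError:
    caught_fpe = True
print("flag raised inside suspect_chunk:", caught_fpe)
```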
I mean, it is pretty weird that it's happening at all; compilers don't usually invent and then throw away NaNs for no reason. If you're trying to track it down, then you should probably look at the code in sage that implements multiplication for those polynomials – it's likely that the weird flag setting is happening all the time, and numpy's only involvement is to make that visible. There's also a pretty good argument that numpy shouldn't even try to check these flags on object loops. (Or integer loops for that matter, but that's tricky because the way we report integer overflow is kinda gross and uses the fpu flags.) That's the only thing I can think of that numpy could do here. |
|
|
I get an error if I use python2's functools, in
|
Yeah, I'm guessing it's whatever multi-precision library implements the arithmetic for the coefficients in |
That's MPFR, for the record.
We are trying to port Sagemath to clang+gfortran (mostly on OSX and FreeBSD, platforms where clang is the primary compiler), so that building and running it on OSX is easier and faster (FreeBSD is more of a tool to get a similar environment without the hassle of OSX and Apple hardware). All the comparisons I report here are for complete builds with clang/clang++ plus gfortran, as opposed to gcc/g++ plus gfortran. |
the wrapper seems to tell us that
prints the warning, while |
Indeed - my assumption was that |
|
If there were a simple way to change the warning to an error, you would get a traceback (Cython generates tracebacks for errors but not for warnings).
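Assuming the warning comes through Python's warnings machinery (as numpy's does by default), promoting RuntimeWarning to an error gives exactly that: an exception whose traceback points into the numpy call. A sketch:

```python
import warnings
import numpy as np

# Promote numpy's RuntimeWarning to an exception; the raise site then
# appears in an ordinary Python traceback.
warnings.filterwarnings("error", category=RuntimeWarning)

raised = None
try:
    np.sqrt(np.float32(-1.0))  # would normally just warn
except RuntimeWarning as w:
    raised = w
print(type(raised).__name__)  # RuntimeWarning
```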
@jdemeyer IMHO the numpy warning is issued much later in the code path, i.e. it's the result of an explicit check of FPU flags, not of an interrupt being raised. numpy does provide an interface to change this warning into an error, but all you get is a return to the main IPython interpreter loop, without any backtrace whatsoever. |
cysignals would throw an exception if |
A similar warning:
Again, the question is: what does
Note that
There are no warnings from |
The answer is the same here - numpy calls
Most of the ndarray arithmetic/logical operators (with the exception of
So once again, sage is setting these flags. Perhaps this is a sign of a bug, perhaps it's not. I think there's a good argument here that numpy should not be checking fpu flags for these cases. @njsmith, do you think we should go ahead with removing the check for object types?
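For reference, an object-dtype loop simply calls the elements' own Python-level operators, which is why any flags set inside them get picked up by numpy's check afterwards. A minimal illustration (hypothetical `Tracked` class):

```python
import numpy as np

class Tracked:
    """Hypothetical class that counts how often numpy's object loop calls it."""
    def __init__(self):
        self.calls = 0

    def __mul__(self, other):
        self.calls += 1
        return 7

t = Tracked()
a = np.array([t], dtype=object)
out = a * 2  # the object loop dispatches to t.__mul__(2) under the hood
print(out[0], t.calls)  # 7 1
```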
As a matter of fact,
and thus I really doubt that it is called in the end, for one does get
|
I was able to pin our problem in Sagemath down to a particular C extension using the fpectl Python module (which is somewhat, but not totally, broken on FreeBSD). It was actually very quick once I managed to get it installed. IMHO fpectl is so useful that it ought to be fixed; perhaps even used in numpy instead of, or in addition to, |
The difference between fpectl's approach and np.seterr is:
Some downsides of the
Given all this, I don't think numpy is going to switch. In any case, it sounds like the original issue is solved, so I'm closing this – feel free to open a new issue if you want to make a case for changes in |
Are we sure we don't want to disable checking of the FPU flags for object loops? That would seem like a pretty sensible change to numpy. |
@eric-wieser: oh, that's an interesting idea, yeah. Maybe it's worth opening an issue for that :-). The "right thing" is pretty complicated though – ideally we shouldn't be special-casing the object dtype (think user dtypes), and integer loops shouldn't use it either (this may be a real optimization on some architectures where checking/clearing the FPU flags is extremely slow), but integer loops do need a way to explicitly signal integer errors, which they currently do by explicitly setting the FPU flags... I'm not sure this is a case where there's easy low-hanging fruit? Or did I misunderstand, and sage has only identified the problem, and they still need a numpy change to actually fix it? |
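The integer-error signalling mentioned here is observable from Python: wrapped integer arithmetic produces a RuntimeWarning through the same flag-checking machinery described above (exact warning text varies by numpy version):

```python
import warnings
import numpy as np

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    wrapped = np.int64(2**62) * np.int64(4)  # wraps past 2**64

# numpy reports the wraparound as an "overflow" RuntimeWarning
overflow_reported = any("overflow" in str(w.message) for w in caught)
print(wrapped, overflow_reported)
```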
@njsmith: I do not understand why you say it won't work on Windows. (This would have been correct in the pre-C99 era, though.) Modern FPU-handling functions (fenv) are available as soon as your C compiler is C99-compliant. Apart from fenv, all it needs is setjmp/longjmp (again, a standard C feature). I am also curious to hear about a libm that causes one of the FE exceptions in the course of a normal operation. |
@dimpase: You also need SIGFPE support, which is not specified in C99. (Well, C99 says that there should be a SIGFPE, but that's for divide-by-zero – it doesn't specify any way to hook it up to floating point exceptions.) That said, it looks like I misremembered, and though Windows doesn't support signals, MSVCRT emulates SIGFPE using structured exception handling, and provides the non-standard
And FWIW, if a libm caused an FE exception and then cleared it again, I can't see why they would consider that a bug. I'm not sure that any such implementations exist, but it's plausible, and if they do then the way we would find out is b/c someone tells us that numpy is broken on platform X and the only fix would be to revert the change you suggested. Can you answer the question I asked at the end of my previous comment? |
@njsmith: if a libm (or any other user code) needs to cause an FE exception and process it, it would set up its own FE exception handler, saving the previous one, and restoring it upon exiting. Regarding MS support for this, they have shipped fenv.h since Visual C++ 2013 or so. Regarding numpy's RuntimeWarning:
Regarding this issue in Sage: we're still fixing it (hopefully it's limited to some issues in MPFR only).
Sorry, this is going in circles and I need to move on to other things, so unless something new comes up on the fenv/sigfpe issue this will be my last message on the topic. (I'm still interested in if there's anything numpy needs to do for the sage bug).
What you're proposing is to take an operation that normally does not cause a signal handler to fire, and configure the processor in a non-standard mode where it does cause a signal handler to fire. It's totally reasonable for code to be performing this operation and expecting that it won't trigger a signal handler at all.
I can't figure out what you're talking about here. Afaict, the standard functionality in fenv.h is only useful for implementing numpy-style functionality, and MS sticks to the standard. I don't see any functions in there that could be used with setjmp/longjmp at all.
Carefully clearing a flag set by an intermediate calculation is the exact opposite of playing fast and loose with them. Also, the warnings are optional.
You're literally the first person in something like a decade to need SIGFPE to debug this kind of issue, and looking again at the sage bug comments, it looks like you didn't actually get fpectl working? It's not supposed to cause a core dump. (It looks like cysignals is overriding the fpectl code so it doesn't even run.) If this comes up again, what you need to do is make one C call to enable SIGFPE, then use a debugger to get a stack trace. You don't need a debug build to get a stack trace; all you need to do is not strip the symbols. And hey, now we know in case this does come up again. I understand this was really frustrating to debug, but it's not helpful to insist that other projects need to change or maintain basic infrastructure when you can't even explain clearly what this will accomplish. (I actually have no idea how you think numpy changing something here would even help you find this kind of bug faster – the whole idea of |
Finally, it turns out that it boils down to a long-standing bug in the clang C compiler. Basically, in a certain range of
raises
Incidentally, a related, even more long-standing (since 2010, with a dozen closed duplicates) clang bug 8100 says that there is no hope of using clang's
Not sure whether |
bug 8100 isn't relevant; that's for the C99 pragmas to disable floating point optimizations, and no mainstream compilers support those. numpy seems to (mostly) work anyway :-) |
The spirit of bug 8100 is that clang does not care about FP operations being compiled correctly, although a lawyer might disagree. :-) OK, the already-mentioned bug 17686 is relevant for sure.
In Sagemath we encounter, in our ticket #22799, a RuntimeWarning while multiplying a `numpy.float32` number with non-numpy data; i.e. numpy ought to fail to do this multiplication silently (deferring to the other operand), and indeed it does if one builds with gcc, or if instead of `np.float32` it is `np.float` or `np.float128`.

More precisely, one gets the warning from a Python call where `x` is a Sagemath univariate polynomial with coefficients in Sagemath's `RealField` type (and only this particular type of data triggers this). That is, potentially, such meaningless warnings can be emitted outside of Sagemath; we can reproduce it on OSX 11.12 with its stock cc (some derivative of clang 3.8), as well as on Linux with clang 4.0 and on FreeBSD 11.0 with clang 4.0 or clang 3.7.

Potentially we should be able to produce a way to reproduce this outside of Sagemath, although we'd need some tips on where in numpy code this `__mul__` is actually implemented, to see what functions are applied to `x`... We see this on numpy 1.11 and on 1.12, too.
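A hypothetical stand-in (not Sage's actual classes) shows the shape of the interaction described above: `np.float32.__mul__` defers to the other operand's `__rmul__`, and a transient NaN produced there is the kind of thing the later flag check would report (whether a warning actually fires depends on the numpy version and compiler):

```python
import numpy as np

class RealFieldPolyStandIn:
    """Hypothetical stand-in for a Sage polynomial over RealField."""
    def __rmul__(self, other):
        # A transient NaN, like the one clang's generated code produces;
        # it sets the hardware "invalid" flag even though it never
        # reaches the returned result.
        _ = float("inf") - float("inf")
        return "scaled poly"

# np.float32 cannot handle this type, so it defers to __rmul__:
result = np.float32(0.5) * RealFieldPolyStandIn()
print(result)
```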