BUG: incorrect equality of np.int64 and np.uint64 #12525
Comments
The problem is that a mixed `np.int64`/`np.uint64` operation is promoted to `float64`, which cannot represent all 64-bit integers exactly. The workaround is to cast both operands to a common type explicitly. This type of issue has been around for a while. Perhaps we should just emit a warning when this type of cast occurs, and force the user to perform the cast they actually wanted to silence it.
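As a sketch of the promotion in question (the values here are illustrative; `np.result_type` is the standard way to inspect NumPy's promotion rules):

```python
import numpy as np

# There is no integer dtype that holds both the int64 and uint64
# ranges, so NumPy promotes the mixed pair to float64:
assert np.result_type(np.int64, np.uint64) == np.float64

# Workaround: compare as exact Python ints instead.
a, b = np.uint64(2**63), np.int64(-1)
assert int(a) > int(b)
```

Converting through `int()` is lossless for any numpy integer scalar, which is why it sidesteps the `float64` rounding entirely.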
@eric-wieser Also, your suggested workaround only works if both values actually fit in the type being cast to.
In all seriousness though, this discrepancy is very hard to debug, and IMO clearly broken:
Actually, the consistent solution would be to cast to `object` instead of `float`, which numpy does in other places as well:
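A small illustration of that suggestion (example values are mine): an `object` array stores Python ints, so element-wise comparisons stay exact instead of being routed through `float64`:

```python
import numpy as np

big = 2**63                          # does not fit in int64
a = np.array([big], dtype=object)    # stored as an exact Python int
assert bool(a > big - 1)             # exact comparison, no rounding
assert (a - 1 < a).all()             # object arithmetic is exact too
```

The trade-off is speed: `object` arrays fall back to per-element Python calls, so this is correctness at the cost of vectorized performance.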
I think you're hitting a number of different issues. As Eric said, one issue is that in Numpy's casting rules (different from C's casting rules), mixed `int64`/`uint64` operations are cast to `float64`.

The other issue you hit has to do with mixing numpy ints and python ints. In this case I think there is something a bit weird: the array coercion of python ints behaves differently from the scalar coercion. Here's another example that avoids `int64`:

>>> v = 9223372036854775807  # python int, equal to 2**63 - 1
>>> a = np.uint64(v + 1)
>>> a > v  # both converted to float64
False
>>> np.array(a) > v  # both converted to float64
False
>>> np.array([a]) > v  # both converted to uint64
array([ True])

Perhaps relevant: #2524 #944 #1051. Maybe this statement is relevant?:
Followed some links in #2524 and found more xrefs in a comment, and then some other xrefs in #5668: #8809 #8853 #8851 #5746. Also found more related xrefs by searching the issue tracker.

In #8851 (which is very similar; just addressing equality instead of all comparators), @eric-wieser observes after a nice (read: terrible) example in the OP that:
That issue also contains some more useful discussions. #11801 is another almost-duplicate of this issue.
A year ago, you made it sound less set-in-stone ;-)
This issue is clearly hitting many people in different situations (the above xrefs total 18 open issues!). Why not fix the fundamentals (surely there are other issues that would benefit from similar treatment), and use that occasion to bump the major version?

As an aside, I didn't even know that page existed. In it, @charris notes that:

I'm making a bold prediction that this creature will eventually rear its head again before then.
Well, first, thanks for all the time and effort collecting all the issues; I know it takes work. When considering these complex back-compat problems it's really important to be able to reference past discussion, and IMO in a project like numpy this kind of knowledge-organizing work is comparably important to the actual coding.

Now, after looking over all the issues, my personal opinion is that 1. I dislike numpy's casting behavior and wish it were C-like, but 2. I also think that at this point we are stuck with the current coercion rules unless we do a numpy 2.0, and I don't think this change alone justifies a numpy 2.0.

The problem is that numpy is now a foundation of a large computing ecosystem, and a change like this one would very likely break a large fraction of downstream projects in hard-to-debug ways; I would guess more than half. For the same reason that old decisions in the C language have been locked in, we get locked in too. Plus, it's perfectly possible to write correct algorithms in numpy despite this problem, by keeping track of the type of your variables or casting as needed. So this issue can be seen as more of an annoyance than a bug.

But I think we can add this problem to the list of things that would motivate a numpy 2.0. I just put this issue on the wiki at https://github.com/numpy/numpy/wiki/Backwards-incompatible-ideas-for-a-major-release. Once enough things get added to the list, there might be momentum for 2.0. I suspect that when the "epic dtype plan" gets moving the list will get longer.
@h-vetinari - thanks also from me for the summary; it is very useful to have things in one place! Since you just went over them: were all those for casting, and were there examples for things other than comparisons?

@ahaldane - I worry that we may be letting parts that are truly difficult to fix get in the way of fixing simpler ones. For instance, in the example at hand, of operations on scalars, it is not obvious to me that much would break if we changed the rule for mixed `int64`/`uint64` scalar comparisons.

The bigger difficulty is what to do with arrays. One option there might be to move towards a solution by changing the casting rules for int (or more specifically, just for python ints).
@mhvk

Arrays already work correctly in this case. Furthermore, the scalar case only breaks for values between `np.iinfo('int64').max + 1` and `np.iinfo('uint64').max`.
I guess the question is how much these changes would break. I don't have hard evidence, but my impression is that it would break a lot of code, and so any change like this is infeasible, even for the python scalars. To get a better idea of whether that's true or not, I think someone would have to code it up just to try running some projects' test suites.

By the way, C coercion for mixed signed/unsigned ints does stay integral: both operands are converted to `uint64` rather than to a floating type.
Agreed that some better data would be needed... But it would also be good to simply collect how, in an ideal world, we would like things to work, so we can decide which behaviors are actually fine as is, which are possible to move to, and which are bad but unchangeable.
But C does not have the open-ended python ints. It should at least be considered that this casts to `object` (which would at least keep the results exact).
This would - I believe - solve >90% of the issues I linked above.
Thanks for this! You bring up another good point that might be worth tackling at the same time: silent int overflow. I think this is terrible (silently changing results; very hard to debug), and it really should either raise, warn, or upcast.
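A minimal sketch of the silent wraparound being described (the dtype and values are illustrative):

```python
import numpy as np

a = np.array([127], dtype=np.int8)   # at the top of the int8 range
b = a + np.int8(1)                   # wraps around: 127 + 1 -> -128
assert b.dtype == np.int8
assert b[0] == -128                  # no error for array operations
```

Array operations wrap modulo 2**n with no diagnostic at all, which is what makes the resulting bugs hard to track down.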
Proof of that statement: https://godbolt.org/z/CywY13
I think that for mixed `int64`/`uint64` operations the result should stay integral rather than become `float64`. Furthermore, I think that these operations can and should dispatch on output type when used in compound assignment. That is to say, for an in-place operation, the computation should be performed in the dtype of the output array.

Even though it might be difficult to change this behavior now, because programs that depend on the incorrect behavior no doubt exist, it would be very nice to have heterogeneous compound assignments (where the two operands have different dtypes) work correctly.
Worth revisiting and possibly closing after the change in casting rules: NumPy will no longer automatically cast for you.
There are other issues around this, but I had fixed the equality on main (actually, I think already in 1.25).
EDIT: since this issue has become somewhat of a stand-in for `uint64` problems, here's a summary pulled up from the discussion further down.

There's a range of values of class `np.uint64` that fail a trivial comparison test: `v > v - 1`. The issue is IMO that it effectively uses float-logic of some kind in the background, which is obviously neither desirable nor necessary.
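The core of that failure can be reproduced with plain Python floats (example values are mine): two distinct integers near 2**63 collapse to the same `float64` value, so any comparison routed through float cannot tell them apart:

```python
# The spacing between adjacent float64 values just below 2**63 is
# about 1024, so 2**63 - 1 rounds up to exactly 2.0**63 and the two
# distinct integers compare equal after conversion.
big = 2**63
assert float(big - 1) == float(big)   # distinct ints, equal floats
assert (big - 1) < big                # exact as Python ints
```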
Reproducing code example:

The following comparisons should all return `True`.

The way I encountered this is that `'uint64'` is the default dtype for values between `np.iinfo('int64').max + 1` and `np.iinfo('uint64').max` (which makes sense), but this leads to the problem described above.

Numpy/Python version information:

Numpy version: 1.15.4
Python version: 3.6.6 (conda-forge)