BUG: incorrect equality of np.int64 and np.uint64 #12525

Closed
h-vetinari opened this issue Dec 10, 2018 · 15 comments
Labels
01 - Enhancement · 23 - Wish List · 60 - Major release · 62 - Python API

Comments

@h-vetinari
Contributor

h-vetinari commented Dec 10, 2018

EDIT: since this issue has become somewhat of a stand-in for uint64-problems, here's a summary pulled up from the discussion further down:

PS. Here's all the duckies in one neat row:


There's a range of values of class np.uint64 that fail a trivial comparison test: v > v - 1

The issue is IMO that it effectively uses float-logic of some kind in the background, which is obviously neither desirable nor necessary.

Reproducing code example:

The following comparisons should all return True:

>>> import numpy as np
>>> for v in np.logspace(50, 63.9, num=20, base=2):
...     v = int(v)
...     c = np.array([v], dtype='uint64').max() > v - 1
...     print(c, v)
...
True 1125899906842624
True 1869506547705232
True 3104232188555715
True 5154439010814696
True 8558715940823230
False 14211365854171536
False 23597338764077220
False 39182327895891032
False 65060506809279480
False 108030067981851424
False 179379106627232000
False 297851001073046464
False 494568294537084288
False 821208581069482368
False 1363580199279397888
False 2264164065900722176
False 3759543384412024832
False 6242554005755503616
True 10365482328612339712
True 17211420807207079936

The way I encountered this is that 'uint64' is the default dtype for values between np.iinfo('int64').max + 1 and np.iinfo('uint64').max (which makes sense), but this leads to the following problem:

>>> v = np.iinfo('int64').max
>>> np.array([v + 1]) > v
array([ True])
>>> np.array([v + 1]).max() > v
False
>>> np.array([v + 1000]).max() == v
True

Numpy/Python version information:

Numpy version: 1.15.4
Python version: 3.6.6 (conda-forge)

@eric-wieser
Member

The problem is that v - 1 is uint64 - int64, which, due to the lack of an int128, we panic and assign type float64 to.

The workaround is to use v - np.uint64(1), which will avoid this casting.

This type of issue has been around for a while. Perhaps we should just emit a warning when this type of cast occurs, and force the user to perform the cast they actually wanted to silence it.
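
A minimal sketch of the promotion Eric describes and of the all-uint64 workaround (the int64/uint64 pair promotes to float64, a rule that still holds today):

>>> import numpy as np
>>> np.result_type(np.uint64, np.int64)     # no common integer type, so float64
dtype('float64')
>>> v = 2**63
>>> np.uint64(v) > np.uint64(v - 1)         # both sides uint64: the comparison stays exact
True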

@h-vetinari
Contributor Author

@eric-wieser
arrays don't panic, see below. ;-)

Also, your suggested workaround only works if both v and 1 are cast (and - to me - that doesn't explain why it suddenly starts working again around 1e19):

>>> for v in np.logspace(50, 63.9, num=20, base=2):
...     v = int(v)
...     cmp_arr = (np.array([v], dtype='uint64') > v - 1)[0]
...     cmp_max = np.array([v], dtype='uint64').max() > v - 1
...     cmp_cast_1 = np.array([v], dtype='uint64').max() > v - np.uint64(1)
...     cmp_cast_2 = np.array([v], dtype='uint64').max() > np.uint64(v - 1)
...     print(cmp_arr, cmp_max, cmp_cast_1, cmp_cast_2, v, np.log10(v))
...
True True True True 1125899906842624 15.05149978319906
True True True True 1869506547705232 15.271726990553235
True True True True 3104232188555715 15.491954197907413
True True True True 5154439010814696 15.712181405261587
True True True True 8558715940823230 15.932408612615763
True False False True 14211365854171536 16.152635819969937
True False False True 23597338764077220 16.372863027324115
True False False True 39182327895891032 16.59309023467829
True False False True 65060506809279480 16.813317442032464
True False False True 108030067981851424 17.03354464938664
True False False True 179379106627232000 17.25377185674082
True False False True 297851001073046464 17.473999064094993
True False False True 494568294537084288 17.694226271449168
True False False True 821208581069482368 17.914453478803345
True False False True 1363580199279397888 18.13468068615752
True False False True 2264164065900722176 18.354907893511694
True False False True 3759543384412024832 18.57513510086587
True False False True 6242554005755503616 18.795362308220046
True True True True 10365482328612339712 19.015589515574224
True True True True 17211420807207079936 19.235816722928398

In all seriousness though, this discrepancy is very hard to debug, and IMO clearly broken:

>>> v = np.iinfo('int64').max
>>> np.array([v + 1]) > v
array([ True])
>>> np.array([v + 1]).max() > v
False
>>>
>>> # even more bizarre with the repr that looks like an int
>>> np.array([v + 1]).max()
9223372036854775808
>>> _ > 9223372036854775807
False

If np.array can do it, why not the dtype directly?

Actually, the consistent solution would be to cast to object instead of float, which numpy does in other places as well:

>>> np.array([np.iinfo('uint64').max + 1])
array([18446744073709551616], dtype=object)

@ahaldane
Member

I think you're hitting a number of different issues. The second one below seems relevant.

As Eric said, one issue is that in Numpy's casting rules (different from C's casting rules), mixed operations like uint > int cast both operands to float64, and that rule is set in stone (we can't change it). The logic is that only float64 can hold values in the range of both int64 and uint64. Another part of the puzzle is that float64 cannot represent large integers exactly, so that in many cases adding or subtracting 1 from the float does nothing; e.g. note that 6242554005755503616. + 1 == 6242554005755503616.. I don't think anything special happens at 1e19; rather, I think the values you chose just semi-randomly happened to round one way or the other. For instance, 2264164065900722048 doesn't follow the pattern. I think that explains your first example.
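
To make that concrete, take the first value from the table above that compares False (a minimal check of the rounding, nothing NumPy-specific):

>>> v = 14211365854171536       # first False row in the table
>>> float(v) == float(v - 1)    # v and v - 1 collapse to the same float64
True
>>> float(2**53 - 1) == float(2**53 - 2)   # below 2**53 every integer is still exact
False

Once both sides round to the same float64, v > v - 1 becomes v > v, hence False.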

The other issue you hit has to do with mixing numpy ints and python ints. In this case I think there is something a bit weird: The array coercion of python-ints behaves differently from the scalar coercion. Here's another example that avoids using max:

>>> v = 9223372036854775807  # python int
>>> a = np.uint64(v+1)
>>> a > v   # both converted to float64
False
>>> np.array(a) > v  # both converted to float64
False
>>> np.array([a]) > v # both converted to uint64
array([ True])

Perhaps relevant: #2524, #944, #1051. Maybe this statement is relevant:

The scalar/array coercion rule is that the type of the scalar will be ignored if the scalar is of the same kind or a lesser kind than the array.
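
For reference, a sketch of that rule under the value-based casting in place when this issue was filed (NumPy 1.x; NEP 50 changed scalar promotion in NumPy 2.0, where the first result becomes int64):

>>> a = np.array([1, 2, 3], dtype=np.int8)
>>> (a + np.int64(1)).dtype                     # same-kind scalar: its width is ignored
dtype('int8')
>>> (a + np.array([1], dtype=np.int64)).dtype   # array operand: normal promotion
dtype('int64')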

@h-vetinari
Contributor Author

Followed some links in #2524 and found more xrefs in a comment:

I see quite a few issues with uint64 and all of them seem to be "related":
#5745, #7126, #4151, #2955, #2524 (since 2012!), #8002 - all open.

and then some other xrefs in #5668: #8809 #8853 #8851 #5746.

Also found more related xrefs after searching issues for uint64: #2533 #2955 #7449 #9982 #10148 #10614 #11020

In #8851 (which is very similar; just addressing equality instead of all comparators), @eric-wieser observes after a nice (read: terrible) example in the OP that:

Frankly, this is PHP-like behaviour, and not something I want to see happening quietly in python

That issue also contains some more useful discussions. #11801 is another almost-duplicate of this issue.


@ahaldane: [...] and that rule is set in stone (we can't change it).

A year ago, you made it sound less set-in-stone ;-)

Personally, if we could do everything over again, I would favor redoing all the casting/coercion so that numpy casting essentially behaves like C casting, and follows the spirit of C-casting for cases that don't exist in C. The numpy casting rules are kind of weird and sometimes confusing (eg the conversion of uint64 + int64 to float64). But we've discussed this a lot on the list, it seems hard to change.

This issue is clearly hitting many people in different situations (the above xrefs total 18 open issues!). Why not fix the fundamentals (surely there are other issues that would benefit from similar treatment), and use that occasion to bump to 2.0?


As an aside, I didn't even know that a NumPy 2.0 milestone exists (it was closed in 2015, even though open issues remain on it). Found an issue discussing this more: #9066

In it, @charris notes that:

At this point, NumPy 2.0 is a creature of mythology.

I'm making a bold prediction that this creature will eventually rear its head before numpy 1.50... ;-)

PS. On third thought, maybe numpy 2.0 needs to be limited to some selected few breaking changes; otherwise, trying to fit in all of the accumulated backpressure probably makes it an almost impossible undertaking.

@h-vetinari
Contributor Author

h-vetinari commented Jan 25, 2019

@ahaldane Any comments to the above?

PS. Here's all the duckies in one neat row:

@ahaldane
Member

Well first thanks for all the time and effort collecting all the issues, I know it takes work. When considering these complex back-compat problems it's really important to be able to reference past discussion, and IMO in a project like numpy this kind of knowledge-organizing work is comparably important to the actual coding.

Now after looking over all the issues, my personal opinion is that 1. I dislike numpy's casting behavior and wish it were C-like, but 2. I also think that at this point we are stuck with the current coercion rules unless we do a numpy 2.0, and I don't think this change alone justifies a numpy 2.0. The problem is numpy is now a foundation of a large computing ecosystem, and a change like this one would very likely break a large fraction of downstream projects in hard-to-debug ways. I would guess more than half. For the same reason that old decisions in the C language have been locked in, we get locked in too.

Plus, it's perfectly possible to write correct algorithms in numpy despite this problem, by keeping track of the type of your variables or casting as needed. So this issue can be seen as more of an annoyance rather than a bug.

But, I think we can add this problem to the list of things that would motivate a numpy 2.0. I just put this issue on the wiki at https://github.com/numpy/numpy/wiki/Backwards-incompatible-ideas-for-a-major-release. Once enough things get added to the list, there might be momentum for 2.0. I suspect that when the "epic dtype plan" gets moving the list will get longer.

@mhvk
Contributor

mhvk commented Jan 27, 2019

@h-vetinari - thanks also from me for the summary; it is very useful to have things in one place! Since you just went over them, were all those for casting, and were there examples for things other than int64 combined with uint64? (If you can, maybe edit your comment above and add a bit of description.)

@ahaldane - I worry that we may be letting parts that are truly difficult to fix get in the way of fixing simpler ones. For instance, in the example at hand, of operations on scalars, it is not obvious to me that much would break if we changed the rule for uint64 with python int (or int64) from casting to float64 to casting both to python int. Indeed, arguably that is more consistent with the general attempt to ensure the result can be properly represented, as python int can have arbitrary length.

The bigger difficulty is what to do with arrays. One option there might be to move towards a solution by changing the casting rules for int (or more specifically, just for uint64 and int64, which I think is the only pair that is really problematic) to try a cast to int64 first, and if that fails for any member of the array, warn and recast to float. I think the check for failure while casting would not cause too much of a slow-down, and this would get us towards a situation where we can (eventually) abandon conversion to float altogether.
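
A rough sketch of what such a value-dependent cast could look like (a hypothetical helper for illustration only, not an existing or proposed NumPy API):

import warnings
import numpy as np

def coerce_uint64_for_mixed_op(arr):
    # Hypothetical helper sketching the idea above: cast uint64 data to int64
    # when every value fits, otherwise warn and fall back to the current
    # float64 behaviour.
    if arr.dtype != np.uint64:
        return arr
    if arr.size == 0 or int(arr.max()) <= np.iinfo(np.int64).max:
        return arr.astype(np.int64)
    warnings.warn("uint64 values exceed int64 range; falling back to float64")
    return arr.astype(np.float64)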

@h-vetinari
Contributor Author

@mhvk
I originally compiled that list 19 days ago, see previous comment. Just added the duplicates and some more loosely related issues on the second pass. AFAIR, all issues I tagged were about uint64 being wrongly cast to float in some operation.

The bigger difficulty is what to do with arrays.

Arrays already work correctly in this case:

>>> v = np.iinfo('int64').max
>>> arr = np.array([v + 1])
>>> arr.dtype
dtype('uint64')
>>>
>>> arr > v  # arrays are working
array([ True])
>>>
>>> arr.max() > v  # scalar case is the problem
False

Furthermore, the scalar case only breaks between np.iinfo('int64').max + 1 and np.iinfo('uint64').max

>>> v = np.iinfo('uint64').max
>>> arr = np.array([v + 1])
>>> arr.dtype
dtype('O')
>>>
>>> arr > v
array([ True])
>>>
>>> arr.max() > v  # correct again
True

@ahaldane
Member

[for] operations on scalars ... change the rule for uint64 with python int (or int64) from casting to float64 to casting both to python int.

[for arrays] changing the casting rules for int (or more specifically, just for uint64 and int64, which I think is the only pair that is really problematic) to try a cast to int64 first, and if that fails for any member of the array, warn and recast to float.

I guess the question is how much these changes would break. I don't have hard evidence, but my impression is that it would break a lot of code and so any change like this is infeasible, even for the python scalars. To get a better idea of whether that's true or not I think someone would have to code it up just to try running some project's test suites.

By the way, C coercion for ints does uint64 + int64 -> uint64, not int64. I have a gist outlining how C-style coercion would work in numpy: https://gist.github.com/ahaldane/0f5ade49730e1a5d16ff6df4303f2e76. Not only would int+uint coercion change relative to current numpy, but int+float coercions would as well.
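
For comparison, a quick illustrative aside (the explicit cast below emulates the C rule by hand):

>>> a = np.array([2**63], dtype=np.uint64)
>>> b = np.array([-1], dtype=np.int64)
>>> (a + b).dtype                 # NumPy's rule: no common integer type, use float64
dtype('float64')
>>> a + b.astype(np.uint64)       # C-style: reinterpret int64 as uint64 and wrap modulo 2**64
array([9223372036854775807], dtype=uint64)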

@mhvk
Contributor

mhvk commented Jan 29, 2019

Agreed that some better data would be needed... But it is also good to simply collect how, in an ideal world, we would like things to work, so we can decide which behaviours are actually fine as is, which are possible to move to, and which are bad but unchangeable.

@h-vetinari
Contributor Author

@ahaldane: By the way, C coercion for ints does uint64 + int64 -> uint64, not int64.

But C does not have the open-ended python ints. It should at least be considered that this casts to object (which array.__init__ already does).

>>> np.array([np.iinfo('uint64').max+1])
array([18446744073709551616], dtype=object)

This would - I believe - solve >90% of the issues I linked above.

I have a gist outlining how C-style coercion would work in numpy: https://gist.github.com/ahaldane/0f5ade49730e1a5d16ff6df4303f2e76.

Thanks for this! You bring up another good point that might be worth tackling at the same time: silent int overflow. I think this is terrible (silently changing results; very hard to debug), and it really should raise, warn, or upcast.
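
For the record, this is the kind of silent wraparound meant (array arithmetic wraps without any warning):

>>> arr = np.array([2**62], dtype=np.int64)
>>> (arr * 2)[0]                  # 2**63 overflows int64 and wraps silently
-9223372036854775808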

@eric-wieser
Member

By the way, C coercion for ints does uint64 + int64 -> uint64, not int64.

Proof of that statement: https://godbolt.org/z/CywY13

@seberg added the 60 - Major release label May 2, 2019
@shoyer added the 62 - Python API label May 10, 2019
@seberg changed the title from "Broken comparators for uint64-dtype" to "BUG: incorrect equality of np.int64 and np.uint64" May 12, 2022
@hacatu

hacatu commented May 18, 2022

I think that for +, -, and *, switching to uint64 (as C does) instead of float64 as the result of combining a signed and unsigned value would not be likely to break existing programs. / is more tricky of course, because the exact result of division is not always an integer.

Furthermore, I think that these operations can and should dispatch on output type when used in compound assignment. That is to say, if we have arr[i] <<= 1, we don't have to do a whole lot of guesswork about what type we should make arr[i] << 1 in this context because it should obviously be the same type as arr[i].

Even though it might be difficult to change this behavior now because programs that depend on the incorrect behavior no doubt exist, it would be very nice to have heterogeneous compound assignments (signed += unsigned and vice versa) work correctly.
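
For context, this is roughly what a mixed in-place operation does today, and how dispatching on the output dtype currently has to be forced by hand (illustrative only):

>>> a = np.zeros(3, dtype=np.uint64)
>>> b = np.ones(3, dtype=np.int64)
>>> # a += b raises a casting error: the float64 result of uint64 + int64 cannot be
>>> # cast back to uint64 under the default 'same_kind' rule for in-place ufuncs.
>>> np.add(a, b, out=a, casting='unsafe')
array([1, 1, 1], dtype=uint64)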

@mattip
Member

mattip commented Sep 18, 2023

Worth revisiting and possibly closing after the change in casting rules: NumPy will now not automatically cast for you.

@seberg
Member

seberg commented Sep 18, 2023

There are other issues around this, but I had fixed the equality on main (actually I think already in 1.25).
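
If that is indeed the fix that landed in 1.25, the equality example from the original report should now come out correct (expected output under NumPy >= 1.25, not re-verified here):

>>> v = np.iinfo('int64').max
>>> np.array([v + 1000]).max() == v   # was True via float64 rounding
False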

@seberg seberg closed this as completed Sep 18, 2023