
Default int type is platform dependent #9464

Open
eric-wieser opened this issue Jul 26, 2017 · 28 comments

Comments
@eric-wieser
Member

eric-wieser commented Jul 26, 2017

np.array([1]).dtype is platform-dependent, presumably because it defaults to np.int_

  1. Is this by design?
  2. If not, can we force it to int64?
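
The platform dependence is easy to observe; a minimal sketch (the int32 result applies to 64-bit Windows under NumPy 1.x):

```python
import numpy as np

# NumPy infers the default integer dtype for Python ints, which follows C long:
a = np.array([1])
print(a.dtype)  # int64 on 64-bit Linux/macOS, int32 on 64-bit Windows (NumPy 1.x)

# On every platform the inferred dtype is np.int_; only its width differs:
print(a.dtype == np.dtype(np.int_))  # True
```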
@njsmith
Member

njsmith commented Jul 26, 2017

It is by design – the idea is that numpy's default int type matches the range of python 2's int, which in turn matches the platform C compiler's long.

Whether this is a good design is another question, especially since python 3 has eliminated this. There have been intermittent discussions about changing it before that you can probably dig up – especially about the confusing and error-prone way the default is 32 bits on win64.

I suppose one way to move that discussion forward would be to test whether any major packages break if you do make that change.

@pv
Member

pv commented Jul 26, 2017

One thing that may break is if someone is using dtype=int and assumes this is somehow related to C long type...

@juliantaylor
Contributor

juliantaylor commented Sep 1, 2017

Changing the default int type on 64-bit Windows to 64 bits would IMO be an important enough change to warrant breaking software.
The current behavior just causes too many bugs.

That the default int type on 32-bit platforms is a 32-bit int is probably not so bad, as it does at least cover the full addressable range, and changing it could have a performance impact.

@shoyer
Member

shoyer commented Mar 20, 2018

We should seriously consider changing this.

In my experience, if a Python library of moderate complexity that uses NumPy does not run Windows-specific tests, it is probably broken for this reason.

@KelSolaar
Contributor

@shoyer : We ran into this exact problem on Windows with @MichaelMauderer on colour-science/colour#431.

I was assuming incorrectly that np.int_ was platform independent.

@eric-wieser
Member Author

eric-wieser commented Sep 15, 2018

Perhaps we should drop this default at the same time as python 2, since the sole reason for defaulting to np.int_ was that it matched the size of builtins.int, which in python 3 is not even true.
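
A quick illustration of that mismatch (a sketch; the exact np.int_ bound depends on the platform's C long):

```python
import numpy as np

# Python 3 ints are arbitrary precision:
x = 2**100  # fine as a builtin int

# np.int_ is a fixed-width C long, so it no longer mirrors builtins.int:
print(np.iinfo(np.int_).max)  # 2**63 - 1, or 2**31 - 1 where C long is 32 bits
```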

@lawrence858

Ideally numpy should behave the same way across platforms. A colleague of mine uses Windows and recently had to spend some time trying to figure out why a program was yielding different results on his machine than on my Mac. IMO performance considerations pale in comparison to getting correct and consistent results.

@gojomo

gojomo commented Jul 7, 2020

Is there any runtime workaround a user could execute, before their other code, to force numpy-on-Windows default types to the same widths as elsewhere? (Perhaps, a data-driven, tamperable mapping of Python types to numpy types?)

As a fresh example of some of the resulting craziness, specifically asking for an array of a type compatible with type(2**32) results in an array that can't store 2**32:

    def testTiny(self):
        a = np.empty(1, dtype=type(2**32))
>       a[0] = 2**32
E       OverflowError: Python int too large to convert to C long

@adeak
Contributor

adeak commented Jul 7, 2020

@gojomo I'm not sure that's the right approach anyway. On python 3, type(2**32) is guaranteed to be int, so that's just a more complicated way of saying dtype=int. If you're using a literal like that anyway, you could of course use an explicit dtype=np.int64.

To make it more dynamic, does dtype=np.array(2**32).dtype work? (Odds are there are even more idiomatic ways to do this.)
EDIT: np.empty_like(2**32, shape=...) is probably it, assuming that works.
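
The value-driven variant can be sketched like this (a minimal example; on platforms where the default int is 32 bits, NumPy's value-based inference still promotes 2**32 to a 64-bit dtype):

```python
import numpy as np

# Infer the dtype from the value itself rather than from type(value):
dt = np.array(2**32).dtype
print(dt)  # int64, even where the default int is only 32 bits

a = np.empty(1, dtype=dt)
a[0] = 2**32  # no OverflowError now
```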

@seberg
Member

seberg commented Jul 7, 2020

No, I had a PR to add one, maybe I can open that again now that we decided to start the deprecation on some of the aliases: #16535

So either use dtype=np.intp which gives you 32bit on 32bit systems and 64bit on 64bit systems, or use dtype=np.int64 to begin with. That PR made dtype=np.intp the default, which is the simpler change, because intp is fairly common in NumPy already.
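
A small check of the two suggestions (a sketch; the exact widths depend on the build):

```python
import numpy as np

# np.intp matches the pointer size: 64 bits on any 64-bit build, including
# 64-bit Windows, where the default np.int_ is only 32 bits under NumPy 1.x.
print(np.dtype(np.intp).itemsize * 8)  # 64 on a 64-bit build, 32 on a 32-bit one

# np.int64 is explicit and identical everywhere:
a = np.arange(5, dtype=np.int64)
print(a.dtype)  # int64
```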

@adeak
Contributor

adeak commented Jul 7, 2020

I was thinking that if NEP 31 ever happens, it would also make this kind of replacing defaults easily opt-in.

@gojomo

gojomo commented Jul 7, 2020

@adeak My snippet's not a literal example; my actual issue is that I've got a list of many ints, which eventually reach 2**32, but a numpy array typed based on the first int breaks on Windows when it reaches 2**32, but works everywhere else.

(I was hoping the snippet highlighted some of the on-the-face absurdity of the Python-to-numpy interaction: shouldn't a reported type for a specific number specifically-communicate a corresponding type wide enough to store it? But I suppose Python is an equal contributor to the problem, as 2**65 & 2**129 have the same problem of reporting as simple int. So it's more a brain-teaser than a guide to better behavior.)

I'd answer the "is numpy's choice a good design?" question in @njsmith's 2017 comment as: "Reasonable way back when, but not anymore, with Python 3, the primacy of 64-bit systems, and Microsoft's own phasing-out of Windows 10's support for 32-bit systems."

Traffic on this issue since then has referenced many places where this has caused problems for people, but not yet any extant examples of code that'd break with a changed default. (There's probably some, somewhere.)

If the plunge of changing the default in one swoop is too risky, a call that opts-in to some minimum-width default (or user-chosen default) for all subsequent mappings of Python's int might help. (And then at some later date with warning to Windows users, change the default, but give laggards an option to change it back for a while.)
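
In the absence of such an opt-in switch, a workaround along those lines can only live at the user level; a hypothetical sketch (array64 is not a NumPy API, just an illustration of the pattern):

```python
import numpy as np

# Hypothetical helper (not a NumPy API): map Python ints to a fixed-width
# dtype regardless of the platform default.
def array64(obj, dtype=None, **kwargs):
    arr = np.asarray(obj, dtype=dtype, **kwargs)
    # Promote only when NumPy silently chose its (possibly 32-bit) default int:
    if dtype is None and arr.dtype == np.dtype(np.int_) and arr.dtype != np.dtype(np.int64):
        arr = arr.astype(np.int64)
    return arr

print(array64([1, 2, 3]).dtype)  # int64 on every platform
```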

jarulraj added a commit to georgia-tech-db/evadb that referenced this issue Jan 2, 2023
> Updated code to support both linux and windows (mainly signals and numpy array default type)
Default int type is platform dependent numpy/numpy#9464
> Removed lsof that is not available on windows -- using psutil to detect and kill eva server
> OCR and Facenet UDFs do not work on Windows.
@nicolascofrer

Just realized this is the reason I'm getting different results on different machines. This is a very important issue. As others said, getting the same results seems way more important than "breaking something".

@eric-wieser
Member Author

As others said, getting the same results seems way more important than "breaking something".

"breaking something" is just another type of "not getting the same results" (between "now" and "then", as opposed to "windows" and "Linux")

@nicolascofrer

I think there is a difference between breaking something with an error or warning as opposed to silently giving different results. Consider numpy random problems, for instance, which might be hard to debug because you don't know whether the source of the difference is randomness or not.

@seberg
Member

seberg commented Feb 15, 2023

I think there is a difference between breaking something with an error or warning as opposed to silently give different results.

Yes, which is exactly why this is not so simple. You must expect that such a change will break a decent number of users' code without any error.

@njh219

This comment was marked as off-topic.

@fmaussion

I agree that changing this behavior can only be done silently (all of a sudden, all Windows users will get int64 instead of int32, and it will break an undetermined amount of code X).

At the same time, it will probably uncover painful bugs in an undetermined amount of code Y and greatly simplify the behavior for learners and library developers.

What is the typical way to decide on this, when there is no way to estimate X or Y? The path of least resistance is definitely to do nothing in this case...

@seberg
Member

seberg commented Feb 15, 2023

@fmaussion writing a brief NEP would be good. At this point, we should probably consider including such change in a major release (but it might still be good to summarize things in a NEP!), since I hope that isn't too far off.

I also suspect that the sane choice is probably (unfortunately) to switch to intp as default (i.e. 64bit on 64bit windows and not attempting any change e.g. on 32bit linux). But NEP would be the place to summarize that.

I added a switch for "use NumPy 2 behavior" very recently, so once there is some general consensus to push for this, there is also a path to start implementing it as planned for the major release.

@fmaussion

Thanks @seberg, this sounds reasonable - I'll see if I can find the bandwidth to get things started, but I'll certainly need help.

@seberg
Member

seberg commented Feb 15, 2023

Don't hesitate to get in touch with me. There are too many things for the core team to push, so someone helping champion such a change makes it much more likely to happen!

@njh219

This comment was marked as off-topic.

@xor2k
Contributor

xor2k commented Feb 24, 2023

Hi everybody! I just experienced this problem in my project npy-append-array, compare

xor2k/npy-append-array#6

The problem was that while on macOS and Linux int64 is the default, on Windows it is int32, even on a 64-bit operating system. My solution was basically to replace all NumPy functions with their corresponding Python functions, like numpy.multiply.reduce with math.prod and numpy.ceil with math.ceil. I could also have specified dtype=np.int64 explicitly, but that would be quite verbose and hopefully not necessary anymore in the future.
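
The pattern can be sketched like this (a minimal example; the overflow claim applies to 64-bit Windows under NumPy 1.x, where the default int is int32):

```python
import math
import numpy as np

shape = (50000, 50000)

# Platform-dependent: reducing a tuple of Python ints goes through the default
# int dtype, which is int32 on 64-bit Windows (NumPy 1.x) and overflows there.
n_numpy = np.multiply.reduce(shape)

# Platform-independent: Python ints are arbitrary precision and never overflow.
n_python = math.prod(shape)

print(n_python)  # 2500000000 everywhere; n_numpy matches only where ints are 64-bit
```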

If this is API breaking or so, maybe it would be something for Numpy 2.0, wouldn't it?

@xor2k
Contributor

xor2k commented Feb 24, 2023

@fmaussion writing a brief NEP would be good. At this point, we should probably consider including such change in a major release (but it might still be good to summarize things in a NEP!), since I hope that isn't too far off.

I also suspect that the sane choice is probably (unfortunately) to switch to intp as default (i.e. 64bit on 64bit windows and not attempting any change e.g. on 32bit linux). But NEP would be the place to summarize that.

I added a switch for "use NumPy 2 behavior" very recently, so once there is some general consensus to push for this, there is also a path to start implementing it as planned for the major release.

I have no bandwidth either, but if you need someone to write that NEP, I can give it a shot.

@joaoe

joaoe commented Apr 17, 2023

hi.
This issue causes serious compatibility problems between Windows 64 and Linux/Mac. E.g., another one unionai-oss/pandera#726

@albertopasqualetto

I think that this issue should be mentioned on the numpy.array documentation page, because the phrase "[...] NumPy will try to use a default dtype that can represent the values (by applying promotion rules when necessary.)" is misleading.

leroyvn added a commit to eradiate/eradiate that referenced this issue Feb 27, 2024
This commit fixes a platform-specific bug detected on Windows. One major
issue is that Numpy's default integer type is platform-dependent, and it
is not the same on Linux/macOS as on Windows. This has been a subject of
discussion for a while, and the Numpy community does not seem to agree
to make the default integer type the same on all platforms
(see e.g. numpy/numpy#9464).

Consequently, one must be very careful when manipulating ints on
Windows. It turned out that wavelength definitions in mono modes were
not cast to a floating-point type and remained as int32, leading to
incorrect calculations when passed to the air scattering coefficient
computation routine (which computes wavelength ** 4 and overflows int32
value bounds).
leroyvn added a commit to eradiate/eradiate that referenced this issue Feb 27, 2024
leroyvn added a commit to eradiate/eradiate that referenced this issue Feb 27, 2024
westonpace added a commit to lancedb/lance that referenced this issue Apr 11, 2024
The test assumes that ray will infer the range as int64. However, it
uses numpy to do the inference and numpy's integer inference is platform
dependent: numpy/numpy#9464