Unexpected output from arange with dtype=int #16159

Open
davrot opened this issue May 5, 2020 · 6 comments
Labels: 00 - Bug; component: numpy._core; triaged (Issue/PR that was discussed in a triage meeting)

@davrot

davrot commented May 5, 2020

In [3]: np.arange(-3, 0, 0.5, dtype=int)
Out[3]: array([-3, -2, -1, 0, 1, 2])

Well, to see a "1" and a "2" was a bit unexpected for us since both numbers are a bit bigger than 0.

Normally, this is the result without dtype=int:

In [2]: np.arange(-3, 0, 0.5)                                                  
Out[2]: array([-3. , -2.5, -2. , -1.5, -1. , -0.5])
and we should get this with dtype=int:
In [4]: np.arange(-3, 0, 0.5).astype(int)                                      
Out[4]: array([-3, -2, -2, -1, -1,  0])

The numpy manual states:
dtype : dtype
The type of the output array. If dtype is not given, infer the data type from the other input arguments.

Thus it should only affect the output array, right?

import numpy as np
print(np.arange(-3, 0, 0.5))
print(np.arange(-3, 0, 0.5, dtype=int))
print(np.arange(-3, 0, 0.5).astype(int))

Error message:

No error message...

Numpy/Python version information:

We tested it under numpy '1.18.4' (pure Python 3.7.6) as well as '1.18.1' (Anaconda 3.7 with the latest update applied). Same result.

1.18.4 3.7.6 (default, Feb 28 2020, 15:25:38)
[Clang 11.0.0 (https://github.com/llvm/llvm-project.git eefbff0082c5228e01611f7

1.18.1 3.7.4 (default, Aug 13 2019, 20:35:49)
[GCC 7.3.0]

@eric-wieser
Member

eric-wieser commented May 5, 2020

Bugs like this are reported over and over again. For reasons lost to time, I'm fairly confident the implementation of arange is something like:

import math

def arange(start, stop, step, dtype):
    # the length is computed from the original, uncast arguments
    n = math.ceil((stop - start) / step)

    # dtype.type is a cast
    first = dtype.type(start)
    step = dtype.type(start + step) - first

    # now do what you expect
    return [first + step*i for i in range(n)]
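
To see how that model yields the output reported in this issue, here is a pure-Python trace of the example. The names `arange_model` and `cast` are hypothetical, and the length is assumed to be ceil((stop - start) / step) as stated in the NumPy documentation. This is a sketch of the described behaviour, not NumPy's actual C implementation.

```python
import math

def arange_model(start, stop, step, cast=int):
    """Hypothetical model: length from uncast inputs, step from cast inputs."""
    # Length uses the original float arguments: ceil(3 / 0.5) = 6.
    n = max(math.ceil((stop - start) / step), 0)
    # The first two elements are cast separately; int() truncates toward zero.
    first = cast(start)          # int(-3)   -> -3
    second = cast(start + step)  # int(-2.5) -> -2
    delta = second - first       # effective integer step: 1, not 0.5
    return n, delta, [first + delta * i for i in range(n)]

n, delta, result = arange_model(-3, 0, 0.5)
print(n, delta, result)
```

The count of 6 comes from the float step of 0.5, while the spacing of 1 comes from the truncated integers, which is why the values run past `stop`.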

@mattip
Member

mattip commented May 5, 2020

Perhaps we should add that pseudo-code to the documentation?

@seberg
Member

seberg commented May 5, 2020

Yeah, that code is correct (not 100% sure about the n calculation though). This specific example is pretty extreme and obviously broken; maybe we can actually get rid of it somehow?

arange is repeatedly hated for the arguably broken definition, but I cannot think of a really good proposal to address it (although maybe one came up before).
It's not like we can change arange's behaviour for floats well (maybe precision fixups, but end-point changes are no good IMO). So we would need to create a new function... But then in most cases it seems to me that linspace is better than a "correct" float arange; I am not sure that a corrected float arange actually has too many use cases.

In the end, I guess I would like a well thought out proposal :/...
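
As a rough illustration of the linspace alternative mentioned above, the float sequence from this issue can be requested by point count rather than by step. The variable names are only for illustration, and deriving `num` with `round()` is an assumption about how a caller might convert a step into a count.

```python
import numpy as np

start, stop, step = -3.0, 0.0, 0.5

# Number of points implied by the step; round() guards against float error.
num = round((stop - start) / step)

# endpoint=False mimics arange's half-open interval [start, stop).
values = np.linspace(start, stop, num, endpoint=False)
print(values)

# Casting afterwards only changes the output, as the reporter expected.
print(values.astype(int))
```

Because the number of elements is fixed up front, rounding in the step cannot add or drop an element at the end point.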

@davrot
Author

davrot commented May 5, 2020

Getting values that are bigger than "stop" is really not nice and a bit unexpected. If arange is not for float, you could check for the floaty numpy types and raise an exception.

Also the manual entry for dtype really lets the user expect something like an astype(dtype) conversion of only the output.

How about:
1.) Exception for non-integer arguments (i.e. start, stop, step).
2.) Check if stop >= start, otherwise raise an exception
3.) Cast start, stop, step to int64 in the beginning of the function.
4.) astype(dtype) the output

Instead of 1.) you can redirect to linspace inside of arange if a non-integer input is found.
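
The four steps proposed above could be sketched as a wrapper like the one below. `strict_arange` is a hypothetical name and not part of NumPy, and using `operator.index` to reject non-integer arguments is just one possible way to implement step 1.

```python
import operator
import numpy as np

def strict_arange(start, stop, step=1, dtype=None):
    """Hypothetical integer-only wrapper around np.arange (not a NumPy API)."""
    try:
        # Steps 1 and 3: operator.index accepts integers and rejects floats.
        start, stop, step = (operator.index(v) for v in (start, stop, step))
    except TypeError:
        raise TypeError("start, stop and step must be integers") from None
    # Step 2: refuse ranges that run backwards.
    if stop < start:
        raise ValueError("stop must be >= start")
    out = np.arange(start, stop, step, dtype=np.int64)
    # Step 4: dtype only converts the finished output.
    return out if dtype is None else out.astype(dtype)

print(strict_arange(-3, 0))
print(strict_arange(0, 4, 2, dtype=float))
```

With this sketch, `strict_arange(-3, 0, 0.5)` raises a `TypeError` instead of returning values beyond `stop`.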

@aryanxk02

Hey I'm a complete beginner to open source contribution. Thought of giving it a try. How about this snippet? @eric-wieser

x = []
for i in range(start, stop):
    x.append(i)
    x.append(i+step)
print(np.array(x, dtype))

@InessaPawson InessaPawson added the triage review Issue/PR to be discussed at the next triage meeting label Feb 21, 2023
@InessaPawson InessaPawson added triaged Issue/PR that was discussed in a triage meeting and removed triage review Issue/PR to be discussed at the next triage meeting labels Mar 8, 2023
@InessaPawson
Member

@aryanxk02 We reviewed the solution you proposed at today's triage meeting. It wouldn't solve the issue. Thank you for giving it a go!
