Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

np.random.choice returns out of range index on masked array prob. #17088

Open
shtse8 opened this issue Aug 15, 2020 · 6 comments
Open

np.random.choice returns out of range index on masked array prob. #17088

shtse8 opened this issue Aug 15, 2020 · 6 comments

Comments

@shtse8
Copy link

shtse8 commented Aug 15, 2020

import numpy as np

a = np.array([0.1, 0.2, 0.4, 0.3])
masked_a = np.ma.masked_array(a, [0, 0, 1, 1])
counter = np.zeros(len(a) + 1)
while True:
    action = np.random.choice(len(a), p=masked_a)
    counter[action] += 1
    print(counter / counter.sum())
    
# [0.09931198 0.20076697 0.         0.         0.69992105]

It shouldn't returns an index > 3, but when the last element of masked array is true, it may return index = 4

@WarrenWeckesser
Copy link
Member

WarrenWeckesser commented Aug 15, 2020

The issue also occurs in Generator.choice in the development version of numpy:

In [1]: import numpy as np

In [2]: np.__version__ 
Out[2]: '1.20.0.dev0+986e533'

In [3]: rng = np.random.default_rng()

In [4]: pm = np.ma.masked_array([0.1, 0.2, 0.4, 0.3], mask=[0, 0, 1, 1])

In [5]: rng.choice(4, p=pm, size=12)
Out[5]: array([4, 4, 4, 0, 4, 1, 4, 0, 0, 4, 4, 4])

@WarrenWeckesser
Copy link
Member

Earlier I added the "bug" label, but that's a subjective judgement. There are other functions in numpy that don't handle masked arrays correctly (or better, "correctly", since "correctness" can also be a judgement call) where we don't necessarily consider it a bug. Perhaps this is just an example of undefined behavior, and the short answer for the issue is "don't do that!". It would be nicer, however, if we could raise an exception instead of returning nonsensical results.

It would be easy enough to add an explicit check for a masked array, and raise an error if any of the values are actually masked, but that feels like a very specific, ad hoc fix.

@eric-wieser
Copy link
Member

Does Generator.choice dispatch to __array_function__? If it doesn't, then it probably should.

If it does, then I think we just throw this in with all the other "maskedarray decays because we haven't implemented __array_function__ yet" bugs.

@shtse8
Copy link
Author

shtse8 commented Aug 16, 2020

Earlier I added the "bug" label, but that's a subjective judgement. There are other functions in numpy that don't handle masked arrays correctly (or better, "correctly", since "correctness" can also be a judgement call) where we don't necessarily consider it a bug. Perhaps this is just an example of undefined behavior, and the short answer for the issue is "don't do that!". It would be nicer, however, if we could raise an exception instead of returning nonsensical results.

It would be easy enough to add an explicit check for a masked array, and raise an error if any of the values are actually masked, but that feels like a very specific, ad hoc fix.

I think if it doesn't handle it correctly, it shouldn't allow passing masked array (should raise an exception) but works correct mostly (it works well except some cases). it may cause potential issues for production environment.

@eric-wieser eric-wieser added the component: numpy.ma masked arrays label Aug 16, 2020
@bashtage
Copy link
Contributor

There is no use of __array_function__ in Generator. While many force inputs to be well behaved C contiguous, it is probably that case that some functions do not always force this religiously enough.

@bashtage
Copy link
Contributor

bashtage commented Aug 18, 2020

If you add NPY_ARRAY_ENSUREARRAY then you end up with a ndarray that is full and ignores the mask. Hard to say this is "correct" to do. In @WarrenWeckesser's example, this ends up using pm=np.array([.1,.2,.3,.4]) as the probabilities.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants