Broadcasting errors with multi-dimensional boolean masks #13255

Closed
qualiaa opened this issue Apr 3, 2019 · 3 comments

Comments

@qualiaa

qualiaa commented Apr 3, 2019

When indexing a 2D array of shape (N, M) with two 1D boolean masks of lengths N and M, certain combinations of True and False values lead to a broadcasting error (particularly when one mask is all False). I'm not sure whether this behaviour is expected, but it seems highly surprising and undesirable.

In the example below, x[[False, True, True], [True, True, True]] raises an error, while x[[False, True, True], True] and x[[False, True, True]] behave as expected.

Reproducing code example:

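```python
import numpy as np
from itertools import product

x = np.zeros((3, 3))
# All 8 boolean masks of length 3, as tuples.
mask_1d = [*product([True, False], repeat=3)]

# Try every (row mask, column mask) pair and report the ones that fail.
for row_mask, col_mask in product(mask_1d, mask_1d):
    try:
        x[row_mask, col_mask]
    except IndexError as e:
        print(row_mask, col_mask)
        print(e)
```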

Error message:

```
(True, True, True) (True, True, False)
shape mismatch: indexing arrays could not be broadcast together with shapes (3,) (2,)
(True, True, True) (True, False, True)
shape mismatch: indexing arrays could not be broadcast together with shapes (3,) (2,)
(True, True, True) (False, True, True)
shape mismatch: indexing arrays could not be broadcast together with shapes (3,) (2,)
(True, True, True) (False, False, False)
shape mismatch: indexing arrays could not be broadcast together with shapes (3,) (0,)
(True, True, False) (True, True, True)
shape mismatch: indexing arrays could not be broadcast together with shapes (2,) (3,)
(True, True, False) (False, False, False)
shape mismatch: indexing arrays could not be broadcast together with shapes (2,) (0,)
(True, False, True) (True, True, True)
shape mismatch: indexing arrays could not be broadcast together with shapes (2,) (3,)
(True, False, True) (False, False, False)
shape mismatch: indexing arrays could not be broadcast together with shapes (2,) (0,)
(False, True, True) (True, True, True)
shape mismatch: indexing arrays could not be broadcast together with shapes (2,) (3,)
(False, True, True) (False, False, False)
shape mismatch: indexing arrays could not be broadcast together with shapes (2,) (0,)
(False, False, False) (True, True, True)
shape mismatch: indexing arrays could not be broadcast together with shapes (0,) (3,)
(False, False, False) (True, True, False)
shape mismatch: indexing arrays could not be broadcast together with shapes (0,) (2,)
(False, False, False) (True, False, True)
shape mismatch: indexing arrays could not be broadcast together with shapes (0,) (2,)
(False, False, False) (False, True, True)
shape mismatch: indexing arrays could not be broadcast together with shapes (0,) (2,)
```
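The pattern in this output can be predicted with a simple rule. The sketch below assumes, consistent with NumPy's indexing documentation and with the shapes in the messages, that each boolean mask is replaced by its nonzero() integer positions and that those index arrays are then broadcast together. Under that assumption a pair fails exactly when the two True counts differ and neither count is 1. predicted_to_fail is a hypothetical helper written only for this illustration.

```python
import numpy as np
from itertools import product


def predicted_to_fail(row_mask, col_mask):
    # Hypothetical helper: each mask becomes an index array whose length is its
    # True count; two 1D index arrays broadcast only if the lengths match or one is 1.
    n_rows = int(np.count_nonzero(row_mask))
    n_cols = int(np.count_nonzero(col_mask))
    return n_rows != n_cols and n_rows != 1 and n_cols != 1


x = np.zeros((3, 3))
mask_1d = list(product([True, False], repeat=3))

failures, predicted = [], []
for row_mask, col_mask in product(mask_1d, mask_1d):
    if predicted_to_fail(row_mask, col_mask):
        predicted.append((row_mask, col_mask))
    try:
        x[row_mask, col_mask]
    except IndexError:
        failures.append((row_mask, col_mask))

print(len(failures), failures == predicted)  # 14 True
```

The same rule explains why pairs with equal True counts (for example two masks with two True values each) do not raise: their index arrays broadcast, but the result pairs positions element-wise rather than selecting a sub-block.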

Numpy/Python version information:

1.16.2 3.6.8 |Anaconda, Inc.| (default, Dec 30 2018, 01:22:34) 
[GCC 7.3.0]
@mhvk
Contributor

mhvk commented Apr 3, 2019

With boolean arrays, the code assumes you are trying to index either a single dimension or all elements at the same time, and the choice is, somewhat unfortunately, guessed in a way that allows a single True to be removed. I.e., it turns your row_mask, col_mask into a (2,3) boolean array and then finds that it cannot index the (3,3) array.

Part of the problem is that tuples and lists are treated as equivalent, something we're trying to move away from. Eventually, you'd handle the boolean array index by ensuring the mask was a double list.

For now, though, I fear the only solution is to do x[row_mask][:, col_mask].
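A minimal sketch of that workaround, using the failing pair from the issue. The masks are written as lists on purpose: if a mask is a tuple (as in the reproducing script) and is passed as the whole index, x[(False, True, True)] is read as x[False, True, True], i.e. three separate scalar indices, so tuples should be converted with list() first.

```python
import numpy as np

x = np.arange(9).reshape(3, 3)
row_mask = [False, True, True]
col_mask = [True, True, True]

# Indexing both axes at once fails: the masks become index arrays of
# shapes (2,) and (3,), which cannot be broadcast together.
try:
    x[row_mask, col_mask]
    combined_ok = True
except IndexError:
    combined_ok = False

# Chained indexing applies one mask per step, so the masks are never broadcast.
sub = x[row_mask][:, col_mask]
print(combined_ok, sub.tolist())  # False [[3, 4, 5], [6, 7, 8]]
```

The cost of chaining is an intermediate copy after the first step; for small arrays this rarely matters.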

cc @eric-wieser, who has been working to deprecate the "treat tuple as list" for indexing operations.

P.S. The difference I find most annoying is this one:

```python
import numpy as np

x = np.arange(9).reshape(3, 3)
x[[False, True, True], True]
# array([[3, 4, 5],
#        [6, 7, 8]])
x[[False, True, True], False]
# IndexError: shape mismatch: indexing arrays could not be broadcast together with shapes (2,) (0,)
```
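One way to read this difference, sketched under an assumption suggested by the (2,) (0,) shapes in the error: a scalar True behaves like an index array of length 1 and a scalar False like one of length 0, and either is broadcast against the (2,) positions from the mask. The stand-in arrays below are only a model of that behaviour, not NumPy's internal code.

```python
import numpy as np

x = np.arange(9).reshape(3, 3)
row_idx = np.nonzero([False, True, True])[0]  # integer positions [1, 2], shape (2,)

results = []
# Assumed model: scalar True ~ index array of length 1, scalar False ~ length 0.
for scalar, length in [(True, 1), (False, 0)]:
    stand_in = np.zeros(length, dtype=np.intp)
    try:
        np.broadcast(row_idx, stand_in)  # does (2,) broadcast with the stand-in?
        broadcasts = True
    except ValueError:
        broadcasts = False
    try:
        x[[False, True, True], scalar]  # the real indexing operation
        indexes = True
    except IndexError:
        indexes = False
    results.append((scalar, broadcasts, indexes))
    print(scalar, broadcasts, indexes)
# True True True
# False False False
```

In both cases the model and the real indexing agree: length 1 broadcasts against (2,), length 0 does not.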

@qualiaa
Author

qualiaa commented Apr 3, 2019

Yes, x[row_mask][:, col_mask] is what I ended up doing. Thanks for the explanation, I'm glad it's something that's being looked into.

@seberg
Member

seberg commented Apr 4, 2019

I think arr[np.ix_(row_mask, col_mask)] is what you want/are expecting here; in other words, the outer indexing logic described in NEP 21: https://github.com/numpy/numpy/blob/master/doc/neps/nep-0021-advanced-indexing.rst

Maybe that will be picked up at some point. The NEP also says that, at least for the current indexing rules, multiple boolean indices should simply be deprecated (I think whether to allow this specific use case may still have been contested: it is consistent, but it may have few real use cases and is likely to be confusing in any case).
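A sketch of the np.ix_ approach, checked against chained indexing for all 64 mask pairs from the reproducing script. np.ix_ accepts boolean sequences and treats each as a mask for its own dimension, producing an open mesh whose pieces always broadcast, so no pair raises.

```python
import numpy as np
from itertools import product

x = np.arange(9).reshape(3, 3)
mask_1d = list(product([True, False], repeat=3))

for row_mask, col_mask in product(mask_1d, mask_1d):
    # np.ix_ replaces each boolean mask by its nonzero() positions, shaped as
    # (n_rows, 1) and (1, n_cols): an open mesh that always broadcasts.
    outer = x[np.ix_(row_mask, col_mask)]
    # list() avoids the tuple being read as a multi-dimensional index.
    chained = x[list(row_mask)][:, list(col_mask)]
    assert np.array_equal(outer, chained)

print(x[np.ix_([False, True, True], [True, False, True])].tolist())  # [[3, 5], [6, 8]]
```

Unlike the chained form, np.ix_ selects the sub-block in a single indexing operation, which also makes it usable on the left-hand side of an assignment.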

@seberg closed this as completed Apr 4, 2019