BUG: np.bool_(False) & pd.NA gives pd.NA #58427

Open
Tracked by #58460
jbrockmendel opened this issue Apr 25, 2024 · 2 comments
Labels
Bug; Missing-data (np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate); NA - MaskedArrays (Related to pd.NA and nullable extension arrays); Needs Triage (Issue that has not been reviewed by a pandas team member)

Comments

@jbrockmendel
Member

jbrockmendel commented Apr 25, 2024

>>> False & pd.NA
False
>>> np.bool_(False) & pd.NA
<NA>

this surprised me; is it on purpose @jorisvandenbossche?
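
For reference, pd.NA's logical operators are documented as following three-valued (Kleene) logic, under which `False & NA` is `False` no matter what the missing value would have been. Below is a minimal pure-Python sketch of that truth table. `UNKNOWN` and `kleene_and` are hypothetical names made up for this illustration; this is not the pandas implementation.

```python
class _Unknown:
    """Hypothetical 'missing' marker for this sketch (stands in for pd.NA)."""

    def __repr__(self):
        return "<NA>"


UNKNOWN = _Unknown()


def kleene_and(left, right):
    """Three-valued AND: False dominates, then unknown, then True."""
    if left is False or right is False:
        return False
    if left is UNKNOWN or right is UNKNOWN:
        return UNKNOWN
    return True


# Print the full 3x3 truth table.
for left in (True, False, UNKNOWN):
    for right in (True, False, UNKNOWN):
        print(f"{left!r:>5} & {right!r:<5} -> {kleene_and(left, right)!r}")
```

Under this table, a false operand should give `False` whether it arrives as a Python `bool` or as a `np.bool_`, which is why the second result above looks inconsistent.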

@jbrockmendel jbrockmendel added the Bug and Needs Triage labels Apr 25, 2024
@rhshadrach rhshadrach added the Missing-data label Apr 26, 2024
tal-ladi added a commit to tal-ladi/pandas that referenced this issue Apr 27, 2024
@jorisvandenbossche
Member

jorisvandenbossche commented Apr 29, 2024

Yes, ideally those would be consistent. This looks like a bug. Similarly, using a numpy array (instead of a numpy scalar) also gives a different result:

>>> pd.NA & False
False
>>> pd.NA & np.bool_(False)
<NA>
>>> pd.NA & np.array([True, False])
array([<NA>, <NA>], dtype=object)

In the implementation of __and__, we only consider the cases where other is True, False or NA, and otherwise defer to numpy for numpy arrays or scalars. (I am not entirely sure how we then end up with the above result, though, because explicitly putting the NA in a numpy container, as in np.array([pd.NA], dtype=object) & np.array([True, False]), actually gives the correct result.)

Looking at the other dunders, the arithmetic/comparison ops seem to use helpers like _create_binary_propagating_op, which also handle numpy arrays for other. We should probably do something similar for the logical operators?
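
As a rough sketch of what "something similar" could look like for `&`, here is a stand-in class whose `__and__` also accepts numpy boolean scalars and boolean arrays. Everything here is an assumption for illustration: `_Missing` and `MISSING` are hypothetical names, the class sets `__array_ufunc__ = None` (which pd.NA may not do) so that numpy defers to the reflected method, and only boolean inputs are handled. It is not the pandas code and not a proposed patch.

```python
import numpy as np


class _Missing:
    """Hypothetical stand-in for pd.NA; not the pandas implementation."""

    # Opt out of NumPy's ufunc machinery so that `array & MISSING`
    # returns NotImplemented and Python falls back to __rand__ below.
    __array_ufunc__ = None

    def __repr__(self):
        return "<NA>"

    def __and__(self, other):
        if other is self:
            return self
        # np.bool_ is not a subclass of bool, so `other is False` misses it.
        if isinstance(other, (bool, np.bool_)):
            return self if other else False
        if isinstance(other, np.ndarray) and other.dtype == bool:
            # Elementwise Kleene AND: False where `other` is False,
            # missing everywhere else.
            out = np.full(other.shape, self, dtype=object)
            out[~other] = False
            return out
        return NotImplemented

    # AND is symmetric, so the reflected operation can reuse it.
    __rand__ = __and__


MISSING = _Missing()

print(MISSING & np.bool_(False))
print(MISSING & np.array([True, False]))
print(np.array([True, False]) & MISSING)
```

The point of the sketch is the `isinstance` check in place of identity checks: a `np.bool_` is never the same object as Python's `False`, so a test like `other is False` cannot match it.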

@jorisvandenbossche
Member

So what I don't understand is how we get this difference in behaviour:

>>> np.array(pd.NA, dtype=object) & np.array([True, False])
array([<NA>, False], dtype=object)

>>> pd.NA & np.array([True, False])
array([<NA>, <NA>], dtype=object)
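
One mechanism that would produce exactly this split is an `__array_ufunc__` hook on the scalar: numpy consults it when the object is a direct operand, but not when the object is merely an element of an object-dtype array, where the elementwise loop calls the object's own `&`. Whether pd.NA actually takes this path is an assumption here and is not confirmed in this thread. The sketch below reproduces the pattern with a hypothetical `_Marker` class; the fallback inside `__array_ufunc__` is invented for the illustration.

```python
import numpy as np


class _Marker:
    """Hypothetical missing-value marker; not the pandas implementation."""

    def __repr__(self):
        return "<NA>"

    def __and__(self, other):
        # Only plain Python bools and the marker itself are handled.
        if other is False:
            return False
        if other is True or other is self:
            return self
        return NotImplemented

    __rand__ = __and__

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        # NumPy calls this when the marker is a direct operand of a ufunc.
        # This fallback ignores the other operand's values and fills the
        # broadcast shape with the marker.
        shapes = [np.shape(x) for x in inputs if x is not self]
        return np.full(np.broadcast_shapes(*shapes), self, dtype=object)


MARKER = _Marker()
flags = np.array([True, False])

# Direct operand: __and__ returns NotImplemented, numpy then routes the
# operation through __array_ufunc__, and every slot becomes the marker.
direct = MARKER & flags

# Wrapped in a 0-d object array: numpy's object loop applies `&` to each
# pair of elements, so MARKER & False evaluates to False.
wrapped = np.array(MARKER, dtype=object) & flags

print(direct)
print(wrapped)
```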

@jorisvandenbossche jorisvandenbossche added the NA - MaskedArrays label Apr 29, 2024

3 participants