BUG: np.where called with pd.Series returns np.array instead of pd.Series #58329

Open
3 tasks done
domsmrz opened this issue Apr 19, 2024 · 5 comments
Labels
Bug Upstream issue Issue related to pandas dependency

Comments

@domsmrz
Contributor

domsmrz commented Apr 19, 2024

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import numpy as np
import pandas as pd

s = pd.Series([1, 2, 3], name='a')
res = np.where(s > 2, s, -s)
print(res)
# res is np.array([-1, -2, 3])

Issue Description

res in the example is np.array. Such behavior is inconsistent with other numpy functions such as np.floor or np.arctan2.
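The inconsistency can be seen side by side in a small sketch. It assumes the pandas and NumPy versions listed under Installed Versions; the variable names `floored` and `picked` are only illustrative.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2, 3], name="a")

# np.floor is a ufunc, so pandas gets a chance to wrap the result.
floored = np.floor(s)

# np.where is not a ufunc; on the versions in this report it returns a plain array.
picked = np.where(s > 2, s, -s)

print(type(floored).__name__, floored.name)
print(type(picked).__name__)
```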

Expected Behavior

res should be a Series instead (specifically pd.Series([-1, -2, 3], name='a') in this case).

As for the cases where x and y (i.e., the last two parameters of np.where) have mixed types or names, we should probably stay consistent with np.arctan2, meaning:

  • if only one of x and y is a pd.Series and the other is an np.array, we should return a pd.Series with the same name as the input Series
  • if both x and y are pd.Series with the same name, we should return a pd.Series with that name
  • if both x and y are pd.Series but with different names, we should return an unnamed pd.Series
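For reference, the three naming cases above can be checked against np.arctan2 with a short sketch. The inputs (`a`, `also_a`, `b`, `arr`) are made up for illustration and share the default index, so no alignment is involved.

```python
import numpy as np
import pandas as pd

a = pd.Series([1.0, 2.0, 3.0], name="a")
also_a = pd.Series([3.0, 2.0, 1.0], name="a")
b = pd.Series([3.0, 2.0, 1.0], name="b")
arr = np.array([3.0, 2.0, 1.0])

# Series with array: the Series name is kept.
mixed = np.arctan2(a, arr)
# Two Series with the same name: that name is kept.
same = np.arctan2(a, also_a)
# Two Series with different names: the result is unnamed.
different = np.arctan2(a, b)

print(mixed.name, same.name, different.name)
```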

Installed Versions

INSTALLED VERSIONS

commit : bdc79c1
python : 3.12.3.final.0
python-bits : 64
OS : Windows
OS-release : 11
Version : 10.0.22631
machine : AMD64
processor : AMD64 Family 25 Model 33 Stepping 2, AuthenticAMD
byteorder : little
LC_ALL : None
LANG : None
LOCALE : English_Germany.1252

pandas : 2.2.1
numpy : 1.26.4
pytz : 2024.1
dateutil : 2.9.0.post0
setuptools : None
pip : 23.3.2
Cython : None
pytest : 8.1.1
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : None
IPython : 8.23.0
pandas_datareader : None
adbc-driver-postgresql: None
adbc-driver-sqlite : None
bs4 : None
bottleneck : None
dataframe-api-compat : None
fastparquet : None
fsspec : None
gcsfs : None
matplotlib : None
numba : None
numexpr : 2.10.0
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pyreadstat : None
python-calamine : None
pyxlsb : None
s3fs : None
scipy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
zstandard : None
tzdata : 2024.1
qtpy : None
pyqt5 : None

@domsmrz domsmrz added Bug Needs Triage Issue that has not been reviewed by a pandas team member labels Apr 19, 2024
@AkisPanagiotopoulos

take

@rhshadrach
Member

Is there anything pandas can do here? You're calling a NumPy function, and I don't think NumPy is deferring to pandas code to define its behavior.

@domsmrz
Contributor Author

domsmrz commented Apr 20, 2024

IIUC that is what the __array_ufunc__ method is for. At least that is the method that gets invoked for the other functions (e.g., np.floor) and handles the transformation from array to Series. That being said, I don't know why it isn't invoked in the case of np.where, or whether the reason lies on NumPy's side or pandas' side.
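A quick check hints at where the difference comes from: __array_ufunc__ is only consulted for np.ufunc objects, and np.where is not one. The variable names below are illustrative only.

```python
import numpy as np
import pandas as pd

# np.floor and np.arctan2 are ufuncs, so Series.__array_ufunc__ can intercept them.
floor_is_ufunc = isinstance(np.floor, np.ufunc)
arctan2_is_ufunc = isinstance(np.arctan2, np.ufunc)

# np.where is an ordinary function, so the ufunc protocol never sees it.
where_is_ufunc = isinstance(np.where, np.ufunc)

# pandas does define the ufunc hook on Series.
series_has_hook = hasattr(pd.Series, "__array_ufunc__")

print(floor_is_ufunc, arctan2_is_ufunc, where_is_ufunc, series_has_hook)
```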

@rhshadrach rhshadrach added Upstream issue Issue related to pandas dependency and removed Needs Triage Issue that has not been reviewed by a pandas team member labels Apr 20, 2024
@rhshadrach
Member

I think this is numpy/numpy#8994; closing as an upstream issue. @domsmrz - let me know if you think I'm missing something.

@domsmrz
Contributor Author

domsmrz commented Apr 20, 2024

Thanks for linking the issue, @rhshadrach. That explains why __array_ufunc__ is not invoked in this case. However, we may be able to fix this within pandas by implementing __array_function__ (as per numpy/numpy#5095 (comment))?
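To illustrate the idea, here is a minimal sketch of the __array_function__ protocol on a toy container. `NamedArray` is hypothetical and is not pandas code; it handles only the three-argument form of np.where, and it borrows the naming rule proposed above as an assumption.

```python
import numpy as np


class NamedArray:
    """Hypothetical toy container, used only to illustrate the protocol."""

    def __init__(self, values, name=None):
        self.values = np.asarray(values)
        self.name = name

    def __array_function__(self, func, types, args, kwargs):
        # Only np.where is handled in this sketch; anything else is declined.
        if func is not np.where:
            return NotImplemented
        # Collect the names of all wrapped inputs, then unwrap them.
        names = {a.name for a in args if isinstance(a, NamedArray)}
        unwrapped = [a.values if isinstance(a, NamedArray) else a for a in args]
        result = np.where(*unwrapped, **kwargs)
        # Keep the name only if all wrapped inputs agree on it.
        name = names.pop() if len(names) == 1 else None
        return NamedArray(result, name=name)


x = NamedArray([1, 2, 3], name="a")
y = NamedArray(-x.values, name="a")
out = np.where(x.values > 2, x, y)
print(type(out).__name__, out.values.tolist(), out.name)
```

Whether something along these lines would suit pandas itself is an open question in this thread; the sketch only shows that NumPy hands np.where over to a type that defines the hook.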

@rhshadrach rhshadrach reopened this Apr 21, 2024