What's the design decision regarding whether operations can return another backend's object? #58312

Open
anmyachev opened this issue Apr 18, 2024 · 2 comments
Labels
Arrow pyarrow functionality Dtype Conversions Unexpected or buggy dtype conversions NA - MaskedArrays Related to pd.NA and nullable extension arrays pyarrow dtype retention op with pyarrow dtype -> expect pyarrow result Usage Question

Comments

@anmyachev
Contributor

Let's assume we have a pyarrow-backed dataframe/series. Can any operation return numpy-backed dataframe/series, or should this never happen (and if it does, it will be treated as a bug)?

I didn't find an answer to this question here:
https://pandas.pydata.org/pandas-docs/version/2.2.2/user_guide/pyarrow.html

@mroeschke mroeschke added Dtype Conversions Unexpected or buggy dtype conversions Usage Question Arrow pyarrow functionality NA - MaskedArrays Related to pd.NA and nullable extension arrays labels Apr 18, 2024
@mroeschke
Member

Thanks for the question. I would say that, generally, an operation on a specific dtype backend (pyarrow, nullable numpy, normal numpy) should return the same dtype backend unless:

  1. An API states it should return a specific type, e.g. ExtensionArray method APIs
  2. Mixing two different backends in an operation, e.g. int64[pyarrow] + Int64. (Probably not well defined in all cases)

@WillAyd
Member

WillAyd commented Apr 18, 2024

This is a great question, and something it would serve us well to define more precisely in the future. For simple types like ints and floats, where NumPy and pyarrow share the same storage layout (at least for the data buffers), I think NumPy + Arrow should return Arrow; otherwise it would be a lossy operation.

For more complex types that's an open question, but I would still prefer Arrow types to come out on top.

@jbrockmendel jbrockmendel added the pyarrow dtype retention op with pyarrow dtype -> expect pyarrow result label Apr 20, 2024