Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixed pd.infer_freq incompatible with Series["timestamp[s][pyarrow]"] #58404

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

randolf-scholz
Copy link
Contributor

@randolf-scholz randolf-scholz commented Apr 24, 2024

@randolf-scholz randolf-scholz changed the title fixed #58403 Fixed pd.infer_freq incompatible with Series["timestamp[s][pyarrow]"] Apr 24, 2024
)
index = values
if isinstance(index, ABCSeries) and not (
is_datetime64_any_dtype(index) or index.dtype == object
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you check dtype.kind == "mM" on ArrowDtype instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am a bit surprised by this comment, isn't pandas.api.types.is_ supposed to be a single source of truth of what is considered a certain data type? How do you remember what are all the conditions to check otherwise?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For internal usages, checking dtype.kind is more performant than api.types functions so that is preferred

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I guess I accidentally removed checking for timedelta. Although no tests failed. I am adding some tests for timedeltas as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is another bug: when trying with Index["duration[s][pyarrow]"], it complains about the property asi8 missing. asi8 is denoted as deprecated on general index types. Moreover, the FrequencyInferer assumes numpy dtype and makes use of internals (like index._data._ndarray)

the easiest workaround would be to cast to numpy, but I guess we want to avoid that and keep dtype as pyarrow inside the ops.

@mroeschke mroeschke added Frequency DateOffsets Arrow pyarrow functionality labels Apr 24, 2024
@jbrockmendel
Copy link
Member

This is going to convert to a (non-pyarrow) DatetimeIndex under the hood, which will incur a silent perf penalty. Better to just tell users to convert themselves?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Arrow pyarrow functionality Frequency DateOffsets
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: pd.infer_freq incompatible with Series["timestamp[s][pyarrow]"].
3 participants