Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixed pd.infer_freq incompatible with Series["timestamp[s][pyarrow]"] #58404

Open
wants to merge 4 commits into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
1 change: 1 addition & 0 deletions doc/source/whatsnew/v3.0.0.rst
Expand Up @@ -362,6 +362,7 @@ Datetimelike
- Bug in :class:`Timestamp` constructor failing to raise when ``tz=None`` is explicitly specified in conjunction with timezone-aware ``tzinfo`` or data (:issue:`48688`)
- Bug in :func:`date_range` where the last valid timestamp would sometimes not be produced (:issue:`56134`)
- Bug in :func:`date_range` where using a negative frequency value would not include all points between the start and end values (:issue:`56382`)
- Bug in :func:`infer_freq` prevented application to :class:`Series` of with ``pyarrow`` dtype. (:issue:`58403`)
- Bug in :func:`tseries.api.guess_datetime_format` would fail to infer time format when "%Y" == "%H%M" (:issue:`57452`)
- Bug in setting scalar values with mismatched resolution into arrays with non-nanosecond ``datetime64``, ``timedelta64`` or :class:`DatetimeTZDtype` incorrectly truncating those scalars (:issue:`56410`)

Expand Down
24 changes: 24 additions & 0 deletions pandas/tests/tseries/frequencies/test_inference.py
Expand Up @@ -554,3 +554,27 @@ def test_infer_freq_non_nano_tzaware(tz_aware_fixture):

res = frequencies.infer_freq(dta)
assert res == "B"


@pytest.mark.parametrize(
("data", "expected"),
[
(["2022-01-01T10:00:00", "2022-01-01T10:00:30", "2022-01-01T10:01:00"], "30s"),
(["2022-01-01T10:00:00", "2022-01-01T10:00:37", "2022-01-01T10:01:00"], None),
(["2022-01-01T10:00:00", "2022-01-01T10:00:01", "2022-01-01T10:00:02"], "s"),
(
[
"2022-01-01T10:00:00",
"2022-01-01T10:00:01",
"2022-01-01T10:00:02",
"2022-01-01T10:00:04",
],
None,
),
],
)
@pytest.mark.parametrize("cls", [Index, Series])
def test_infer_freq_pyarrow(data, expected, cls):
pytest.importorskip("pyarrow")
obj = cls(data).astype("timestamp[s][pyarrow]")
assert frequencies.infer_freq(obj) == expected
25 changes: 9 additions & 16 deletions pandas/tseries/frequencies.py
Expand Up @@ -33,10 +33,7 @@
from pandas.util._decorators import cache_readonly

from pandas.core.dtypes.common import is_numeric_dtype
from pandas.core.dtypes.dtypes import (
DatetimeTZDtype,
PeriodDtype,
)
from pandas.core.dtypes.dtypes import PeriodDtype
from pandas.core.dtypes.generic import (
ABCIndex,
ABCSeries,
Expand Down Expand Up @@ -112,20 +109,16 @@ def infer_freq(
>>> pd.infer_freq(idx)
'D'
"""
from pandas.api.types import is_datetime64_any_dtype
from pandas.core.api import DatetimeIndex

if isinstance(index, ABCSeries):
values = index._values
if not (
lib.is_np_dtype(values.dtype, "mM")
or isinstance(values.dtype, DatetimeTZDtype)
or values.dtype == object
):
raise TypeError(
"cannot infer freq from a non-convertible dtype "
f"on a Series of {index.dtype}"
)
index = values
if isinstance(index, ABCSeries) and not (
is_datetime64_any_dtype(index) or index.dtype == object
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you check dtype.kind == "mM" on ArrowDtype instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am a bit surprised by this comment, isn't pandas.api.types.is_ supposed to be a single source of truth of what is considered a certain data type? How do you remember what are all the conditions to check otherwise?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For internal usages, checking dtype.kind is more performant than api.types functions so that is preferred

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I guess I accidentally removed checking for timedelta. Although no tests failed. I am adding some tests for timedeltas as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is another bug: when trying with Index["duration[s][pyarrow]"], it complains about the property asi8 missing. asi8 is denoted as deprecated on general index types. Moreover, the FrequencyInferer assumes numpy dtype and makes use of internals (like index._data._ndarray)

the easiest workaround would be to cast to numpy, but I guess we want to avoid that and keep dtype as pyarrow inside the ops.

):
raise TypeError(
"cannot infer freq from a non-convertible dtype "
f"on a Series of {index.dtype}"
)

inferer: _FrequencyInferer

Expand Down