[BUG] TimeSeriesDataFrame cannot convert frequency to business hours #4139

Open
3 tasks done
canerturkmen opened this issue Apr 25, 2024 · 1 comment
Labels: bug (Something isn't working), module: timeseries (related to the timeseries module)

Comments

@canerturkmen
Contributor

Bug Report Checklist

  • I provided code that demonstrates a minimal reproducible example.
  • I confirmed bug exists on the latest mainline of AutoGluon via source install.
  • I confirmed bug exists on the latest stable version of AutoGluon.

Describe the bug
pandas cannot resample a DataFrame to the "bh" or "cbh" (business hour or custom business hour) frequencies. As a result, TimeSeriesDataFrame.convert_frequency fails when it is given either of these frequencies.

Several related pandas issues have been opened and closed, e.g., pandas-dev/pandas#12351.

To Reproduce

import pandas as pd

from autogluon.timeseries import TimeSeriesDataFrame

df = pd.DataFrame(
    {"values": [1, 2, 3, 4, 5, 6]},
    index=[
        pd.Timestamp("2017-01-01 10:15"),
        pd.Timestamp("2017-01-01 11:15"),
        pd.Timestamp("2017-01-01 12:15"),
        pd.Timestamp("2017-01-02 10:15"),
        pd.Timestamp("2017-01-02 11:15"),
        pd.Timestamp("2017-01-02 12:15"),
    ],
)

tsdf = TimeSeriesDataFrame.from_data_frame(
    pd.DataFrame(
        {
            "target": [1, 2, 3, 4, 5, 6],
            "item_id": "1",
            "timestamp": [
                pd.Timestamp("2017-01-01 10:15"),
                pd.Timestamp("2017-01-01 11:15"),
                pd.Timestamp("2017-01-01 12:15"),
                pd.Timestamp("2017-01-02 10:15"),
                pd.Timestamp("2017-01-02 11:15"),
                pd.Timestamp("2017-01-02 12:15"),
            ],
        }
    )
)

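# Custom business hours: 10:00 to 13:00 on every day of the week.
cbh = pd.offsets.CustomBusinessHour(start="10:00", end="13:00", weekmask="Mon Tue Wed Thu Fri Sat Sun")

for offset in ["bh", cbh]:
    # Plain pandas resampling raises for both offsets.
    try:
        df.resample(offset).sum()
    except ValueError as e:
        assert str(e) == "Values falls after last bin"

    # TimeSeriesDataFrame.convert_frequency raises the same error.
    try:
        tsdf.convert_frequency(offset)
    except ValueError as e:
        assert str(e) == "Values falls after last bin"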
canerturkmen added the bug and module: timeseries labels on Apr 25, 2024
@shchur
Collaborator

shchur commented May 2, 2024

A potential workaround for the time being is to resample the data to hourly (H) frequency and fill the values outside of business hours with NaN. This should work well in version >=1.1.
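For reference, here is a minimal sketch of that workaround in plain pandas, using the same timestamps as the reproduction above. The helper name `to_hourly_with_business_mask` and its `start_hour`/`end_hour` parameters are hypothetical and not part of pandas or AutoGluon. It assumes the business window is 10:00 to 13:00 on every day of the week, as in the `cbh` offset from the report; a weekday mask would need an extra condition.

```python
import pandas as pd


def to_hourly_with_business_mask(df, start_hour, end_hour):
    """Hypothetical helper: resample to hourly bins, then set rows outside
    [start_hour, end_hour) to NaN."""
    # min_count=1 leaves empty hourly bins as NaN instead of summing them to 0.
    hourly = df.resample(pd.offsets.Hour()).sum(min_count=1).astype("float64")
    hours = hourly.index.hour
    outside = (hours < start_hour) | (hours >= end_hour)
    # Blank out everything that falls outside the business window.
    hourly.loc[outside, :] = float("nan")
    return hourly


df = pd.DataFrame(
    {"values": [1, 2, 3, 4, 5, 6]},
    index=pd.to_datetime(
        [
            "2017-01-01 10:15",
            "2017-01-01 11:15",
            "2017-01-01 12:15",
            "2017-01-02 10:15",
            "2017-01-02 11:15",
            "2017-01-02 12:15",
        ]
    ),
)

hourly = to_hourly_with_business_mask(df, start_hour=10, end_hour=13)
print(hourly.dropna())
```

The result has a regular hourly index, with the six observations kept in their 10:00, 11:00 and 12:00 bins and every other hour set to NaN. For a TimeSeriesDataFrame, the analogous first step would presumably be a call to convert_frequency with an hourly frequency, followed by the same kind of masking; that part is not shown here.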
