ValueError: xreg is rank deficient #826

Open
obiii opened this issue Apr 19, 2024 · 3 comments

obiii commented Apr 19, 2024

What happened + What you expected to happen

Hi,

I am trying to pass exogenous features to the StatsForecast.fit method, but it fails with:

ValueError: xreg is rank deficient

I am using one-hot encoding for the month, so in the small example below some columns are all zeros. With the full data, however, there are no all-zero columns and no constant columns. No pair of features has an absolute correlation above 0.7 either.
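
The checks described above (no constant columns, pairwise correlations below 0.7) do not rule out rank deficiency, because an exact linear dependency can involve many columns at once. The sketch below uses synthetic daily dates, not the reporter's data, and full one-hot encodings named like the columns in this issue. It compares the largest pairwise correlation with the matrix rank from `numpy.linalg.matrix_rank`. Whether statsforecast performs exactly this rank test internally is an assumption.

```python
import numpy as np
import pandas as pd

# Synthetic calendar (an assumption for illustration, not the issue's data).
dates = pd.date_range("2022-01-01", "2023-12-31", freq="D")
cal = pd.DataFrame({"day_of_week": dates.dayofweek, "month_indicator": dates.month})

# Full one-hot encoding: 7 day-of-week columns + 12 month columns.
X = pd.get_dummies(cal, columns=["day_of_week", "month_indicator"], dtype=float)

# Largest absolute pairwise correlation, ignoring the diagonal.
corr = X.corr().abs().to_numpy()
np.fill_diagonal(corr, 0.0)
max_abs_corr = corr.max()

# Every row has exactly one active day-of-week dummy and one active month
# dummy, so the two groups of columns sum to the same all-ones vector.
n_columns = X.shape[1]
rank = int(np.linalg.matrix_rank(X.to_numpy()))

print(f"columns={n_columns}, rank={rank}, max |corr|={max_abs_corr:.2f}")
```

In this sketch the rank comes out lower than the number of columns even though no pair of columns is strongly correlated, which is the situation the error message describes.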

Versions / Dependencies

python: 3.11
statsforecast: 1.7.4

Reproduction script

import pandas as pd
from statsforecast import StatsForecast
from statsforecast.models import (
    MSTL,
    AutoARIMA,
    AutoETS,
    HoltWinters,
    RandomWalkWithDrift,
    SeasonalNaive,
)

# Candidate models; the commented-out entries were disabled in the original report.
models = [
    AutoARIMA(season_length=31, nmodels=94, allowdrift=True),
    # AutoCES(season_length=30),
    AutoETS(season_length=31),
    HoltWinters(season_length=31),
    MSTL(season_length=31, trend_forecaster=AutoARIMA(), alias="MSTL-ARIMA"),
    MSTL(season_length=31),
    # AutoTheta(season_length=31),
    # DOT(season_length=31),
    # SeasonalWindowAverage(window_size=60, season_length=30),
    # SeasonalWindowAverage(window_size=90, season_length=30, alias="SeasWA30-93"),
    # SeasonalWindowAverage(window_size=120, season_length=30, alias="SeasWA30-120"),
    RandomWalkWithDrift(),
    SeasonalNaive(season_length=31),
]

dc_models = StatsForecast(
    models=models,
    freq="D",
    n_jobs=-1,
    verbose=True,
)

data = {
    'ds': ['2024-04-01', '2024-04-02'],
    'unique_id': [1, 2],
    'y': [100, 200],
    'holiday': [0, 1],
    'daysuntilendmonth': [10, 9],
    'tax_return': [1, 0],
    'bailiff_finland': [0, 1],
    'salary': [5000, 6000],
    'day_of_week_0': [0, 0],
    'day_of_week_1': [0, 1],
    'day_of_week_2': [1, 0],
    'day_of_week_3': [0, 0],
    'day_of_week_4': [0, 0],
    'day_of_week_5': [0, 0],
    'day_of_week_6': [0, 0],
    'month_indicator_1': [0, 0],
    'month_indicator_2': [0, 0],
    'month_indicator_3': [0, 0],
    'month_indicator_4': [1, 1],
    'month_indicator_5': [0, 0],
    'month_indicator_6': [0, 0],
    'month_indicator_7': [0, 0],
    'month_indicator_8': [0, 0],
    'month_indicator_9': [0, 0],
    'month_indicator_10': [0, 0],
    'month_indicator_11': [0, 0],
    'month_indicator_12': [0, 0],
    'quarter_1': [0, 0],
    'quarter_2': [1, 1],
    'quarter_3': [0, 0],
    'quarter_4': [0, 0]
}
data = pd.DataFrame(data)
data['ds'] = pd.to_datetime(data['ds'])
exog = True  # set to False to fit on ds, unique_id and y only
dc_models.fit(
    df=data if exog else data[['ds', 'unique_id', 'y']],
    prediction_intervals=None,
)

Update: If I remove both the day_of_week and month_indicator one-hot encodings, it works, but I am not sure why. Also, is there another way to include the month? It is an important feature.

Issue Severity

High: It blocks me from completing my task.

obiii added the bug label Apr 19, 2024

jmoralez (Member) commented

Hey @obiii, thanks for using statsforecast. Can you please provide a minimal reproducible example? You can follow the tips here.

obiii commented Apr 24, 2024

Hi @jmoralez, I have updated the question now.

jmoralez (Member) commented

Thanks! I believe this is due to the collinearity that the dummies introduce. Can you try dropping one of the levels, i.e. use 6 dummies for day of week, 11 for month and 3 for quarter? You can read more about the problem here.
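
A minimal sketch of the suggestion above, using synthetic daily dates rather than the reporter's data. `pandas.get_dummies(..., drop_first=True)` drops one level per encoded column. The `design_rank` helper and the added intercept column are assumptions made for illustration, not part of statsforecast. One caveat worth checking on real data: each quarter dummy is the sum of three month dummies, so keeping both month and quarter can leave an exact dependency even after a level is dropped from each.

```python
import numpy as np
import pandas as pd

# Synthetic calendar (an assumption for illustration, not the issue's data).
dates = pd.date_range("2022-01-01", "2023-12-31", freq="D")
cal = pd.DataFrame(
    {
        "day_of_week": dates.dayofweek,
        "month_indicator": dates.month,
        "quarter": dates.quarter,
    }
)


def design_rank(frame: pd.DataFrame) -> tuple[int, int]:
    """Hypothetical helper: (n_columns, rank) of the features plus an intercept."""
    X = np.column_stack([np.ones(len(frame)), frame.to_numpy(dtype=float)])
    return X.shape[1], int(np.linalg.matrix_rank(X))


# Drop one level per feature: 6 day-of-week, 11 month and 3 quarter dummies.
with_quarter = pd.get_dummies(
    cal, columns=["day_of_week", "month_indicator", "quarter"],
    drop_first=True, dtype=float,
)

# Same encoding without the quarter feature, which the month dummies determine.
without_quarter = pd.get_dummies(
    cal[["day_of_week", "month_indicator"]],
    columns=["day_of_week", "month_indicator"],
    drop_first=True, dtype=float,
)

cols_q, rank_q = design_rank(with_quarter)
cols_nq, rank_nq = design_rank(without_quarter)
print(f"with quarter:    columns={cols_q}, rank={rank_q}")
print(f"without quarter: columns={cols_nq}, rank={rank_nq}")
```

If the rank equals the column count, the features are linearly independent under these assumptions. In this sketch that holds only for the encoding without the quarter columns, so dropping the quarter feature (or the month feature) alongside `drop_first=True` is one option to try.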
