predict_in_sample with exogenous variables #569

Open
bloukanov opened this issue Feb 1, 2024 · 0 comments

bloukanov commented Feb 1, 2024

Describe the bug

If an ARIMA model is fit with exogenous variables, its in-sample predictions do not appear to depend on the X values passed to predict_in_sample. In fact, even arrays with missing columns and rows are accepted without error. Are the true in-sample X values saved somewhere, so that they don't need to be provided? Or am I missing something here?

To Reproduce

```python
import pmdarima as pm
from pmdarima import model_selection
import numpy as np
import pandas as pd

np.random.seed(42)

# Two noisy exogenous regressors that are correlated with y
y = pm.datasets.load_wineind()
df = pd.DataFrame(
    {
        "x1": y * np.random.uniform(0, 0.5, len(y)) + np.random.randint(1, 1000, len(y)),
        "x2": y * np.random.uniform(0.5, 0.7, len(y)) + np.random.randint(1, 10000, len(y)),
    }
)
df["y"] = y
train, test = model_selection.train_test_split(df, train_size=150)

arima = pm.auto_arima(
    train["y"],
    train.drop(columns="y"),
    error_action="ignore",
    trace=True,
    suppress_warnings=True,
    maxiter=5,
    seasonal=True,
    m=12,
)

# preds1 uses the exogenous data the model was fit on
preds1 = arima.predict_in_sample(X=train.drop(columns="y"))

# preds2 uses X with the correct shape but different values (shifted by 1000)
preds2 = arima.predict_in_sample(X=train.drop(columns="y") + 1000)

# preds3 uses only x2 (x1 dropped), subset to the first 10 observations
preds3 = arima.predict_in_sample(X=train[:10].drop(columns=["y", "x1"]))

print(len(preds1))  # 150
print(len(preds2))  # 150
print(len(preds3))  # 150

print(all(preds1 == preds2))  # True
print(all(preds2 == preds3))  # True

print(arima.summary())  # confirms that x1 and x2 are in the model
```

Versions

System:
    python: 3.11.7 (main, Jan 24 2024, 16:45:17) [Clang 15.0.0 (clang-1500.1.0.2.5)]
executable: /Users/my-user/repos/forecasting/.venv/bin/python
   machine: macOS-14.2.1-arm64-arm-64bit

Python dependencies:
 setuptools: 69.0.3
        pip: 23.3.1
    sklearn: 1.4.0
statsmodels: 0.14.1
      numpy: 1.26.3
      scipy: 1.12.0
     Cython: 3.0.8
     pandas: 2.2.0
     joblib: 1.3.2
   pmdarima: 2.0.4

Expected Behavior

I expect the call to raise an error if X has the wrong dimensions, and the predictions to depend on the values of a correctly dimensioned X.
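
For illustration only, here is a minimal sketch of the kind of check this expectation implies. check_in_sample_X is a hypothetical helper, not a pmdarima API; it assumes the training regressors are available as an array, raises ValueError when the shape of X differs from theirs, and returns whether the values match.

```python
import numpy as np


def check_in_sample_X(X, X_train):
    """Hypothetical guard (not part of pmdarima) for in-sample exogenous data.

    Raises ValueError if X does not have the same shape as the regressors the
    model was fit on; otherwise returns True when the values match them.
    """
    X = np.asarray(X, dtype=float)
    X_train = np.asarray(X_train, dtype=float)
    if X.shape != X_train.shape:
        raise ValueError(
            f"X has shape {X.shape}, but the model was fit with "
            f"exogenous data of shape {X_train.shape}"
        )
    # A False result means in-sample predictions built from the stored
    # training regressors would not reflect the values the caller passed.
    return bool(np.allclose(X, X_train))


# Usage with a stand-in for the 150 x 2 training regressors
X_train = np.arange(300, dtype=float).reshape(150, 2)
print(check_in_sample_X(X_train, X_train))         # True
print(check_in_sample_X(X_train + 1000, X_train))  # False
try:
    check_in_sample_X(X_train[:10, 1:], X_train)   # wrong shape
except ValueError as exc:
    print(exc)
```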

Actual Behavior

The call does not raise an error, and the output is identical in all three cases.
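
One possible explanation (an assumption, not confirmed here) is that the fitted model keeps the training regressors and reuses them for in-sample predictions, so the X argument is effectively ignored. The toy class below, CachedExogModel, is hypothetical and is not how pmdarima is implemented; it only shows that such a design reproduces all three symptoms above: equal lengths, identical values, and no error for a wrongly shaped X.

```python
import numpy as np


class CachedExogModel:
    """Hypothetical toy model, NOT pmdarima's implementation.

    Fits y = X @ beta by least squares and caches the training X, so
    in-sample predictions reuse the cached regressors and ignore any X
    the caller passes.
    """

    def fit(self, y, X):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y, dtype=float)
        # Least-squares coefficients for the exogenous regressors
        self.beta_, *_ = np.linalg.lstsq(X, y, rcond=None)
        # Cache the training regressors (the assumed mechanism)
        self._X_train = X
        return self

    def predict_in_sample(self, X=None):
        # X is accepted but never used: fitted values always come from
        # the regressors cached at fit time.
        return self._X_train @ self.beta_


rng = np.random.default_rng(42)
X_train = rng.uniform(0, 10, size=(150, 2))
y_train = X_train @ np.array([2.0, -1.0]) + rng.normal(size=150)

model = CachedExogModel().fit(y_train, X_train)
p1 = model.predict_in_sample(X=X_train)
p2 = model.predict_in_sample(X=X_train + 1000)
p3 = model.predict_in_sample(X=X_train[:10, 1:])  # wrong shape, no error

print(len(p1), len(p2), len(p3))                        # 150 150 150
print(np.array_equal(p1, p2), np.array_equal(p2, p3))  # True True
```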

Additional Context

No response
