ARIMA.arima_res_ doesn't store pd.Series name but statsmodels do #535

Open
JavierEscobarOrtiz opened this issue Jan 9, 2023 · 0 comments


Describe the question you have

Hello!

We are creating a wrapper in Skforecast for forecasting using ARIMA models and we are using pmdarima as a dependency.

We are trying to use the append method from statsmodels on ARIMA().arima_res_, and we are seeing different behavior between pmdarima and statsmodels.

Inside ARIMA.arima_res_ there is an attribute that stores the original endogenous data (ARIMA().arima_res_.model.data.orig_endog). With statsmodels, it stores the pd.Series together with its name, but with pmdarima the name is dropped.

As a result, when we call append(), we get the following error:

ValueError: Columns must match to concatenate along rows.
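
To illustrate why a missing name could trigger this error, here is a minimal pandas-only sketch. It assumes that the check behind the message compares the Series names as one-element column indexes; the helper `columns_match` is hypothetical and is not part of statsmodels or pmdarima.

```python
import pandas as pd


def columns_match(stored: pd.Series, new: pd.Series) -> bool:
    """Hypothetical helper: compare two Series names as one-element column indexes."""
    return pd.Index([stored.name]).equals(pd.Index([new.name]))


named = pd.Series([0.1, 0.2], name="y")
unnamed = pd.Series([0.3, 0.4])  # no name given, so .name is None

# Same name on both sides: the columns are considered equal.
print(columns_match(named, named.copy()))  # True

# Stored data without a name vs. new data named 'y': mismatch.
print(columns_match(unnamed, named))  # False
```

Under that assumption, the second case corresponds to the pmdarima situation described above: the stored data has no name while the new data is named 'y'.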

Reproducible example:

  • data:
import pandas as pd
import numpy as np

np.random.seed(123)
y_datetime = pd.Series(data=np.random.rand(50))
y_datetime.name = 'y'
y_datetime.index = pd.date_range(start='2000', periods=50, freq='A')
print(y_datetime.head(5))

last_window_datetime = pd.Series(data=np.random.rand(50))
last_window_datetime.name = 'y'
last_window_datetime.index = pd.date_range(start='2050', periods=50, freq='A')

2000-12-31 0.696469
2001-12-31 0.286139
2002-12-31 0.226851
2003-12-31 0.551315
2004-12-31 0.719469
Freq: A-DEC, Name: y, dtype: float64

  • statsmodels: (Here append() works)
from statsmodels.tsa.statespace.sarimax import SARIMAX

mod = SARIMAX(endog=y_datetime, order=(1,1,1))
res = mod.fit()
print(res.model.data.orig_endog.head(5))

new_res = res.append(last_window_datetime, refit=False)

2000-12-31 0.696469
2001-12-31 0.286139
2002-12-31 0.226851
2003-12-31 0.551315
2004-12-31 0.719469
Freq: A-DEC, Name: y, dtype: float64

  • pmdarima: (Here the name is dropped and append() raises the ValueError shown above)
from pmdarima.arima import ARIMA

mod = ARIMA(order=(1,1,1))
mod.fit(y_datetime)
print(mod.arima_res_.model.data.orig_endog.head(5))

mod.arima_res_ = mod.arima_res_.append(last_window_datetime, refit=False)

2000-12-31 0.696469
2001-12-31 0.286139
2002-12-31 0.226851
2003-12-31 0.551315
2004-12-31 0.719469
Freq: A-DEC, dtype: float64
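
A possible workaround, sketched under the assumption that the error comes only from the name mismatch, is to give the new data the same name as the stored data before calling append(). The helper `match_series_name` is hypothetical, and the usage with pmdarima shown in the trailing comments has not been verified.

```python
import numpy as np
import pandas as pd


def match_series_name(new: pd.Series, stored: pd.Series) -> pd.Series:
    """Hypothetical helper: return a copy of `new` carrying the name of `stored`."""
    out = new.copy()
    out.name = stored.name
    return out


# Stand-in for the stored data: a Series whose name is None.
stored = pd.Series(
    np.random.rand(3),
    index=pd.date_range("2000-01-01", periods=3, freq="D"),
)

# New observations that still carry the name 'y'.
new = pd.Series(
    np.random.rand(3),
    index=pd.date_range("2000-01-04", periods=3, freq="D"),
    name="y",
)

aligned = match_series_name(new, stored)
print(aligned.name)  # None
print(new.name)      # y (the input is left unchanged)

# Intended usage with the example above (unverified assumption):
# stored = mod.arima_res_.model.data.orig_endog
# mod.arima_res_ = mod.arima_res_.append(
#     match_series_name(last_window_datetime, stored), refit=False
# )
```

This only hides the difference on the caller's side; whether pmdarima should preserve the name is the question raised in this issue.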

Versions (if necessary)

Session info:

-----
numpy               1.23.5
pandas              1.4.0
pmdarima            2.0.2
pytest              7.1.2
session_info        1.0.0
skforecast          0.7.dev
sklearn             1.1.0
statsmodels         0.13.5
-----
IPython             8.5.0
jupyter_client      7.3.5
jupyter_core        4.11.1
notebook            6.4.12
-----
Python 3.9.13 (main, Aug 25 2022, 23:51:50) [MSC v.1916 64 bit (AMD64)]
Windows-10-10.0.19042-SP0
-----
Session information updated at 2023-01-09 12:08