ARIMA(1,1,1) model doesn't seem to discard non-differentiated values after differentiation. #9222

Open
enriicoo opened this issue Apr 18, 2024 · 3 comments

@enriicoo

I'm not sure I understand the ARIMA model from statsmodels correctly. Although I'm differencing the series (and the series does need it), the model does not seem to drop the first d observations, where d is the differencing order. This shows up in the residuals and in the summary, which still reports 24 observations even though (as far as I understand time series) the first d undifferenced values should be discarded.

Here's an example:

import pandas as pd
from statsmodels.tsa.arima.model import ARIMA
import matplotlib.pyplot as plt

dates = pd.date_range(start='1970-01-01', periods=24, freq='YS')
time_series = [908000, 902000, 930000, 938000, 946000, 961000, 982000, 1002000,
               1024000, 1006000, 1031000, 1047000, 1077000, 1103000, 1136000,
               1170000, 1181000, 1210000, 1227000, 1264000,
               1309000, 1312000, 1316000, 1349000]

df = pd.DataFrame(time_series, index=dates, columns=['Series'])

model = ARIMA(df['Series'], order=(1, 1, 1))
result_model = model.fit()
residuals = result_model.resid

summary = result_model.summary()
residuals.plot(title='Residuals from ARIMA(1,1,1)')
plt.show()
print(summary)
print(residuals)

The output comes with warnings that seem to stem from the same problem. They are raised by "model.fit()":

UserWarning: Non-stationary starting autoregressive parameters found. Using zeros as starting parameters.
  warn('Non-stationary starting autoregressive parameters'
ConvergenceWarning: Maximum Likelihood optimization failed to converge. Check mle_retvals
  warnings.warn("Maximum Likelihood optimization failed to "

And then:

                               SARIMAX Results                                
==============================================================================
Dep. Variable:                 Series   No. Observations:                   24
Model:                 ARIMA(1, 1, 1)   Log Likelihood                -253.629
Date:                Thu, 18 Apr 2024   AIC                            513.259
Time:                        03:32:15   BIC                            516.665
Sample:                    01-01-1970   HQIC                           514.116
                         - 01-01-1993                                         
Covariance Type:                  opg                                         
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
ar.L1          0.9999      0.021     48.639      0.000       0.960       1.040
ma.L1         -0.9983      0.258     -3.876      0.000      -1.503      -0.493
sigma2      2.799e+08   4.39e-10   6.38e+17      0.000     2.8e+08     2.8e+08
===================================================================================
Ljung-Box (L1) (Q):                   0.14   Jarque-Bera (JB):                 1.59
Prob(Q):                              0.71   Prob(JB):                         0.45
Heteroskedasticity (H):               1.95   Skew:                            -0.64
Prob(H) (two-sided):                  0.37   Kurtosis:                         3.04
===================================================================================

Warnings:
[1] Covariance matrix calculated using the outer product of gradients (complex-step).
[2] Covariance matrix is singular or near-singular, with condition number 1.17e+34. Standard errors may be unstable.

In Gretl, another time-series package, "No. Observations" would be 23, not 24. The residuals plot also includes the first value; when the same model is fitted in Gretl, that first value is discarded.
https://i.stack.imgur.com/SwN4n.png
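
For reference, the count of 23 matches what explicit first differencing leaves behind. A minimal pandas sketch using the series above; it only counts observations and does not attempt to reproduce Gretl's estimation (the names `series`, `d` and `differenced` are illustrative):

```python
import pandas as pd

dates = pd.date_range(start='1970-01-01', periods=24, freq='YS')
time_series = [908000, 902000, 930000, 938000, 946000, 961000, 982000, 1002000,
               1024000, 1006000, 1031000, 1047000, 1077000, 1103000, 1136000,
               1170000, 1181000, 1210000, 1227000, 1264000,
               1309000, 1312000, 1316000, 1349000]
series = pd.Series(time_series, index=dates, name='Series')

d = 1  # differencing order of ARIMA(1, 1, 1)

# Difference d times; each pass loses one more leading observation (it becomes NaN).
differenced = series.copy()
for _ in range(d):
    differenced = differenced.diff()
differenced = differenced.dropna()

print(len(series))          # 24
print(len(differenced))     # 23
print(differenced.iloc[0])  # -6000.0, i.e. 902000 - 908000
```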

I also asked this question on Stack Overflow:
https://stackoverflow.com/questions/78345374/why-it-seems-statsmodels-arima-doesnt-discard-values-on-differentiation

@enriicoo enriicoo changed the title ARIMA(1,1,1) model doesn't seem to discard differentiated values. ARIMA(1,1,1) model doesn't seem to discard non-differentiated values after differentiation. Apr 18, 2024
@ChadFulton
Member

Yes, I agree that we should probably change our behavior here to report NaN values.

The reason it is the way it is currently is that for the specified model, the "forecast" for the first period is equal to zero, so the forecast error is equal to the first value. Obviously that is not very useful. In addition, although technically the point forecast is equal to zero, it comes from a diffuse prior and the zero is arbitrary, so reporting the forecast error doesn't really make sense from that perspective either.

@ChadFulton
Member

Note that these warnings:

UserWarning: Non-stationary starting autoregressive parameters found. Using zeros as starting parameters.
  warn('Non-stationary starting autoregressive parameters'
ConvergenceWarning: Maximum Likelihood optimization failed to converge. Check mle_retvals
  warnings.warn("Maximum Likelihood optimization failed to "

are not related to the same issue as the first forecast error.

@enriicoo
Author

enriicoo commented May 3, 2024

I have no experience with the algorithms behind time-series models, but these warnings aren't raised when I fit the same model in R and Gretl. I would like to understand the nature of these warnings and the best ways of resolving them. I should note that I know the sample is small (n < 30), and that may well have something to do with it.
