Input contains NaN for a non NaN data #573

tifa64 opened this issue Mar 18, 2024 · 0 comments

tifa64 commented Mar 18, 2024

Describe the question you have

Hello maintainers, I want to understand why the following scenario happens. I have this time series:

import pandas as pd
data = {
    'date': pd.date_range(start='2023-01-01', periods=10, freq='MS'),
    'value': [1, 3, 3, 4, 3, 2, 1, 1, 3, 2]
}
df = pd.DataFrame(data)
df.set_index('date', inplace=True)

which yields:

            value
date             
2023-01-01      1
2023-02-01      3
2023-03-01      3
2023-04-01      4
2023-05-01      3
2023-06-01      2
2023-07-01      1
2023-08-01      1
2023-09-01      3
2023-10-01      2
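As a sanity check on the title's claim, the snippet below rebuilds the same series and confirms it contains no NaN or infinite values. That means any NaN reported later has to be produced during fitting or forecasting, not read from the data.

```python
import numpy as np
import pandas as pd

# Same series as above: 10 monthly observations.
data = {
    'date': pd.date_range(start='2023-01-01', periods=10, freq='MS'),
    'value': [1, 3, 3, 4, 3, 2, 1, 1, 3, 2],
}
df = pd.DataFrame(data).set_index('date')

# No missing or non-finite values in the input series.
print(df['value'].isna().any())                   # False
print(np.isfinite(df['value'].to_numpy()).all())  # True
print(len(df))                                    # 10
```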

Then I fit the model:

from pmdarima import auto_arima

fitted_model = auto_arima(
    y=df['value'],
    max_iter=15,
    max_d=1,
    method='nm',
    seasonal=False)
fitted_model

and auto_arima selects this model:

ARIMA(2,0,2)(0,0,0)[0]          
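An ARIMA(2,0,2) without intercept fitted to 10 observations estimates two AR coefficients, two MA coefficients and the noise variance, and here the Nelder-Mead optimizer (method='nm') is capped at max_iter=15. One hypothesis, not confirmed by the maintainers, is that the fit stops at coefficients on or near the stationarity/invertibility boundary, which can make forecast variances numerically unreliable. The sketch below is plain NumPy with hypothetical helper names; it checks the usual root conditions, assuming statsmodels' sign convention (AR polynomial 1 - phi_1 z - ..., MA polynomial 1 + theta_1 z + ...).

```python
import numpy as np
from numpy.polynomial import polynomial as P


def _all_roots_outside_unit_circle(poly_increasing):
    """True if every root of the polynomial (coefficients ordered from the
    constant term upwards) has modulus strictly greater than 1."""
    roots = P.polyroots(poly_increasing)
    return bool(np.all(np.abs(roots) > 1.0))


def is_stationary(ar_params):
    """Hypothetical helper: AR part is stationary if
    1 - phi_1 z - ... - phi_p z^p has no roots on or inside the unit circle."""
    ar = np.asarray(ar_params, dtype=float)
    return _all_roots_outside_unit_circle(np.r_[1.0, -ar])


def is_invertible(ma_params):
    """Hypothetical helper: MA part is invertible if
    1 + theta_1 z + ... + theta_q z^q has no roots on or inside the unit circle."""
    ma = np.asarray(ma_params, dtype=float)
    return _all_roots_outside_unit_circle(np.r_[1.0, ma])


print(is_stationary([0.5, 0.3]))  # True
print(is_invertible([-1.5]))      # False
```

With the fitted model above, the statsmodels results object is available as fitted_model.arima_res_ (the attribute the traceback itself uses), so one could call is_stationary(fitted_model.arima_res_.arparams) and is_invertible(fitted_model.arima_res_.maparams). Whether this actually flags the failing fit has not been checked here.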

Then I try to predict:

fitted_model.predict(
    n_periods=2,
    return_conf_int=False)

which raises the following error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In [1047], line 1
----> 1 fitted_model.predict(
      2                     n_periods=2,
      3                     return_conf_int=False)

File ~/cluster-env/clonedenv/lib/python3.10/site-packages/pmdarima/arima/arima.py:791, in ARIMA.predict(self, n_periods, X, return_conf_int, alpha, **kwargs)
    788 arima = self.arima_res_
    789 end = arima.nobs + n_periods - 1
--> 791 f, conf_int = _seasonal_prediction_with_confidence(
    792     arima_res=arima,
    793     start=arima.nobs,
    794     end=end,
    795     X=X,
    796     alpha=alpha)
    798 if return_conf_int:
    799     # The confidence intervals may be a Pandas frame if it comes from
    800     # SARIMAX & we want Numpy. We will to duck type it so we don't add
    801     # new explicit requirements for the package
    802     return f, check_array(conf_int, force_all_finite=False)

File ~/cluster-env/clonedenv/lib/python3.10/site-packages/pmdarima/arima/arima.py:203, in _seasonal_prediction_with_confidence(arima_res, start, end, X, alpha, **kwargs)
    199     conf_int[:, 0] = f - q * np.sqrt(var)
    200     conf_int[:, 1] = f + q * np.sqrt(var)
    202 return check_endog(f, dtype=None, copy=False), \
--> 203     check_array(conf_int, copy=False, dtype=None)

File ~/cluster-env/clonedenv/lib/python3.10/site-packages/sklearn/utils/validation.py:899, in check_array(array, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, estimator, input_name)
    893         raise ValueError(
    894             "Found array with dim %d. %s expected <= 2."
    895             % (array.ndim, estimator_name)
    896         )
    898     if force_all_finite:
--> 899         _assert_all_finite(
    900             array,
    901             input_name=input_name,
    902             estimator_name=estimator_name,
    903             allow_nan=force_all_finite == "allow-nan",
    904         )
    906 if ensure_min_samples > 0:
    907     n_samples = _num_samples(array)

File ~/cluster-env/clonedenv/lib/python3.10/site-packages/sklearn/utils/validation.py:146, in _assert_all_finite(X, allow_nan, msg_dtype, estimator_name, input_name)
    124         if (
    125             not allow_nan
    126             and estimator_name
   (...)
    130             # Improve the error message on how to handle missing values in
    131             # scikit-learn.
    132             msg_err += (
    133                 f"\n{estimator_name} does not accept missing values"
    134                 " encoded as NaN natively. For supervised learning, you might want"
   (...)
    144                 "#estimators-that-handle-nan-values"
    145             )
--> 146         raise ValueError(msg_err)
    148 # for object dtype data, we only check for NaNs (GH-13254)
    149 elif X.dtype == np.dtype("object") and not allow_nan:

ValueError: Input contains NaN.
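The traceback shows that the NaN is not in the input or the point forecasts: predict always computes confidence intervals, and line 203 validates conf_int with check_array using its default finiteness check before return_conf_int is ever consulted. So passing return_conf_int=False does not avoid the error. The intervals are built as f ± q·sqrt(var) (lines 199-200). The sketch below, using made-up numbers, shows one plausible way NaN appears: a forecast variance that comes out negative, for example from an ill-conditioned fit.

```python
import numpy as np

# Hypothetical values: point forecasts and forecast variances for 2 steps,
# where the second variance is slightly negative.
f = np.array([2.1, 2.4])
var = np.array([0.5, -1e-3])
q = 1.959963984540054  # standard normal quantile for a 95% interval (alpha=0.05)

conf_int = np.empty((2, 2))
with np.errstate(invalid="ignore"):  # silence "invalid value encountered in sqrt"
    conf_int[:, 0] = f - q * np.sqrt(var)
    conf_int[:, 1] = f + q * np.sqrt(var)

# Point forecasts are finite, but the step-2 interval bounds are NaN,
# which is what check_array rejects when it checks for finite values.
print(np.isfinite(f).all())         # True
print(np.isfinite(conf_int).all())  # False
```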

However, when I add one more data point,

data = {
    'date': pd.date_range(start='2023-01-01', periods=11, freq='MS'),
    'value': [1, 3, 3, 4, 3, 2, 1, 1, 3, 2, 2]
}

or when I change the values to these,

data = {
    'date': pd.date_range(start='2023-01-01', periods=10, freq='MS'),
    'value': [5, 8, 11, 4, 6, 6, 6, 5, 6, 9]
}

or when I set the seasonal parameter to True for the exact same data,

the model returned is ARIMA(0,0,0)(0,0,0)[0] intercept, and prediction works without errors.


Another workaround is to add a guardrail that caps the maximum p, q, and d at 1; that also works.
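The guardrail can be made a function of the series length rather than a fixed constant. The helper below is hypothetical and its rule of thumb (at most one AR and one MA order per 10 observations, between 1 and 5) is an assumption, not a pmdarima recommendation. The names max_p, max_q, max_d, start_p and start_q are real auto_arima parameters; its default max_p and max_q are 5, and its stepwise search starts at p=2, q=2, so the starting orders are kept within the caps.

```python
def guarded_order_kwargs(n_obs, obs_per_order=10, hard_cap=5, max_d=1):
    """Hypothetical helper: build auto_arima keyword arguments that cap
    the AR/MA orders according to how many observations are available."""
    if n_obs <= 0:
        raise ValueError("n_obs must be positive")
    # At least order 1, at most hard_cap, otherwise one order per obs_per_order points.
    max_order = max(1, min(hard_cap, n_obs // obs_per_order))
    return {
        "max_p": max_order,
        "max_q": max_order,
        "max_d": max_d,
        # Keep the stepwise starting point (default p=2, q=2) within the caps.
        "start_p": min(2, max_order),
        "start_q": min(2, max_order),
    }


print(guarded_order_kwargs(10))
# {'max_p': 1, 'max_q': 1, 'max_d': 1, 'start_p': 1, 'start_q': 1}
```

It would be used as auto_arima(y=df['value'], method='nm', max_iter=15, seasonal=False, **guarded_order_kwargs(len(df))). For this 10-point series it reproduces the caps of 1 described above.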

Can you help me understand why this happens? Is placing a guardrail the correct way to fix this?

Thank you in advance :)

Here is a video of a cute Otter as a digital bribe: https://www.youtube.com/watch?v=8O8iEz2p7rQ
Can you help me understand this behaviour?

Versions (if necessary)

System:
    python: 3.10.6 | packaged by conda-forge | (main, Aug 22 2022, 20:35:26) [GCC 10.4.0]
executable: /home/trusted-service-user/cluster-env/clonedenv/bin/python
   machine: Linux-4.15.0-1174-azure-x86_64-with-glibc2.27

Python dependencies:
        pip: 23.3
 setuptools: 65.5.1
    sklearn: 1.1.3
statsmodels: 0.14.0
      numpy: 1.23.4
      scipy: 1.10.1
     Cython: 0.29.32
     pandas: 1.5.3
     joblib: 1.3.2
   pmdarima: 1.8.5
Linux-4.15.0-1174-azure-x86_64-with-glibc2.27
Python 3.10.6 | packaged by conda-forge | (main, Aug 22 2022, 20:35:26) [GCC 10.4.0]
pmdarima 1.8.5
NumPy 1.23.4
SciPy 1.10.1
Scikit-Learn 1.1.3
Statsmodels 0.14.0
/home/trusted-service-user/cluster-env/clonedenv/lib/python3.10/site-packages/_distutils_hack/__init__.py:33: UserWarning: Setuptools is replacing distutils.
  warnings.warn("Setuptools is replacing distutils.")