
SeasonalNaive forecasts are not as expected; expected lag 12 but forecast is rounded and slightly off #806

Open
SaintRod opened this issue Mar 29, 2024 · 2 comments

@SaintRod

SaintRod commented Mar 29, 2024

What happened + What you expected to happen

I have a pandas DataFrame with monthly time series data. I am using the SeasonalNaive model because the data has strong annual seasonality (season_length = 12), so a year-over-year (YoY) forecast is a good benchmark/baseline. Rather than writing my own code just to get the lag-12 values (e.g., shift(12) in pandas), I decided to use SeasonalNaive.

I noticed that the forecast from the SeasonalNaive model is not what I expected. I expected $\hat{y}_{t+h} = y_{t+h-12}$ for $h = 1, \dots, 12$; that is, I expected each forecast to be the exact value from 12 months earlier. Instead, the forecast is rounded and slightly different.

For instance, in the example below the forecast for 2024-01-01 is 6779547100, but I expected 6779547060.561772. It's close (a residual $e = y - \hat{y}$ of -75.438...), but not what I expected.
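For reference, the expected behaviour (repeat the last observed season, passing the values through unchanged) can be sketched in plain pandas. This is a minimal sketch, assuming a long-format DataFrame with `unique_id`, `ds` and `y` columns and month-start (`MS`) dates; `seasonal_naive_forecast` is a hypothetical helper, not part of statsforecast. Because it never casts `y`, each forecast equals the corresponding observed value exactly.

```python
import numpy as np
import pandas as pd


def seasonal_naive_forecast(df: pd.DataFrame, h: int, season_length: int,
                            freq: str = "MS") -> pd.DataFrame:
    """Hypothetical helper (not a statsforecast API).

    Implements y_hat[t+k] = y[t+k-m] for k <= m, repeating the last season
    for longer horizons, and keeps the dtype of `y` (float64 stays float64).
    """
    out = []
    for uid, grp in df.sort_values("ds").groupby("unique_id", sort=False):
        # The last `season_length` observations of this series.
        last_season = grp["y"].to_numpy()[-season_length:]
        # Repeat the last season enough times to cover h steps, then trim.
        reps = -(-h // season_length)  # ceiling division
        values = np.tile(last_season, reps)[:h]
        # Future timestamps: the h periods after the last observed ds.
        future_ds = pd.date_range(grp["ds"].iloc[-1], periods=h + 1, freq=freq)[1:]
        out.append(pd.DataFrame({"unique_id": uid, "ds": future_ds,
                                 "seasonal_naive": values}))
    return pd.concat(out, ignore_index=True)


# Example usage with synthetic monthly data (values are illustrative only).
train = pd.DataFrame({
    "unique_id": "reprex",
    "ds": pd.date_range("2021-01-01", periods=36, freq="MS"),
    "y": np.random.default_rng(0).uniform(1e9, 9e9, 36),
})
fc = seasonal_naive_forecast(train, h=12, season_length=12)
print(fc.head())
```

With `h=12` and `season_length=12`, the `seasonal_naive` column is simply the last 12 training values with their dates moved forward one year, which is the lag-12 baseline described above.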

Perhaps someone could clarify:

  • Whether my expectations/assumptions about SeasonalNaive are wrong; that is, why isn't the forecast exactly the value from 12 months earlier?
  • Why is the forecast rounded?

PS. Thanks all for the great work on Nixtla!

Versions / Dependencies

  • statsforecast==1.7.3
  • pandas==2.2.1
  • numpy==1.23.5

Reproduction script

```python
# imports
import os

import numpy as np
import pandas as pd
from dateutil.relativedelta import relativedelta
from statsforecast.core import StatsForecast
from statsforecast.models import SeasonalNaive

# settings: keep unique_id as a column in the output
os.environ['NIXTLA_ID_AS_COL'] = '1'

# parameters
h = 12
periods = h * 4

# reproducibility
np.random.seed(123)

# create data
dataDict = {
    "unique_id": "reprex",
    "ds": pd.date_range(start="2021-01-01", periods=periods, freq="MS"),
    "y": np.random.uniform(1e9, 9e9, periods),
}

df_data = pd.DataFrame(dataDict)
df_train = df_data.loc[df_data['ds'] < "2024-01-01", :].reset_index(drop=True)
df_test = df_data.loc[df_data['ds'] >= "2024-01-01", :].reset_index(drop=True)

# define model
SNaive = SeasonalNaive(
    season_length=12,
    alias="baseline_yoy",
)

# instantiate the StatsForecast class
fcst = StatsForecast(
    models=[SNaive],
    freq='MS',
    n_jobs=8,
    verbose=False,
    sort_df=True,
)

# forecast
df_forecast = fcst.forecast(
    df=df_train,
    h=h,
    fitted=False,
    sort_df=True,
)

# compare the forecast with the value observed 12 months earlier
# (what a seasonal naive forecast with season_length=12 should return)
tmpDateMask = np.isin(
    [c.date() for c in df_train['ds']],
    [(c - relativedelta(months=12)).date() for c in df_test['ds']],
)
df_forecast['y'] = np.array(df_train.loc[tmpDateMask, "y"])
df_forecast['residual'] = df_forecast['y'] - df_forecast['baseline_yoy']

# housekeeping
del tmpDateMask

# show
print(df_forecast)
```

Issue Severity

Low: It annoys or frustrates me.

@SaintRod SaintRod added the bug label Mar 29, 2024
@jmoralez
Member

Hey. This is most likely because we cast the values to float32. I'll check whether we can keep the input type instead.
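The float32 explanation can be checked with NumPy alone. Between 2^32 and 2^33 (roughly 4.3e9 to 8.6e9), consecutive float32 values are 2^9 = 512 apart, so a cast moves each value to the nearest multiple of 512. A minimal sketch using only the value quoted in the issue, assuming (as suggested above) that the forecast is stored as float32:

```python
import numpy as np

# Lag-12 value the issue expected as the 2024-01-01 forecast (float64).
expected = 6779547060.561772

# Cast to float32, mimicking the suspected internal cast.
as_float32 = np.float32(expected)

# Gap between adjacent float32 values at this magnitude: 2**9 = 512.
gap = float(np.spacing(as_float32))

# Residual e = y - y_hat if the forecast is the float32-rounded value.
residual = expected - float(as_float32)

print(float(as_float32))   # 6779547136.0
print(gap)                 # 512.0
print(round(residual, 6))  # -75.438228
```

The float32 result, 6779547136, differs from the expected value by the same -75.438... residual reported in the issue, which is consistent with the float32 explanation.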

@SaintRod
Author

SaintRod commented Apr 2, 2024

Hi @jmoralez, thanks for the quick reply! Please let me know if I can help with anything.
