TopDown method (proportion_averages, average_proportions) broken in 0.3.0, 0.4.0 and 0.4.1 #253

Open
jmberutich opened this issue Nov 27, 2023 · 6 comments
@jmberutich

What happened + What you expected to happen

The following TopDown methods are broken after version 0.2.1:

  • proportion_averages
  • average_proportions

Error output:

KeyError                                  Traceback (most recent call last)
File ~/projects/efds-fcpf-forecasting-engine/.venv/lib/python3.9/site-packages/pandas/core/indexes/base.py:3790, in Index.get_loc(self, key)
   3789 try:
-> 3790     return self._engine.get_loc(casted_key)
   3791 except KeyError as err:

File index.pyx:152, in pandas._libs.index.IndexEngine.get_loc()

File index.pyx:181, in pandas._libs.index.IndexEngine.get_loc()

File pandas/_libs/hashtable_class_helper.pxi:7080, in pandas._libs.hashtable.PyObjectHashTable.get_item()

File pandas/_libs/hashtable_class_helper.pxi:7088, in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'AutoARIMA'

The above exception was the direct cause of the following exception:

KeyError                                  Traceback (most recent call last)
Cell In[4], line 40
     33 reconcilers = [
     34     BottomUp(),
     35     TopDown(method='average_proportions'),
     36     MiddleOut(middle_level='Country/Purpose/State',
     37               top_down_method='forecast_proportions')
     38 ]
     39 hrec = HierarchicalReconciliation(reconcilers=reconcilers)
---> 40 Y_rec_df = hrec.reconcile(Y_hat_df=Y_hat_df, Y_df=Y_train_df,
     41                           S=S, tags=tags)

File ~/projects/efds-fcpf-forecasting-engine/.venv/lib/python3.9/site-packages/hierarchicalforecast/core.py:280, in HierarchicalReconciliation.reconcile(self, Y_hat_df, S, tags, Y_df, level, intervals_method, num_samples, seed, sort_df, is_balanced)
    278         y_hat_insample = Y_df[model_name].values.reshape(len(S_df), -1).astype(np.float32)
    279     else:
--> 280         y_hat_insample = Y_df.pivot(columns='ds', values=model_name).loc[S_df.index].values.astype(np.float32)
    281     reconciler_args['y_hat_insample'] = y_hat_insample
    283 if has_level and (level is not None):

File ~/projects/efds-fcpf-forecasting-engine/.venv/lib/python3.9/site-packages/pandas/core/frame.py:9025, in DataFrame.pivot(self, columns, index, values)
   9018 @Substitution("")
   9019 @Appender(_shared_docs["pivot"])
   9020 def pivot(
   9021     self, *, columns, index=lib.no_default, values=lib.no_default
   9022 ) -> DataFrame:
   9023     from pandas.core.reshape.pivot import pivot
-> 9025     return pivot(self, index=index, columns=columns, values=values)

File ~/projects/efds-fcpf-forecasting-engine/.venv/lib/python3.9/site-packages/pandas/core/reshape/pivot.py:549, in pivot(data, columns, index, values)
    545         indexed = data._constructor(
    546             data[values]._values, index=multiindex, columns=values
    547         )
    548     else:
--> 549         indexed = data._constructor_sliced(data[values]._values, index=multiindex)
    550 # error: Argument 1 to "unstack" of "DataFrame" has incompatible type "Union
    551 # [List[Any], ExtensionArray, ndarray[Any, Any], Index, Series]"; expected
    552 # "Hashable"
    553 result = indexed.unstack(columns_listlike)  # type: ignore[arg-type]

File ~/projects/efds-fcpf-forecasting-engine/.venv/lib/python3.9/site-packages/pandas/core/frame.py:3893, in DataFrame.__getitem__(self, key)
   3891 if self.columns.nlevels > 1:
   3892     return self._getitem_multilevel(key)
-> 3893 indexer = self.columns.get_loc(key)
   3894 if is_integer(indexer):
   3895     indexer = [indexer]

File ~/projects/efds-fcpf-forecasting-engine/.venv/lib/python3.9/site-packages/pandas/core/indexes/base.py:3797, in Index.get_loc(self, key)
   3792     if isinstance(casted_key, slice) or (
   3793         isinstance(casted_key, abc.Iterable)
   3794         and any(isinstance(x, slice) for x in casted_key)
   3795     ):
   3796         raise InvalidIndexError(key)
-> 3797     raise KeyError(key) from err
   3798 except TypeError:
   3799     # If we have a listlike key, _check_indexing_error will raise
   3800     #  InvalidIndexError. Otherwise we fall through and re-raise
   3801     #  the TypeError.
   3802     self._check_indexing_error(key)

KeyError: 'AutoARIMA'

Versions / Dependencies

v0.3.0
v0.4.0
v0.4.1

Reproduction script

# !pip install -U numba statsforecast datasetsforecast
import numpy as np
import pandas as pd

# Obtain hierarchical dataset
from datasetsforecast.hierarchical import HierarchicalData

# Compute base forecasts (not yet coherent)
from statsforecast.core import StatsForecast
from statsforecast.models import AutoARIMA, Naive

# Obtain hierarchical reconciliation methods and evaluation
from hierarchicalforecast.core import HierarchicalReconciliation
from hierarchicalforecast.evaluation import HierarchicalEvaluation
from hierarchicalforecast.methods import BottomUp, TopDown, MiddleOut


# Load TourismSmall dataset
Y_df, S, tags = HierarchicalData.load('./data', 'TourismSmall')
Y_df['ds'] = pd.to_datetime(Y_df['ds'])

# Split train/test sets
Y_test_df  = Y_df.groupby('unique_id').tail(4)
Y_train_df = Y_df.drop(Y_test_df.index)

# Compute base auto-ARIMA predictions
fcst = StatsForecast(df=Y_train_df,
                     models=[AutoARIMA(season_length=4), Naive()],
                     freq='Q', n_jobs=-1)
Y_hat_df = fcst.forecast(h=4)

# Reconcile the base predictions
reconcilers = [
    BottomUp(),
    TopDown(method='average_proportions'),
    MiddleOut(middle_level='Country/Purpose/State',
              top_down_method='forecast_proportions')
]
hrec = HierarchicalReconciliation(reconcilers=reconcilers)
Y_rec_df = hrec.reconcile(Y_hat_df=Y_hat_df, Y_df=Y_train_df,
                          S=S, tags=tags)

Issue Severity

None

@jmberutich jmberutich added the bug label Nov 27, 2023
@jmoralez
Member

Hey. The TopDown method requires the in-sample predictions of the models to be provided in Y_df, so if you add the following to your example it should work:

Y_hat_df = fcst.forecast(h=4, fitted=True)  # added fitted=True here
insample_df = fcst.forecast_fitted_values()  # get in-sample predictions
Y_rec_df = hrec.reconcile(Y_hat_df=Y_hat_df, Y_df=insample_df, S=S, tags=tags)  # provide insample_df through Y_df
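For context, here is a minimal pandas-only sketch of why the original call fails. It mirrors the `Y_df.pivot(columns='ds', values=model_name)` call shown in the traceback, but on toy data: the series names and values are made up for illustration, and this is not the library's code.

```python
import pandas as pd

# Toy stand-in for Y_train_df: indexed by unique_id, with only ds and y.
Y_df = pd.DataFrame(
    {"ds": [1, 2, 1, 2], "y": [10.0, 12.0, 4.0, 5.0]},
    index=pd.Index(["total", "total", "a", "a"], name="unique_id"),
)

# Without a column named after the model, the pivot cannot find its values.
try:
    Y_df.pivot(columns="ds", values="AutoARIMA")
    missing_key = None
except KeyError as err:
    missing_key = err.args[0]

# Once an "AutoARIMA" column exists in Y_df, the same pivot succeeds.
# Here y is simply copied as a placeholder for in-sample values.
Y_df["AutoARIMA"] = Y_df["y"]
wide = Y_df.pivot(columns="ds", values="AutoARIMA").loc[["total", "a"]]
```

So the `KeyError: 'AutoARIMA'` only says that `Y_df` has no column with the model's name, which is what passing the fitted values addresses.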

@jmberutich
Author

Thanks for the quick response. I followed your suggestion, and adding the models' in-sample predictions worked.

I have some questions about how the TopDown methods (average_proportions and proportion_averages) are calculated.

  • Why are they estimated from the fitted values when we can do it with lower error from the actual data?
  • If the insample predictions are not available for a model we want to reconcile, what would duplicating the "y" column as the insample predictions cause? (Assume the model has 0 error on the training data).

For forecast_proportions, would it not make sense to use the out-of-sample predictions?

@AzulGarza
Member

hey @jmberutich, regarding your questions on the methods:

  • The average_proportions and proportion_averages approaches use the actual data (the target values used for training). The insample_df is required because it contains the historical target values (just adding an insample_df with columns unique_id, ds, and y should work for TopDown).
  • Following from this, you don't need the in-sample predictions to use these methods; you only need to pass insample_df with the historical target variable.
  • The forecast_proportions method uses the out-of-sample predictions.
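To make the difference between the two historical-proportion methods concrete, here is a small NumPy sketch of their usual textbook definitions, computed from actual values on a toy two-series hierarchy. The numbers and variable names are assumptions for illustration, and this is not taken from the hierarchicalforecast source.

```python
import numpy as np

# Toy history: rows are bottom-level series, columns are time steps.
y_bottom = np.array([[1.0, 6.0],
                     [3.0, 2.0]])
y_total = y_bottom.sum(axis=0)  # top-level series, one value per time step

# average_proportions: average the per-period shares y_bottom[j, t] / y_total[t].
average_proportions = (y_bottom / y_total).mean(axis=1)

# proportion_averages: share of each series' historical mean in the total's mean.
proportion_averages = y_bottom.mean(axis=1) / y_total.mean()

# Either vector splits a top-level forecast across the bottom level.
y_hat_total = np.array([9.0, 12.0])  # hypothetical base forecast for the total
y_hat_bottom = np.outer(average_proportions, y_hat_total)
```

Both vectors sum to 1, so the disaggregated forecasts add back up to the top-level forecast. The two methods generally give different proportions whenever the total varies over time, as it does in this toy history.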

@nelsoncardenas

nelsoncardenas commented Dec 13, 2023

So, the solution would be:

Y_hat_df = fcst.forecast(h=group.horizon, fitted=True)
insample_df = fcst.forecast_fitted_values()
Y_rec_df = hrec.reconcile(Y_hat_df=Y_hat_df, Y_df=insample_df, S=S_df, tags=tags)

Option B:

insample_df = Y_train_df.copy()
insample_df["AutoARIMA"] = insample_df["y"]
insample_df["Naive"] = insample_df["y"]
Y_rec_df = hrec.reconcile(Y_hat_df=Y_hat_df, Y_df=insample_df, S=S_df, tags=tags)
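Option B can be generalized so the model names are not hard-coded. The helper below is a hypothetical sketch (`add_actuals_as_insample` is not part of any library): it assumes every column of Y_hat_df other than the id and time columns is a model output, and copies y into a column of that name. Note that this would also copy y into any prediction-interval columns present in Y_hat_df.

```python
import pandas as pd

def add_actuals_as_insample(Y_train_df, Y_hat_df,
                            id_col="unique_id", time_col="ds", target_col="y"):
    """Hypothetical helper: copy the target into one column per model in Y_hat_df."""
    # Assumption: anything that is not an id/time column is a model column.
    model_cols = [c for c in Y_hat_df.columns if c not in (id_col, time_col)]
    insample_df = Y_train_df.copy()
    for col in model_cols:
        insample_df[col] = insample_df[target_col]
    return insample_df

# Toy frames standing in for Y_train_df and Y_hat_df.
toy_train = pd.DataFrame({"unique_id": ["a", "a"], "ds": [1, 2], "y": [1.0, 2.0]})
toy_hat = pd.DataFrame({"unique_id": ["a"], "ds": [3],
                        "AutoARIMA": [2.5], "Naive": [2.0]})
toy_insample = add_actuals_as_insample(toy_train, toy_hat)
```

The result can then be passed as Y_df to hrec.reconcile in the same way as in option B.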
