TopDown method (proportion_averages, average_proportions) broken in 0.3.0, 0.4.0 and 0.4.1 #253

jmberutich · 2023-11-27T10:09:46Z

What happened + What you expected to happen

The TopDown methods proportion_averages and average_proportions are broken after version 0.2.1: calling reconcile raises KeyError: 'AutoARIMA'.

Error output:

KeyError                                  Traceback (most recent call last)
File ~/projects/efds-fcpf-forecasting-engine/.venv/lib/python3.9/site-packages/pandas/core/indexes/base.py:3790, in Index.get_loc(self, key)
   3789 try:
-> 3790     return self._engine.get_loc(casted_key)
   3791 except KeyError as err:

File index.pyx:152, in pandas._libs.index.IndexEngine.get_loc()

File index.pyx:181, in pandas._libs.index.IndexEngine.get_loc()

File pandas/_libs/hashtable_class_helper.pxi:7080, in pandas._libs.hashtable.PyObjectHashTable.get_item()

File pandas/_libs/hashtable_class_helper.pxi:7088, in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'AutoARIMA'

The above exception was the direct cause of the following exception:

KeyError                                  Traceback (most recent call last)
Cell In[4], line 40
     33 reconcilers = [
     34     BottomUp(),
     35     TopDown(method='average_proportions'),
     36     MiddleOut(middle_level='Country/Purpose/State',
     37               top_down_method='forecast_proportions')
     38 ]
     39 hrec = HierarchicalReconciliation(reconcilers=reconcilers)
---> 40 Y_rec_df = hrec.reconcile(Y_hat_df=Y_hat_df, Y_df=Y_train_df,
     41                           S=S, tags=tags)

File ~/projects/efds-fcpf-forecasting-engine/.venv/lib/python3.9/site-packages/hierarchicalforecast/core.py:280, in HierarchicalReconciliation.reconcile(self, Y_hat_df, S, tags, Y_df, level, intervals_method, num_samples, seed, sort_df, is_balanced)
    278         y_hat_insample = Y_df[model_name].values.reshape(len(S_df), -1).astype(np.float32)
    279     else:
--> 280         y_hat_insample = Y_df.pivot(columns='ds', values=model_name).loc[S_df.index].values.astype(np.float32)
    281     reconciler_args['y_hat_insample'] = y_hat_insample
    283 if has_level and (level is not None):

File ~/projects/efds-fcpf-forecasting-engine/.venv/lib/python3.9/site-packages/pandas/core/frame.py:9025, in DataFrame.pivot(self, columns, index, values)
   9018 @Substitution("")
   9019 @Appender(_shared_docs["pivot"])
   9020 def pivot(
   9021     self, *, columns, index=lib.no_default, values=lib.no_default
   9022 ) -> DataFrame:
   9023     from pandas.core.reshape.pivot import pivot
-> 9025     return pivot(self, index=index, columns=columns, values=values)

File ~/projects/efds-fcpf-forecasting-engine/.venv/lib/python3.9/site-packages/pandas/core/reshape/pivot.py:549, in pivot(data, columns, index, values)
    545         indexed = data._constructor(
    546             data[values]._values, index=multiindex, columns=values
    547         )
    548     else:
--> 549         indexed = data._constructor_sliced(data[values]._values, index=multiindex)
    550 # error: Argument 1 to "unstack" of "DataFrame" has incompatible type "Union
    551 # [List[Any], ExtensionArray, ndarray[Any, Any], Index, Series]"; expected
    552 # "Hashable"
    553 result = indexed.unstack(columns_listlike)  # type: ignore[arg-type]

File ~/projects/efds-fcpf-forecasting-engine/.venv/lib/python3.9/site-packages/pandas/core/frame.py:3893, in DataFrame.__getitem__(self, key)
   3891 if self.columns.nlevels > 1:
   3892     return self._getitem_multilevel(key)
-> 3893 indexer = self.columns.get_loc(key)
   3894 if is_integer(indexer):
   3895     indexer = [indexer]

File ~/projects/efds-fcpf-forecasting-engine/.venv/lib/python3.9/site-packages/pandas/core/indexes/base.py:3797, in Index.get_loc(self, key)
   3792     if isinstance(casted_key, slice) or (
   3793         isinstance(casted_key, abc.Iterable)
   3794         and any(isinstance(x, slice) for x in casted_key)
   3795     ):
   3796         raise InvalidIndexError(key)
-> 3797     raise KeyError(key) from err
   3798 except TypeError:
   3799     # If we have a listlike key, _check_indexing_error will raise
   3800     #  InvalidIndexError. Otherwise we fall through and re-raise
   3801     #  the TypeError.
   3802     self._check_indexing_error(key)

KeyError: 'AutoARIMA'
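
To isolate the failure, here is a minimal sketch on a toy frame (hypothetical data, not TourismSmall). It assumes the only relevant step is the pivot call shown at core.py line 280 in the traceback: the frame passed as Y_df holds the actuals in y but has no column named after the model, so selecting values='AutoARIMA' fails. The helper name pivot_model is made up for illustration.

```python
import pandas as pd

# Toy long-format frame holding only the actuals, like Y_train_df in the report.
Y_df = pd.DataFrame(
    {
        "ds": pd.to_datetime(["2020-03-31", "2020-06-30"] * 2),
        "y": [1.0, 2.0, 3.0, 4.0],
    },
    index=pd.Index(["a", "a", "b", "b"], name="unique_id"),
)


def pivot_model(df: pd.DataFrame, model_name: str) -> pd.DataFrame:
    # Same kind of call as the line flagged in the traceback:
    # one row per series, one column per timestamp.
    return df.pivot(columns="ds", values=model_name)


try:
    pivot_model(Y_df, "AutoARIMA")
    error = None
except KeyError as exc:
    # There is no 'AutoARIMA' column, so pandas raises KeyError.
    error = exc

print(repr(error))

# Once a column with the model's name exists, the same pivot succeeds.
wide = pivot_model(Y_df.assign(AutoARIMA=Y_df["y"]), "AutoARIMA")
print(wide.shape)
```

The second call copies y into the model column only to show that the lookup itself is the problem; it says nothing about which values are appropriate to pass.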

Versions / Dependencies

v0.3.0
v0.4.0
v0.4.1

Reproduction script

# !pip install -U numba statsforecast datasetsforecast
import numpy as np
import pandas as pd

# obtain the hierarchical dataset
from datasetsforecast.hierarchical import HierarchicalData

# compute base forecasts (not yet coherent)
from statsforecast.core import StatsForecast
from statsforecast.models import AutoARIMA, Naive

# obtain hierarchical reconciliation methods and evaluation
from hierarchicalforecast.core import HierarchicalReconciliation
from hierarchicalforecast.evaluation import HierarchicalEvaluation
from hierarchicalforecast.methods import BottomUp, TopDown, MiddleOut


# Load TourismSmall dataset
Y_df, S, tags = HierarchicalData.load('./data', 'TourismSmall')
Y_df['ds'] = pd.to_datetime(Y_df['ds'])

# split train/test sets
Y_test_df  = Y_df.groupby('unique_id').tail(4)
Y_train_df = Y_df.drop(Y_test_df.index)

# Compute base auto-ARIMA predictions
fcst = StatsForecast(df=Y_train_df,
                     models=[AutoARIMA(season_length=4), Naive()],
                     freq='Q', n_jobs=-1)
Y_hat_df = fcst.forecast(h=4)

# Reconcile the base predictions
reconcilers = [
    BottomUp(),
    TopDown(method='average_proportions'),
    MiddleOut(middle_level='Country/Purpose/State',
              top_down_method='forecast_proportions')
]
hrec = HierarchicalReconciliation(reconcilers=reconcilers)
Y_rec_df = hrec.reconcile(Y_hat_df=Y_hat_df, Y_df=Y_train_df,
                          S=S, tags=tags)

Issue Severity

None

jmoralez · 2023-11-29T00:05:14Z

Hey. The TopDown method requires the in-sample predictions of the models to be provided in Y_df, so if you add the following to your example it should work:

Y_hat_df = fcst.forecast(h=4, fitted=True)  # added fitted=True here
insample_df = fcst.forecast_fitted_values()  # get in-sample predictions
Y_rec_df = hrec.reconcile(Y_hat_df=Y_hat_df, Y_df=insample_df, S=S, tags=tags)  # provide insample_df through Y_df
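
Before calling reconcile, it can help to check that every model column in Y_hat_df also exists in the frame passed as Y_df. The sketch below is a hypothetical helper, not part of hierarchicalforecast. It assumes unique_id and ds are regular columns (or the index) and that every other column of Y_hat_df is a point forecast named after its model.

```python
import pandas as pd


def missing_insample_columns(Y_hat_df: pd.DataFrame, Y_df: pd.DataFrame) -> list:
    """Return model columns of Y_hat_df that have no same-named column in Y_df."""
    # Assumption: anything that is not an id, timestamp or target column
    # is a model's point forecast.
    non_model_cols = {"unique_id", "ds", "y"}
    model_cols = [c for c in Y_hat_df.columns if c not in non_model_cols]
    return [c for c in model_cols if c not in Y_df.columns]


# Toy frames (hypothetical values).
Y_hat_df = pd.DataFrame(
    {
        "unique_id": ["a"],
        "ds": pd.to_datetime(["2021-03-31"]),
        "AutoARIMA": [1.1],
        "Naive": [1.0],
    }
)
Y_train_df = pd.DataFrame(
    {"unique_id": ["a"], "ds": pd.to_datetime(["2020-12-31"]), "y": [1.0]}
)

# The training frame only has 'y', so both model columns are reported.
missing_before = missing_insample_columns(Y_hat_df, Y_train_df)
print(missing_before)

# A frame that carries one column per model passes the check.
insample_df = Y_train_df.assign(AutoARIMA=0.9, Naive=1.0)
missing_after = missing_insample_columns(Y_hat_df, insample_df)
print(missing_after)
```

An empty list means the column lookup that failed in the traceback should find what it needs.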

jmberutich · 2023-11-29T07:09:58Z

Thanks for the quick response. I followed your suggestion and adding the model insample predictions worked.

I have some questions about how the TopDown methods (average_proportions and proportion_averages) are calculated.

Why are they estimated from the fitted values when we can do it with lower error from the actual data?
If the insample predictions are not available for a model we want to reconcile, what would duplicating the "y" column as the insample predictions cause? (Assume the model has 0 error on the training data).

For forecast_proportions would it not make sense to use the out of sample predictions?

AzulGarza · 2023-11-29T20:27:21Z

hey @jmberutich, regarding your questions on the methods:

The average_proportions and proportion_averages approaches use the actual data (the target values used for training). The insample_df is required because it contains the historical target values (just adding an insample_df with columns unique_id, ds, and y should work for TopDown).
Following this, you don't need the in-sample predictions to use these methods; you only need to pass an insample_df with the historical target variable.
The forecast_proportions method uses the out-of-sample predictions.
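
As a worked illustration of the three proportion schemes, here is a small NumPy sketch on a two-series, two-level hierarchy. It follows the usual textbook definitions (the average of the historical proportions, the proportions of the historical averages, and proportions taken from the base forecasts). It is not the library's code, and all numbers and variable names are made up.

```python
import numpy as np

# Hypothetical history: rows are bottom-level series, columns are time steps.
y_bottom = np.array([[10.0, 60.0],
                     [90.0, 140.0]])
y_top = y_bottom.sum(axis=0)  # total series: [100, 200]

# average_proportions: mean over time of each series' share of the total.
average_proportions = (y_bottom / y_top).mean(axis=1)

# proportion_averages: each series' historical mean over the total's mean.
proportion_averages = y_bottom.mean(axis=1) / y_top.mean()

# forecast_proportions (two-level case): shares taken from the base
# out-of-sample forecasts of the bottom series for one future step.
y_hat_top = 300.0
y_hat_bottom = np.array([50.0, 150.0])
forecast_proportions = y_hat_bottom / y_hat_bottom.sum()

# Each scheme splits the top-level forecast among the bottom series.
schemes = {
    "average_proportions": average_proportions,
    "proportion_averages": proportion_averages,
    "forecast_proportions": forecast_proportions,
}
for name, proportions in schemes.items():
    print(name, proportions, proportions * y_hat_top)
```

The first two schemes only need historical actuals, which matches the point above that the target column is what matters. They give different splits here because the total changes over time.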

nelsoncardenas · 2023-12-13T15:25:43Z

So, the solution would be option A, passing the models' fitted values as Y_df:

Y_hat_df = fcst.forecast(h=group.horizon, fitted=True)
insample_df = fcst.forecast_fitted_values()
Y_rec_df = hrec.reconcile(Y_hat_df=Y_hat_df, Y_df=insample_df, S=S_df, tags=tags)

or option B, copying the actual values into a column named after each model:

insample_df = Y_train_df.copy()
insample_df["AutoARIMA"] = insample_df["y"]
insample_df["Naive"] = insample_df["y"]
Y_rec_df = hrec.reconcile(Y_hat_df=Y_hat_df, Y_df=insample_df, S=S_df, tags=tags)
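
Option B generalizes to any list of models. The helper below is a hypothetical sketch (its name and parameters are not part of any library): it copies the target column into one column per model name, so the frame passed as Y_df has the columns that reconcile looks up. It relies on the statement above that these TopDown methods work from the actual values.

```python
import pandas as pd


def actuals_as_insample(Y_train_df: pd.DataFrame, model_names, target_col: str = "y") -> pd.DataFrame:
    """Copy the target column into one column per model name."""
    insample_df = Y_train_df.copy()  # leave the caller's frame untouched
    for name in model_names:
        insample_df[name] = insample_df[target_col]
    return insample_df


# Toy training frame (hypothetical values).
Y_train_df = pd.DataFrame(
    {
        "unique_id": ["a", "a", "b", "b"],
        "ds": pd.to_datetime(["2020-03-31", "2020-06-30"] * 2),
        "y": [1.0, 2.0, 3.0, 4.0],
    }
)

insample_df = actuals_as_insample(Y_train_df, ["AutoARIMA", "Naive"])
print(insample_df.columns.tolist())
```

The result can then be passed as Y_df in place of the hand-written assignments above. Methods that rely on genuine in-sample errors would presumably see zero residuals with this frame, so it is best kept to the proportion-based TopDown methods discussed in this thread.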

jmberutich added the bug label Nov 27, 2023

jmoralez added the awaiting response label Nov 29, 2023

github-actions bot removed the awaiting response label Nov 29, 2023

jmoralez mentioned this issue Nov 29, 2023

TopDown returns NaN #255

Closed
