
Nested forecasting - common recipe feature engineering causing issue in model calibration #243

Open
daniepi opened this issue Feb 22, 2024 · 2 comments


@daniepi

daniepi commented Feb 22, 2024

Hi @mdancho84,
First and foremost, thanks for this amazing suite of modeltime packages. I am trying to model many individual time series using nested forecasting as described here: https://business-science.github.io/modeltime/articles/nested-forecasting.html

I came across a peculiar problem when using a commonly defined recipe with date-based features on time series of differing lengths and not fully overlapping periods.
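
For reference, the nested table is built the usual way, roughly like this (a sketch only; the id column name and horizon are assumptions, not my exact values):

nested_data_tbl <- data_tbl |>
  extend_timeseries(.id_var = id, .date_var = date, .length_future = 12) |>
  nest_timeseries(.id_var = id, .length_future = 12) |>
  split_nested_timeseries(.length_test = 12)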

With a recipe like this:

rec_date_feats <- recipe(y~ date, data = extract_nested_train_split(nested_data_tbl)) |>
  step_timeseries_signature(date) |>
  step_rm(date) |>
  step_normalize(date_index.num) |>
  step_zv(all_predictors()) |>
  step_corr(all_numeric_predictors(), threshold = 0.99) |>
  step_dummy(all_nominal_predictors(), one_hot = TRUE)
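
the models are then fitted with modeltime_nested_fit(), roughly like this (a sketch, where the xgboost spec and object names are assumptions, not my exact code):

wflw_xgb <- workflow() |>
  add_model(boost_tree(mode = "regression") |> set_engine("xgboost")) |>
  add_recipe(rec_date_feats)

nested_modeltime_tbl <- nested_data_tbl |>
  modeltime_nested_fit(
    wflw_xgb,
    control = control_nested_fit(verbose = TRUE, allow_par = FALSE)
  )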

The training works well and models are fitted on all time series. From the recipes nested in the output of modeltime_nested_fit I can see that not all series were fitted with the same features (I assume the step_zv() and step_corr() steps remove different features for different series), which is fine and intended.
Unfortunately, the models for some series are missing .calibration_data, so I tried to figure out why. What I found is that calibration works for all series that end up with the same features as the original recipe definition, while it fails to produce .calibration_data for all other series.
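
One way to see which series are affected (a sketch; it assumes the fitted object is called nested_modeltime_tbl and that missing calibration data shows up as NULL entries):

# any errors logged during the nested fit
extract_nested_error_report(nested_modeltime_tbl)

# which per-series modeltime tables contain models without calibration data
purrr::map_lgl(
  nested_modeltime_tbl$.modeltime_tables,
  ~ any(purrr::map_lgl(.x$.calibration_data, is.null))
)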

A simple example: I have 8 series. I build the recipe as stated above with extract_nested_train_split(nested_data_tbl), which by default uses .row_id = 1, i.e. the first series. Let's say series nr. 7 and 8 were trained with different feature sets (because their training period was slightly different from series 1-6). Then the calculation of .calibration_data fails for them.

I can manually produce new_data with prep() and bake() using the recipe extracted specifically for series 7/8, and then predict(model, new_data = ...) works fine, e.g.

# model and recipe fitted for series 7
mod <- modeltime_table(nested_modeltime_tbl$.modeltime_tables[[7]]$.model[[2]])
recp <- nested_modeltime_tbl$.modeltime_tables[[7]]$.model[[2]]$pre$actions$recipe$recipe

# This fails
mod |> modeltime_calibrate(new_data = extract_nested_test_split(nested_data_tbl, .row_id = 7))

# This works
bake_test <- bake(prep(recp, training = extract_nested_train_split(nested_data_tbl, .row_id = 7)),
                  new_data = extract_nested_test_split(nested_data_tbl, .row_id = 7))
# drop the outcome column before predicting with the raw parsnip fit
predict(mod$.model[[1]]$fit$fit, new_data = bake_test |> select(!y))

Finally, when I create the initial recipe with extract_nested_train_split(nested_data_tbl, .row_id = 7), calibration fails for the first 6 series and works for series 7.

I don't know the implementation details well, but I think the problem is that when the prediction data for calibration is constructed, it bakes with the recipe trained on the data supplied when the recipe was instantiated, not on the training data of the individual series. Hence a model trained on one feature set is asked to predict on new data with a different feature set.
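
If that is right, the mismatch should be visible from the recipe alone: prepping the same specification on two different training splits yields different predictor sets. A quick check (sketch):

baked_cols <- function(row_id) {
  prep(rec_date_feats,
       training = extract_nested_train_split(nested_data_tbl, .row_id = row_id)) |>
    bake(new_data = NULL) |>
    colnames()
}

setdiff(baked_cols(1), baked_cols(7))  # features kept for series 1 but dropped for series 7
setdiff(baked_cols(7), baked_cols(1))  # features kept for series 7 but dropped for series 1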

Is my understanding correct? Thanks for any feedback. :)

@daniepi
Author

daniepi commented Mar 8, 2024

Hi again,

I was digging into the code. I think the problem arises from mdl_time_forecast:
https://github.com/business-science/modeltime/blob/master/R/modeltime-forecast.R#L1034

The problem is that mld$blueprint$recipe is a trained recipe, estimated on whatever the first series in the nested data is:
https://github.com/business-science/modeltime/blob/master/R/modeltime-forecast.R#L927-L928

Hence, if any of the series does not share the same time index, processing steps that remove features (like step_corr() or step_zv()) create a discrepancy between the data used to train the model for such a series and the data used to predict on.
This seems to cause problems for models like XGBoost, where a given set of features is expected at predict time but a different set is received.
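
As a sanity check, one can compare what the fitted booster expects with what the shared blueprint produces (a sketch; it assumes model 2 is the xgboost workflow from my earlier snippet and that feature_names is populated on the fitted booster):

wf_7 <- nested_modeltime_tbl$.modeltime_tables[[7]]$.model[[2]]

# features the fitted booster expects (trained on series 7's own split)
expected <- wf_7$fit$fit$fit$feature_names

# features produced when series 7's test split is baked with the recipe
# prepped on series 1 (effectively what mdl_time_forecast does)
supplied <- bake(
  prep(wf_7$pre$actions$recipe$recipe,
       training = extract_nested_train_split(nested_data_tbl, .row_id = 1)),
  new_data = extract_nested_test_split(nested_data_tbl, .row_id = 7)
) |> colnames()

setdiff(expected, supplied)             # columns the model expects but does not get
setdiff(supplied, c(expected, "y"))     # extra columns the model never saw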

@mdancho84
Contributor

Ok, sorry, I haven't had time to dig into it. But yeah, the logic there was that the recipe used on the first model can be reused for the others. Might need to rethink that.
