
explain_forecast runs unnecessarily many predictions when multiple xreg variables are present #355

Open
jonlachmann opened this issue Aug 18, 2023 · 2 comments

@jonlachmann

Given a time series Y and two exogenous variables X1 and X2, we want to explain a forecast of three steps.

With the settings in the example below, 941 predictions are requested, of which only 864 are unique. I have also seen a case (too large to use as a reproduction example) where only 160 out of 4586 predictions are unique, and another where 750000 forecasts are requested but only a handful are unique.

This is most likely because the code does not recognize that the horizon-1 forecast only depends on the newxreg values supplied for X1 and X2 at horizon 1; the values supplied for horizon 2 do not affect it.
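
As a quick standalone illustration of this point (not shapr internals): the one-step-ahead forecast of an ARIMA model with xreg does not change when only the later rows of newxreg are altered:

data <- data.table::as.data.table(airquality)
fit <- arima(data$Temp[1:150], c(2, 1, 0), xreg = data[1:150, c("Wind", "Day", "Month")])

newxreg_a <- as.matrix(data[151:153, c("Wind", "Day", "Month")])
newxreg_b <- newxreg_a
newxreg_b[2:3, ] <- 0  # change only the horizon 2 and 3 rows

all.equal(predict(fit, n.ahead = 3, newxreg = newxreg_a)$pred[1],
          predict(fit, n.ahead = 3, newxreg = newxreg_b)$pred[1])  # TRUE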

An easy way to showcase the issue is to run the following lines inside compute_preds:

data_to_pred_fun <- dt[, ..feature_names]            # all rows passed to the prediction function
unique_data_to_pred_fun <- unique(data_to_pred_fun)  # the rows that actually differ

and observe that the number of unique rows is smaller than the total.

Example code below:

options(digits = 5) # Avoid round-off differences when printing output on different systems

data <- data.table::as.data.table(airquality)

# AR model and its forecast horizon (not used in the explain_forecast() call below)
model_ar_temp <- ar(data$Temp, order.max = 2)
model_ar_temp$n.ahead <- 3

# Baseline prediction (prediction_zero): the mean temperature, repeated for each horizon
p0_ar <- rep(mean(data$Temp), 3)

# ARIMA(2, 1, 0) model for Temp with three exogenous regressors
model_arima_temp <- arima(data$Temp[1:150], c(2, 1, 0), xreg = data[1:150, c("Wind", "Day", "Month")])

devtools::load_all() # load the development version of shapr

explain_forecast(
  model = model_arima_temp,
  y = data[1:126, "Temp"],
  xreg = data[, c("Wind", "Day", "Month")],
  train_idx = 2:125,
  explain_idx = 126,
  explain_y_lags = 2,
  explain_xreg_lags = c(2, 2, 2),
  horizon = 3,
  approach = "empirical",
  prediction_zero = p0_ar[1:3],
  group_lags = TRUE,
  n_batches = 1,
  timing = FALSE,
  n_combinations = 20
)

@martinju
Member

Hi

I looked into this a bit. The problem seems to be that different feature subsets (id_combinations) end up requiring predictions for the same feature combinations. If I recall correctly, we discussed this issue sometime during the winter/spring and decided to leave it (perhaps we even tried to deal with it?). Maybe it is a bigger issue than we thought back then?

As I see it now, we could keep track of which id_combinations need the different predictions, call unique() before the predictions are run, and then populate the results back to the different feature combinations. This needs to be done efficiently, though, so that we don't spend more time on the bookkeeping than we save by doing fewer predictions.
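
Something along these lines, perhaps (a rough sketch only; dt, feature_names, predict_model, pred_id and p_hat are placeholder names, not the actual internal objects):

library(data.table)

dedup_predict <- function(dt, feature_names, predict_model, model) {
  # Give every distinct feature row a single id (.GRP numbers the groups in order of appearance)
  dt[, pred_id := .GRP, by = feature_names]

  # Call the model only once per unique feature row
  unique_dt <- unique(dt, by = "pred_id")
  unique_preds <- predict_model(model, unique_dt[, ..feature_names])

  # Populate the predictions back to every row/id_combination that needs them
  dt[, p_hat := unique_preds[pred_id]]
  dt[]
}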

@jonlachmann
Author

I did some more investigation. The reason I get such extreme cases (160 unique out of 250 000 predictions) is that some of my variables are just dummy variables, which leaves very little room for unique samples. The branch below ensures that only unique predictions are made, but it is not as performant as I would like:
jonlachmann@50a0649

I also looked into the steps where the combinations are generated. There is a lot of overlap in what is predicted when the horizon is longer than 1: we make many predictions while varying an xreg variable at horizon 2 even though we only want to use those predictions to explain horizon 1, and the same predictions are also available from the combinations used for horizon 2.
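
To sketch that idea (again with placeholder names, not shapr internals): when explaining horizon h, only the lagged features plus the xreg values for forecast steps 1 to h actually influence the prediction, so the prediction rows could be deduplicated per horizon on just those columns:

library(data.table)

dedup_for_horizon <- function(dt, cols_h) {
  # cols_h: the columns that actually affect the horizon-h prediction
  dt <- data.table::copy(dt)
  dt[, pred_id := .GRP, by = cols_h]  # same id for rows that agree on the relevant columns
  unique(dt, by = "pred_id")          # only these rows require a model call
}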

Overall, there is a great opportunity to optimize this, but it is hard to do without also introducing errors. I will keep investigating to see if I can find a reasonable way to do it, although you, @martinju, know much more about how the inner workings are written.

I have attached an example of the predictions to make, together with the map to the different horizons, both before and after pruning the combinations that are not actually necessary.

Before: [screenshot of the predictions and horizon map before pruning]

After: [screenshot of the predictions and horizon map after pruning]
