Separate performance metrics for each forecasting horizon #4584

cgarciga · 2023-05-13T01:08:03Z

Hello, I was wondering whether sktime supports computing a performance metric, e.g. RMSE, separately for one-step-ahead forecasts, two-step-ahead forecasts, and so on. For example, the following code gives one MAPE for each collection of 30 forecasts (y is daily data here):

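import numpy as np
from sktime.forecasting.model_evaluation import evaluate
from sktime.forecasting.model_selection import ExpandingWindowSplitter  # moved to sktime.split in later releases
from sktime.forecasting.naive import NaiveForecaster

# seasonal naive forecaster: repeat the value observed 365 days earlier
forecaster = NaiveForecaster(strategy="last", sp=365)

# expanding window: start with 2 years of data, then forecast 30 days ahead every 30 days
cv = ExpandingWindowSplitter(
    step_length=30, fh=np.arange(1, 31), initial_window=365 * 2
)

# return_data=True keeps y_test and y_pred for each fold in the output
df = evaluate(
    forecaster=forecaster,
    y=y,
    cv=cv,
    strategy="refit",
    backend="loky",
    return_data=True,
)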

So it looks like at each step it generates 30 forecasts (since fh = [1, ..., 30]) and computes a single MAPE over those 30 forecasts. What I have in mind is collecting all the fh=1 forecasts across all the steps and computing, say, the RMSE over those, then collecting all the fh=2 forecasts across all the steps and computing the RMSE over those, and so on.

Is this supported?
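
The per-horizon quantity described above can be written down as follows. The notation is introduced here for illustration only and is not sktime's: $K$ is the number of CV folds, $t_k$ is the cutoff (last training point) of fold $k$, and $\hat{y}_{t_k+h \mid t_k}$ is the forecast for $h$ steps past that cutoff.

```latex
\mathrm{RMSE}_h
  = \sqrt{\frac{1}{K} \sum_{k=1}^{K}
      \left( y_{t_k + h} - \hat{y}_{t_k + h \mid t_k} \right)^2 },
  \qquad h = 1, \dots, 30 .
```

By contrast, the evaluate call above averages over $h$ within each fold $k$, which gives one score per fold rather than one score per horizon.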

fkiraly (Maintainer) · 2023-05-13T12:41:45Z

> Is this supported?

You could simply run evaluate multiple times with different fh and metric. Or, you could pass multiple metrics and that should work (and then just ignore the ones you do not want).

Can you elaborate on your use case a bit more, i.e., what do you want to achieve here? Because I'm wondering whether it is more of a mixed metric that you want, or more of a benchmarking tool.
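
The first suggestion above, one evaluate run per horizon, might look roughly like the sketch below. The helper names `pooled_rmse` and `rmse_for_horizon` are hypothetical, not part of sktime. The sketch assumes that the score column of the evaluate output is named `test_MeanSquaredError` and that the splitter can be imported from one of the two locations tried. With a single test point per fold, each fold's RMSE is just the absolute error, so the fold scores are squared, averaged, and square-rooted again to pool them.

```python
import numpy as np


def pooled_rmse(per_fold_rmse):
    """Combine single-point per-fold RMSEs (= absolute errors) into one RMSE."""
    scores = np.asarray(per_fold_rmse, dtype=float)
    return float(np.sqrt(np.mean(scores ** 2)))


def rmse_for_horizon(forecaster, y, h, initial_window=365 * 2, step_length=30):
    """Run evaluate with fh=[h] only and pool the fold scores (sketch)."""
    # sktime imports are local so that pooled_rmse works without sktime installed
    from sktime.forecasting.model_evaluation import evaluate
    from sktime.performance_metrics.forecasting import MeanSquaredError

    try:  # splitter location in newer sktime releases
        from sktime.split import ExpandingWindowSplitter
    except ImportError:  # older releases, as in the question
        from sktime.forecasting.model_selection import ExpandingWindowSplitter

    cv = ExpandingWindowSplitter(
        fh=[h], initial_window=initial_window, step_length=step_length
    )
    out = evaluate(
        forecaster=forecaster,
        y=y,
        cv=cv,
        strategy="refit",
        scoring=MeanSquaredError(square_root=True),
    )
    # assumed column name: "test_" + metric class name
    return pooled_rmse(out["test_MeanSquaredError"])
```

Calling `rmse_for_horizon(forecaster, y, h)` for each `h` in `range(1, 31)` would give the 30 values, at the cost of 30 separate refit passes over the folds.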

fkiraly (Maintainer) · 2023-05-14T15:14:34Z

Would this PR by @hazrulakmal suit your needs?
https://github.com/sktime/sktime/pull/4586/files

cgarciga (Author) · 2023-05-14T16:52:10Z

Hi @fkiraly, first off, thank you for your quick replies and your interest. What I have in mind, which I don't believe the PR you referenced solves, is something like Figure 4, p. 16, of https://www.federalreserve.gov/econres/feds/files/2021014pap.pdf. I say "something like" because there the statistics are all relative to a baseline model (the AR(1), Gap model, which is the first one on the x-axis), whereas in my original question I was after the raw RMSEs, although in practice RMSEs are usually reported relative to a baseline. In any case, the figure reports (relative) RMSEs for forecasts of a quarterly inflation variable, and the results are reported separately for horizons 1, 2, 4, and 8. If we look at the Tealbook forecasts, for example (third to last on the x-axis), we see that, relative to the baseline, they are very accurate for 1-quarter-ahead forecasts, but this performance deteriorates for 2-quarter-ahead forecasts and deteriorates again for 4-quarter-ahead forecasts.

Let me know if this still doesn't explain it. Also, you are correct that one can just run evaluate multiple times. To produce the results I have in mind, though, I think I would need to subtract, at every step, the y_pred column from the y_test column to get the forecast errors, then take the RMSE across all forecast errors in the first position (e[0] at each step, where e is the Series of errors for that step) to get the 1-step-ahead RMSE, then take the RMSE across all the e[1] forecast errors to get the 2-step-ahead RMSE, and so on.

hazrulakmal (Collaborator) · 2023-05-15T01:37:24Z

Interesting evaluation case. I hope I understood your message correctly. Based on your explanation and the code you provided, let's assume we have 2.5 years of data, with the last 6 months being the test set. The CV will produce a 6-fold evaluation because the window moves every 30 days, and you want to compute a metric, say RMSE, over those 6 points for each forecast horizon, fh $\in \{1, \dots, 30\}$, resulting in 30 different RMSE values, right? I don't think we can currently do that in sktime; @fkiraly, please correct me if I'm wrong. A workaround I can think of is the same as yours: take the output data frame from evaluate and do extra processing on df[["y_pred", "y_test"]] to compute the metric per forecast horizon. As a side note for dev, I think this is another use case where it would be great if evaluate could output cleaner estimator predictions: the current implementation embeds each prediction series in a single cell of a dataframe.

1 reply

cgarciga (Author) · May 15, 2023

Hi, you are correct. If df is the output of evaluate, here is how I would calculate RMSE by horizon:

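import numpy as np
import pandas as pd

# forecast errors per fold, re-indexed by horizon position (0 = one step ahead)
errs = [(yt - yp).reset_index(drop=True) for yt, yp in zip(df["y_test"], df["y_pred"])]

# rows: horizon position, columns: folds
errs_by_horz = pd.concat(errs, axis=1, ignore_index=True)

# RMSE across folds for each horizon position
rmse_by_horz = errs_by_horz.apply(lambda row: np.sqrt(row.pow(2).mean()), axis=1)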

Also, I like that y_pred and y_test are returned as series within cells of the larger df, since it makes calculations like this simple.
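
For the relative RMSEs mentioned earlier in the thread, a minimal self-contained sketch could wrap the same calculation in a function and divide a model's per-horizon RMSE by a baseline's. The function name `rmse_by_horizon` and all toy numbers below are made up for illustration. The sketch assumes every fold has the same number of horizon steps, and that `y_test` and `y_pred` within a fold share the same index, as in the cells returned by evaluate with return_data=True.

```python
import numpy as np
import pandas as pd


def rmse_by_horizon(y_tests, y_preds):
    """Per-horizon RMSE from per-fold Series of actuals and forecasts."""
    errs = [
        (y_test - y_pred).reset_index(drop=True)
        for y_test, y_pred in zip(y_tests, y_preds)
    ]
    # rows: horizon position, columns: folds
    errs_by_horz = pd.concat(errs, axis=1, ignore_index=True)
    rmse = np.sqrt((errs_by_horz ** 2).mean(axis=1))
    rmse.index = rmse.index + 1  # label rows by horizon h = 1, 2, ...
    rmse.index.name = "horizon"
    return rmse


def as_folds(values_per_fold):
    """Toy helper: wrap lists in Series with a fold-specific time index."""
    return [
        pd.Series(vals, index=pd.RangeIndex(100 + k, 100 + k + len(vals)), dtype=float)
        for k, vals in enumerate(values_per_fold)
    ]


# made-up data: two folds, three horizon steps each
y_tests = as_folds([[10, 11, 12], [11, 12, 13]])
model_preds = as_folds([[9, 9, 9], [12, 14, 16]])
baseline_preds = as_folds([[8, 9, 9], [13, 14, 16]])

model_rmse = rmse_by_horizon(y_tests, model_preds)
baseline_rmse = rmse_by_horizon(y_tests, baseline_preds)

# values below 1 mean the model beats the baseline at that horizon
relative_rmse = model_rmse / baseline_rmse
```

With real results, the inputs would be `df["y_test"]` and `df["y_pred"]` from two evaluate runs that use the same cv, one per forecaster.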
