
Compute scaled errors for accuracy by series, then summarise. #342

Open
robjhyndman opened this issue Jan 12, 2022 · 7 comments
@robjhyndman
Member

This doesn't look right:

library(fpp3)

fc <- aus_arrivals %>%
  filter(year(Quarter) <= 2009) %>% 
  model(
    ets = ETS(Arrivals),
    arima = ARIMA(Arrivals)
  ) %>%
  forecast(h=11)
fc %>% 
  accuracy(aus_arrivals) %>%
  select(.model, Origin, MASE)
#> # A tibble: 8 × 3
#>   .model Origin  MASE
#>   <chr>  <chr>  <dbl>
#> 1 arima  Japan  2.95 
#> 2 arima  NZ     0.576
#> 3 arima  UK     2.31 
#> 4 arima  US     2.46 
#> 5 ets    Japan  2.90 
#> 6 ets    NZ     0.656
#> 7 ets    UK     1.90 
#> 8 ets    US     1.53
fc %>% 
  accuracy(aus_arrivals, by=".model") %>%
  select(.model, MASE)
#> # A tibble: 2 × 2
#>   .model  MASE
#>   <chr>  <dbl>
#> 1 arima   6.49
#> 2 ets     6.38

Created on 2022-01-12 by the reprex package (v2.0.1)

The second table should be the average of the results by model in the first table.

@mitchelloharawild mitchelloharawild self-assigned this Jan 12, 2022
@mitchelloharawild mitchelloharawild added the bug Something isn't working label Jan 12, 2022
@robjhyndman
Member Author

Bumping this. I need the accuracy function for something I'm writing.

@mitchelloharawild
Member

This unexpected behaviour is due to how the scaling term for MASE is computed.
The by argument groups the data passed to the accuracy functions, and when by = ".model" is used, the 'grouped' training data for MASE doesn't make much sense (it combines the training data of every Origin).

It's not clear to me how best to handle this for MASE and other measures that use information which doesn't align with the indices of the forecasts (such as training data). To get your expected results, you would need to pass multiple scaling terms (one per series) to MASE(), but I think that verges on doing something too complicated in a function, which may be surprising in other situations.

A related consideration is what the correct MASE scaling factor is when grouping by a season (say, accuracy by each quarter of the year). Should the scaling factor for MASE then be based on the training data of each quarter? If the seasonality induces substantially different scales for each quarter, I think that could be reasonable.

So if anything, I think that when accuracy() prepares the .train input for measures like MASE and that input combines data from multiple keys, it should give NA, since the training data is not unique to a single series. 🤷

@robjhyndman
Member Author

OK. Supposing we do want to compute the MASE with a different scaling factor per series, what is the simplest way to compute it?

@mitchelloharawild
Member

Compute MASE with accuracy() separately for each series, then take the mean of the results.

library(fpp3)
#> ── Attaching packages ──────────────────────────────────────────── fpp3 0.4.0 ──
#> ✓ tibble      3.1.6          ✓ tsibble     1.1.1     
#> ✓ dplyr       1.0.7          ✓ tsibbledata 0.4.0.9000
#> ✓ tidyr       1.1.4          ✓ feasts      0.2.2.9000
#> ✓ lubridate   1.8.0          ✓ fable       0.3.1.9000
#> ✓ ggplot2     3.3.5.9000
#> ── Conflicts ───────────────────────────────────────────────── fpp3_conflicts ──
#> x lubridate::date()    masks base::date()
#> x dplyr::filter()      masks stats::filter()
#> x tsibble::intersect() masks base::intersect()
#> x tsibble::interval()  masks lubridate::interval()
#> x dplyr::lag()         masks stats::lag()
#> x tsibble::setdiff()   masks base::setdiff()
#> x tsibble::union()     masks base::union()

fc <- aus_arrivals %>%
  filter(year(Quarter) <= 2009) %>% 
  model(
    ets = ETS(Arrivals),
    arima = ARIMA(Arrivals)
  ) %>%
  forecast(h=11)
z <- fc %>% 
  accuracy(aus_arrivals) %>%
  select(.model, Origin, MASE)
z %>% 
  group_by(.model) %>% 
  summarise(MASE = mean(MASE))
#> # A tibble: 2 × 2
#>   .model  MASE
#>   <chr>  <dbl>
#> 1 arima   2.08
#> 2 ets     1.75

Created on 2022-02-07 by the reprex package (v2.0.0)

@mitchelloharawild
Member

I think an alternative approach that would work here is to reconsider how accuracy measures are specified. Instead of always reducing the output to a single summary value, it could be more valuable to expose the measure before combination.

For instance, MAE would be composed of a mean of absolute_error values; the mean could then be swapped out for a median if you like.
MASE would be composed of mean and absolute_scaled_error, and so on.

This would allow us to compute f(errors) by series, and then combine the results with mean/median/summary_fn grouped in accordance with the by argument.

An alternative summary_fn that could be of interest is dist_sample, which would let you look at the distribution of the errors used in computing the summarised accuracy measure.
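To make the two-stage idea concrete, here is a minimal sketch (not the fable API; absolute_scaled_error and the toy data are made up for illustration). Stage one computes a pointwise scaled error per series, using each series' own training data for the scaling term; stage two applies whatever summary function you like across the grouped results.

```r
library(dplyr)

# Stage 1: pointwise absolute scaled error for one series.
# The scale is the in-sample mean absolute (seasonal) naive error.
absolute_scaled_error <- function(error, train, period = 1) {
  scale <- mean(abs(diff(train, lag = period)))
  abs(error) / scale
}

# Toy forecast errors for two series on very different scales.
errs <- tibble(
  series = rep(c("A", "B"), each = 3),
  error  = c(2, -1, 3, 20, -10, 30)
)
train <- list(A = c(1, 3, 2, 4), B = c(10, 30, 20, 40))

scaled <- errs %>%
  group_by(series) %>%
  mutate(ase = absolute_scaled_error(error, train[[first(series)]])) %>%
  ungroup()

# Stage 2: swap in mean, median, or any other summary function.
scaled %>% summarise(MASE = mean(ase), MdASE = median(ase))
```

Because each series is scaled by its own training data, series B's tenfold larger errors produce the same scaled errors as series A, so the combined summary is not dominated by the larger series.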

@robjhyndman
Member Author

Yes, that could be better. The problem with simply averaging the results, as in your first suggestion, is that it only works for means. Medians, for example, would not work.
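A small numeric illustration of that point (toy numbers, not the arrivals data): with equal-length groups, the mean of per-group means equals the pooled mean, but the median of per-group medians generally differs from the pooled median.

```r
a <- c(1, 2, 9)
b <- c(1, 1, 1)

# Means commute with equal-length grouping:
mean(c(mean(a), mean(b)))        # 2.5
mean(c(a, b))                    # 2.5

# Medians do not:
median(c(median(a), median(b)))  # 1.5
median(c(a, b))                  # 1
```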

@mitchelloharawild
Member

Right. I'll try to rework the accuracy() interface to compute accuracy measures in the two-stage approach mentioned above.

@mitchelloharawild mitchelloharawild removed the bug Something isn't working label Feb 22, 2022
@mitchelloharawild mitchelloharawild changed the title accuracy with by argument Compute scaled errors for accuracy by series, then summarise. Feb 22, 2022