
Compute scaled errors for accuracy by series, then summarise. #342

Open
robjhyndman opened this issue Jan 12, 2022 · 7 comments
@robjhyndman
Member

This doesn't look right:

library(fpp3)

fc <- aus_arrivals %>%
  filter(year(Quarter) <= 2009) %>% 
  model(
    ets = ETS(Arrivals),
    arima = ARIMA(Arrivals)
  ) %>%
  forecast(h=11)
fc %>% 
  accuracy(aus_arrivals) %>%
  select(.model, Origin, MASE)
#> # A tibble: 8 × 3
#>   .model Origin  MASE
#>   <chr>  <chr>  <dbl>
#> 1 arima  Japan  2.95 
#> 2 arima  NZ     0.576
#> 3 arima  UK     2.31 
#> 4 arima  US     2.46 
#> 5 ets    Japan  2.90 
#> 6 ets    NZ     0.656
#> 7 ets    UK     1.90 
#> 8 ets    US     1.53
fc %>% 
  accuracy(aus_arrivals, by=".model") %>%
  select(.model, MASE)
#> # A tibble: 2 × 2
#>   .model  MASE
#>   <chr>  <dbl>
#> 1 arima   6.49
#> 2 ets     6.38

Created on 2022-01-12 by the reprex package (v2.0.1)

The second table should be the average of the results by model in the first table.

@mitchelloharawild mitchelloharawild self-assigned this Jan 12, 2022
@mitchelloharawild mitchelloharawild added the bug Something isn't working label Jan 12, 2022
@robjhyndman
Member Author

Bumping this. I need the accuracy function for something I'm writing.

@mitchelloharawild
Member

This unexpected behaviour is due to how the scaling term for MASE is computed.
The by argument groups the data passed to the accuracy functions, and when by = ".model" is used, the 'grouped' training data for MASE doesn't make much sense (it combines the training data of every Origin).

It's not clear to me how best to handle this for MASE and other measures that use information which doesn't align with the indices of the forecasts (such as training data). To get your expected results, you would need to pass multiple scaling terms (one per series) to MASE(), but I think that verges on doing something too complicated in a function, which may be surprising in other situations.

A related consideration is what the correct MASE scaling factor is when grouping by a season (say, accuracy by each quarter of the year). Should the scaling factor for MASE then be based on the training data of each quarter? If the seasonality induces substantially different scales for each quarter, I think that could be reasonable.

So if anything, I think that when accuracy() prepares the .train input for measures like MASE and that input combines data from multiple keys, it should give NA, since the training data is not unique to a single series. 🤷

@robjhyndman
Member Author

OK. Supposing we do want to compute the MASE with a different scaling factor per series, what is the simplest way to compute it?

@mitchelloharawild
Member

Compute MASE with accuracy() separately for each series, then take the mean of the results.

library(fpp3)
#> ── Attaching packages ──────────────────────────────────────────── fpp3 0.4.0 ──
#> ✓ tibble      3.1.6          ✓ tsibble     1.1.1     
#> ✓ dplyr       1.0.7          ✓ tsibbledata 0.4.0.9000
#> ✓ tidyr       1.1.4          ✓ feasts      0.2.2.9000
#> ✓ lubridate   1.8.0          ✓ fable       0.3.1.9000
#> ✓ ggplot2     3.3.5.9000
#> ── Conflicts ───────────────────────────────────────────────── fpp3_conflicts ──
#> x lubridate::date()    masks base::date()
#> x dplyr::filter()      masks stats::filter()
#> x tsibble::intersect() masks base::intersect()
#> x tsibble::interval()  masks lubridate::interval()
#> x dplyr::lag()         masks stats::lag()
#> x tsibble::setdiff()   masks base::setdiff()
#> x tsibble::union()     masks base::union()

fc <- aus_arrivals %>%
  filter(year(Quarter) <= 2009) %>% 
  model(
    ets = ETS(Arrivals),
    arima = ARIMA(Arrivals)
  ) %>%
  forecast(h=11)
z <- fc %>% 
  accuracy(aus_arrivals) %>%
  select(.model, Origin, MASE)
z %>% 
  group_by(.model) %>% 
  summarise(MASE = mean(MASE))
#> # A tibble: 2 × 2
#>   .model  MASE
#>   <chr>  <dbl>
#> 1 arima   2.08
#> 2 ets     1.75

Created on 2022-02-07 by the reprex package (v2.0.0)

@mitchelloharawild
Member

I think an alternative approach that would work here is to reconsider how accuracy measures are specified. Instead of always reducing the output to a single summary value, it could be more valuable to expose the measure before combination.

For instance, MAE would be composed of a mean of absolute_error values; the mean could then be swapped out for a median if you like.
MASE would be composed of mean and absolute_scaled_error, and so on.

This would allow us to compute f(errors) by series, and then combine the results with mean/median/summary_fn grouped in accordance with the by argument.

An alternative summary_fn that could be of interest is dist_sample, which would let you look at the distribution of the errors used in computing the summarised accuracy measure.
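To make the two-stage idea concrete, here is a minimal sketch (not the fable API; absolute_scaled_error and the toy data are made up for illustration). Stage one computes a pointwise scaled error per series, using each series' own training data for the scaling term; stage two applies whatever summary function you like across the grouped results.

```r
library(dplyr)

# Stage 1: pointwise absolute scaled error for one series.
# The scale is the in-sample mean absolute (seasonal) naive error.
absolute_scaled_error <- function(error, train, period = 1) {
  scale <- mean(abs(diff(train, lag = period)))
  abs(error) / scale
}

# Toy forecast errors for two series on very different scales.
errs <- tibble(
  series = rep(c("A", "B"), each = 3),
  error  = c(2, -1, 3, 20, -10, 30)
)
train <- list(A = c(1, 3, 2, 4), B = c(10, 30, 20, 40))

scaled <- errs %>%
  group_by(series) %>%
  mutate(ase = absolute_scaled_error(error, train[[first(series)]])) %>%
  ungroup()

# Stage 2: swap in mean, median, or any other summary function.
scaled %>% summarise(MASE = mean(ase), MdASE = median(ase))
```

Because each series is scaled by its own training data, series B's tenfold larger errors produce the same scaled errors as series A, so the combined summary is not dominated by the larger series.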

@robjhyndman
Member Author

Yes, that could be better. The problem with simply averaging the results, as in your first suggestion, is that it only works for means. Medians, for example, would not work.
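A small numeric illustration of that point (toy numbers, not the arrivals data): with equal-length groups, the mean of per-group means equals the pooled mean, but the median of per-group medians generally differs from the pooled median.

```r
a <- c(1, 2, 9)
b <- c(1, 1, 1)

# Means commute with equal-length grouping:
mean(c(mean(a), mean(b)))        # 2.5
mean(c(a, b))                    # 2.5

# Medians do not:
median(c(median(a), median(b)))  # 1.5
median(c(a, b))                  # 1
```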

@mitchelloharawild
Member

Right. I'll try to rework the accuracy() interface to compute accuracy measures in the two-stage approach mentioned above.

@mitchelloharawild mitchelloharawild removed the bug Something isn't working label Feb 22, 2022
@mitchelloharawild mitchelloharawild changed the title accuracy with by argument Compute scaled errors for accuracy by series, then summarise. Feb 22, 2022