Avoid repeated and unnecessary quality checks during forecast scoring #612

Open · nmdefries opened this issue Jan 17, 2023 · 4 comments
Labels: enhancement (New feature or request), evalcast package

@nmdefries (Contributor)

Most of the runtime in scoring forecasts is spent in the checking functions, `is_symmetric` and `score_func_param_checker`. The check functions are called by each evalcast error measure, so they are run repeatedly for every chunk of data.

Besides being repeated across error measures, not every error measure needs all of the checks contained in `score_func_param_checker`. How best to avoid the unnecessary and repeated checks isn't immediately obvious.
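For context, a simplified illustration of the pattern (not the actual evalcast source; signatures are abbreviated): each error measure independently calls the same checker on the same chunk, so a chunk scored with N error measures is validated N times.

```r
# Simplified sketch, not the actual evalcast source: each error measure
# re-validates the chunk it receives before computing its score.
measure_a <- function(quantile, value, actual_value) {
  score_func_param_checker(quantile, value, actual_value, "measure_a")  # check #1
  # ... compute score ...
}

measure_b <- function(quantile, value, actual_value) {
  score_func_param_checker(quantile, value, actual_value, "measure_b")  # same check again
  # ... compute score ...
}

# Scoring one chunk with both measures runs the identical validation twice.
```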

@nmdefries (Contributor, Author)

Potential approaches and considerations from discussion between @brookslogan and @nmdefries

Scoring is particularly slow when processing many forecasts at once, as in the forecast eval pipeline.

For forecast eval purposes, it makes sense to move the checks into the `erm()` function before the loop, and drop them from all of the error measure functions. The checks would then run only once per data group.
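A rough sketch of that idea, with illustrative names (the real `erm()` signature may differ): validate each data group once, then apply the error measures with their internal checks removed.

```r
# Hedged sketch: run the shared checks once per data group, then apply each
# error measure, which now assumes its inputs were already validated.
# `error_measures` is a named list of functions; all names are illustrative.
score_group <- function(chunk, error_measures) {
  score_func_param_checker(chunk$quantile, chunk$value, chunk$actual_value, "erm")

  lapply(error_measures, function(measure) {
    measure(chunk$quantile, chunk$value, chunk$actual_value)
  })
}
```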

However, this isn't a good general solution since it makes assumptions about data structure that might not be true for custom error measure fns.

A general solution might be to use some sort of caching.

A naively memoised check would not be sufficiently reliable or performant:

- Attributes sometimes propagate through calculations, so an attribute-based cache key can fail to invalidate.
- Object addresses can remain the same while the object changes, even when only using R's "copy on write"-ish functions (due to some optimizations underneath), so an address-based cache key can also fail to invalidate.
- Hashing objects on every call is not performant enough.
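For reference, the naive version would be a one-liner, and its cost is exactly the hashing mentioned above: `memoise()` keys its cache on a hash of the arguments, so every call still has to hash the full forecast chunk.

```r
# Naive memoisation, shown only to illustrate the problem: memoise() hashes
# its arguments to build the cache key, so large forecast chunks get hashed
# on every call, which is the performance concern described above.
library(memoise)
cached_param_check <- memoise::memoise(score_func_param_checker)
```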

Could we attach check output to the data object the first time the checks are run? The two check functions would be called once on each data group, and the output (including assertion failures) attached as attributes to the chunk, effectively acting as a cache the next time the checks are called. To do this, the checks would have to be run before any error measures, or every error measure would have to pass back a potentially-modified data object.
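A minimal sketch of that idea, assuming a hypothetical attribute name and helper; the cached result could hold either a success marker or the captured assertion error.

```r
# Hedged sketch: run the checks once per chunk and cache the outcome (success
# or the captured error) as an attribute. The attribute name and helper
# function are hypothetical, not existing evalcast API.
run_checks_once <- function(chunk) {
  if (is.null(attr(chunk, "evalcast_check_result"))) {
    result <- tryCatch({
      score_func_param_checker(chunk$quantile, chunk$value, chunk$actual_value, "erm")
      list(ok = TRUE)
    }, error = function(e) list(ok = FALSE, error = e))
    attr(chunk, "evalcast_check_result") <- result
  }
  chunk  # callers must keep using this returned, possibly-modified chunk
}
```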

@brookslogan (Collaborator) commented Oct 12, 2023

> attached as attributes to the chunk

When the chunks are combined, the attributes will need to be combined too, and I think dplyr's default is to just keep the first or last attribute rather than combining them. We'd either need special classes (maybe a vctrs vector implementation), or we could store the results first as a list column (using either special objects in the list, or attributes attached to objects in the list, to indicate errors/warnings), and then manually unchop that column. There may also be similar helper functionality in purrr, e.g. [something like] `purrr::safely()`.
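One way the list-column route might look, as a sketch (grouping and column names are assumptions, not taken from evalcast): wrap the checker with `purrr::safely()` so each chunk's result or error lands in a list column that dplyr never has to merge, then spread it back out at the end.

```r
# Hedged sketch of the list-column idea. Grouping and column names are
# assumptions for illustration only.
library(dplyr)
library(purrr)
library(tidyr)

safe_check <- purrr::safely(function(chunk) {
  score_func_param_checker(chunk$quantile, chunk$value, chunk$actual_value, "erm")
})

check_results <- forecasts %>%
  group_by(forecaster, forecast_date) %>%
  nest() %>%                                 # one data-frame chunk per group
  mutate(check = map(data, safe_check)) %>%  # list(result = ..., error = ...) per chunk
  unnest_wider(check)                        # spread into `result` / `error` columns
```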

The above seems potentially easiest, but we might also consider registering checker functions for each metric, then having a step that runs all unique checks required by the selected metrics before running the metrics themselves (see the sketch below).
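A sketch of the registry idea; the mapping from metrics to checks and the wrapper signatures are illustrative, not existing evalcast API.

```r
# Hedged sketch: each metric declares which checks it needs; before scoring a
# chunk, run only the union of checks required by the selected metrics.
check_registry <- list(
  param_check = function(chunk) {
    score_func_param_checker(chunk$quantile, chunk$value, chunk$actual_value, "erm")
  },
  symmetry_check = function(chunk) {
    is_symmetric(chunk$value)  # assumed call; the real signature may differ
  }
)

metric_checks <- list(
  absolute_error          = "param_check",
  weighted_interval_score = c("param_check", "symmetry_check")
)

run_required_checks <- function(chunk, metrics) {
  for (check_name in unique(unlist(metric_checks[metrics]))) {
    check_registry[[check_name]](chunk)
  }
  invisible(chunk)
}
```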

@nmdefries (Contributor, Author)

> When the chunks are combined the attributes will need to be combined

We shouldn't need to combine the data chunks themselves or preserve the check-result attributes long term.

In the old approach, each chunk (forecaster + some other stuff) is passed separately to `erm()`. `erm()` sequentially evaluates all error measures on that chunk and returns only the scores, which are then combined in `evaluate_predictions()`. Attribute-modified chunks would only be used within `erm()` (or a similar wrapper function).

@nmdefries (Contributor, Author)

Our current approach evaluates all error measures at once, in a single `summarize()`, so there's no clear place to run up-front checks or pass modified chunks back; perhaps we'd need to return to the old approach. Or maybe we could give `summarize()` a wrapper that runs the checks for each chunk and then does an internal `summarize(!!!error_measures)` over just that chunk (see the sketch below).
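A sketch of the wrapper idea, assuming `error_measures` is a named list of quosures as in the current approach and using illustrative grouping columns; `group_modify()` hands us each chunk as a data frame, so the checks can run once before an inner `summarize()`.

```r
# Hedged sketch of wrapping summarize(): run the checks once per chunk, then
# evaluate all error measures in an inner summarize over that chunk.
# Grouping columns and the contents of `error_measures` are illustrative.
library(dplyr)

scores <- predictions_cards %>%
  group_by(forecaster, forecast_date, geo_value) %>%
  group_modify(function(chunk, key) {
    chunk <- run_checks_once(chunk)   # from the attribute-caching sketch above
    summarize(chunk, !!!error_measures)
  }) %>%
  ungroup()
```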

Would modifying chunks significantly increase memory usage? I'm thinking that adding an attribute to a data frame (whose underlying implementation is a list) will trigger only a shallow copy, but we may need to be careful about this.
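A quick way to check the shallow-copy intuition with `lobstr::obj_addr()`: the column addresses don't change when an attribute is added, so the large vectors aren't duplicated.

```r
# Sanity check for the shallow-copy intuition: adding an attribute to a data
# frame should not copy its columns, so their addresses stay the same.
library(lobstr)

df <- data.frame(x = rnorm(1e6), y = rnorm(1e6))
addr_before <- lobstr::obj_addr(df$x)

attr(df, "evalcast_check_result") <- list(ok = TRUE)
addr_after <- lobstr::obj_addr(df$x)

identical(addr_before, addr_after)  # TRUE: the column vector was not copied
```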
