Adjust ahead #296

Open: dsweber2 wants to merge 28 commits into dev (base) from adjustAhead

Conversation

@dsweber2 (Contributor) commented Mar 18, 2024

Checklist

Please:

  • Make sure this PR is against "dev", not "main".
  • Request a review from one of the current epipredict main reviewers:
    dajmcdon.
  • Make sure to bump the version number in DESCRIPTION and NEWS.md.
    Always increment the patch version number (the third number), unless you are
    making a release PR from dev to main, in which case increment the minor
    version number (the second number).
  • Describe changes made in NEWS.md, making sure breaking changes
    (backwards-incompatible changes to the documented interface) are noted.
    Collect the changes under the next release number (e.g. if you are on
    0.7.2, then write your changes under the 0.8 heading).

Change explanations for reviewer

This implements latency adjustment as an argument to step_epi_ahead. The rough idea is to make ahead relative to the as_of date rather than to the last day of data, and to throw a warning if this puts the target too far in the future. Only extend_ahead and None are currently implemented. The biggest downside is that it only works for the epi_df that the pipeline was created on.
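A minimal sketch of the extend_ahead idea (the function name, threshold, and warning text are illustrative, not the actual step internals): the requested ahead is shifted by the latency, i.e. the gap between the as_of date and the last observed time_value.

# illustrative sketch, not the implemented step
extend_ahead_sketch <- function(epi_df, ahead, max_latency = 28) {
  as_of <- attributes(epi_df)$metadata$as_of
  latency <- as.integer(as_of - max(epi_df$time_value))
  if (latency > max_latency) { # threshold chosen only for illustration
    warning("as_of is far past the last day of data; the effective ahead may be too far in the future")
  }
  ahead + latency
}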

Still working on:

  • some tests (such as making sure it only works on the first epi_df, and that the end-to-end application works rather than just the specialized function extend_ahead).
  • adjustments to layers to account for the existence of step_adjust_ahead
  • locf and associated tests
  • extend_lags: we should talk about this one; it's a new-ish @lcbrooks idea, sort of the opposite of extend_ahead. Setting the forecast date to be as_of, adjust each predictor's lags so that they're relative to that predictor's last day of data. E.g. if predictor A has 3 days of latency and the lags are given as (0, 7, 14), set A's lags to (3, 10, 17), while if B has 5 days of latency, set B's lags to (5, 12, 19). Feel free to move discussion to the issue. (A rough sketch follows this list.)
  • demo vignettes
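A rough sketch of the extend_lags idea described above (illustrative only, not an implemented function): each predictor's lags are shifted by that predictor's latency, so that lag 0 refers to its last day of data when forecasting from as_of.

# illustrative sketch, not package code
extend_lags_sketch <- function(lags, latency) {
  lags + latency
}

extend_lags_sketch(c(0, 7, 14), latency = 3) # predictor A: 3, 10, 17
extend_lags_sketch(c(0, 7, 14), latency = 5) # predictor B: 5, 12, 19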

Magic GitHub syntax to mark associated Issue(s) as resolved when this is merged into the default branch

@dsweber2 dsweber2 self-assigned this Mar 18, 2024
@dsweber2 dsweber2 requested a review from dajmcdon as a code owner March 18, 2024 18:20
@dsweber2 dsweber2 mentioned this pull request Mar 19, 2024
@dsweber2 dsweber2 marked this pull request as draft March 19, 2024 22:03
@dsweber2 (Contributor Author) commented:

Based on feedback, I'm making this a separate step and making sure full-pipeline tests work. It will stay a draft PR until both of those are done.

@dsweber2 (Contributor Author) commented:

So I'm still putting the tests in a format that isn't a pile of spaghetti, but I wanted to get this out so we could maybe talk about it.

To implement a separate step, I couldn't find a way to get previous steps, so I basically had to communicate everything relevant through the headers of new_data in the bake step. Hopefully modifying those doesn't cause problems.
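A minimal sketch (assumed names; reading "headers" as the metadata attributes of new_data) of one way to pass information along during bake(), since a later step's bake method only sees new_data:

# illustrative sketch of a bake method stashing a value on new_data's metadata
bake.step_adjust_latency_sketch <- function(object, new_data, ...) {
  attr(new_data, "metadata")$latency_info <- object$latency_info # hypothetical field
  new_data
}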

While I was looking around I noticed that both step_epi_lag and step_epi_ahead were essentially the same, so I made them call a shared function to highlight the actual differences. Unfortunately I couldn't quite reuse that for step_adjust_latency, though it has a similar situation where adjust_ahead and adjust_lag basically do the same thing.

There are still some downstream implications for how layer_forecast_date works, specifically in the default case. We may have to add it as an option here too for it to work when someone sets it manually using that step, which seems a bit unfortunate but kind of unavoidable.

key_cols = keys
)
}) %>%
map(\(x) zoo::na.trim(x)) %>% # TODO need to talk about this
dsweber2 (Contributor Author) commented:

Two things about this:

  1. The {zoo} dependency. It seems we don't currently have this; are we trying to minimize package dependencies? This is the only function I use from it.
  2. This shears NAs off the end of the column. We have a step that explicitly drops NAs, so I'm not sure how much we're trying to avoid removing NAs otherwise. I'm not sure how to do this correctly without it, though.

Contributor commented:

  • By default, na.trim removes from both the beginning and the end.
  • [oops just repeating Daniel here] Side note: the \(x) lambda shorthand requires R >= 4.1. We were recently talking about not trying to support 3.* versions, but I'm not sure about 4.0.
  • This function from some latency-processing code lying around may be helpful:
# count the number of TRUE values at the tail of a logical vector
n_true_at_tail <- function(x) {
  match(FALSE, rev(x), nomatch = length(x) + 1L) - 1L
}
# So then to remove NAs at tail you could do the following (or maybe an S3 version of this):
remove_na_tail <- function(x) {
  x[seq_len(length(x) - n_true_at_tail(is.na(x)))]
}
# (don't use `tail` with "negative" `n` unless you know it's actually negative and not zero, which isn't the case here)

remove_na_tail(c(NA, NA, 1:10, NA, 1:10, NA, NA))
remove_na_tail(1:100) # ALTREP
remove_na_tail(1:100 + 1) # non-ALTREP
remove_na_tail(integer(0L))
remove_na_tail(rep(NA_integer_, 5))

dsweber2 (Contributor Author) commented:

[oops just repeating Daniel here]

Yeah, threading in GitHub issues can be hard to follow, especially when there are a lot of comments >.<

Could you add some docs to those functions?

@dsweber2 (Contributor Author) commented Apr 2, 2024:

I'm getting a peculiar error where the local snapshot test for population scaling has 4 lines like this:

    Code
      p <- predict(wf, latest)
    Message
      Joining with `by = join_by(geo_value)`
      Joining with `by = join_by(geo_value)`
      Joining with `by = join_by(geo_value)`
      Joining with `by = join_by(geo_value)`

whereas the CI is apparently only generating 2. And this apparently changed within the past 2 weeks.

@dajmcdon (Contributor) commented Apr 2, 2024:

This was a change in {dplyr} at some point: it generates these messages if you don't specify by =. I purposely tried to get rid of them, but maybe I missed some? Or your new code is creating more?
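For reference, a small standalone illustration (toy data, not the actual population-scaling code) of the message and how an explicit by = silences it:

library(dplyr)

x <- tibble(geo_value = c("ca", "ny"), value = c(1, 2))
pop <- tibble(geo_value = c("ca", "ny"), population = c(39e6, 19e6))

left_join(x, pop)                   # emits: Joining with `by = join_by(geo_value)`
left_join(x, pop, by = "geo_value") # no message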

@dajmcdon (Contributor) commented Apr 2, 2024:

A few other notes (we can discuss synchronously):

  1. The only time a step_*() function has access to the recipe is when it is added. At that point, it can "see" all previous steps. So inside of step_adjust_latency(), you can immediately check whether there are leads/lags in the current set of recipe instructions, and in principle adjust them if desired. I'm not sure this handles everything, but I suspect it handles some of it. (A sketch follows this list.)
  2. The \(x) x + 3 lambda syntax (as in map(x, \(x) x + 3)) was added in R 4.1. We've avoided forcing this dependency on users to this point. I'm open to changing this (it's almost 3 years old), but we should discuss. This is also the reason for sticking with the {magrittr} pipe %>% rather than the native pipe |>.
  3. The {zoo} dependency. There are situations where removal is always the right thing (creating NAs on future time_values is probably one of these), and our step_epi_lead() does this currently. However, my previous practice has been to pass any NAs created by steps along. This way, additional steps can potentially be added to handle them (say, any of these, or similar time series fill versions). The simplest are the step_naomit() functions. Is there a reason this can't be done without the dependency, or handled downstream?
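A rough sketch (assumed names, not package code) of item 1, inspecting the steps already present at the moment step_adjust_latency() is added to the recipe:

# illustrative sketch: look for existing ahead/lag steps on the recipe
step_adjust_latency_sketch <- function(recipe, method = "extend_ahead") {
  step_classes <- vapply(recipe$steps, function(s) class(s)[1], character(1))
  has_ahead <- "step_epi_ahead" %in% step_classes
  has_lag <- "step_epi_lag" %in% step_classes
  if (!has_ahead && !has_lag) {
    cli::cli_warn("no ahead or lag steps found yet, so there is nothing to adjust")
  }
  # ... construct the new step and append it to recipe$steps here ...
  recipe
}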

Comment on lines +231 to +213
expect_snapshot(prep <- prep(r, jhu))

expect_snapshot(b <- bake(prep, jhu))
@dsweber2 (Contributor Author) commented Apr 2, 2024:

So these are the tests generating the snapshots. Looking at my local git time machine, this was a set of tests that was previously being skipped, and I added these as snapshot tests. I guess I'll just put this back to skipped for now.

dsweber2 (Contributor Author) commented:

But yes, it does look like there are some lingering "Joining with `by = join_by(geo_value)`" messages.

@@ -1,6 +1,9 @@
# epipredict (development)

Pre-1.0.0 numbering scheme: 0.x will indicate releases, while 0.0.x will indicate PR's.
# epipredict 0.2

- add `latency_adjustment` as an option for `add_epi_ahead`, which adjusts the `ahead` so that the prediction is `ahead` relative to the `as_of` date for the `epi_data`, rather than relative to the last day of data.
@dshemetov (Contributor) commented Apr 2, 2024:

next "release" version is 0.1, so this should just go below

@dsweber2 dsweber2 marked this pull request as ready for review April 30, 2024 22:10
@dajmcdon (Contributor) left a review:

I left a number of questions and comments throughout. It may be easiest to schedule a check-in, or maybe discuss on Slack?

DESCRIPTION Outdated
Comment on lines 46 to 45
stringr,
stringi,
Contributor commented:

I would strongly prefer to avoid these if possible.

dsweber2 (Contributor Author) commented:

Brief reason? I'll try to remove them regardless; I'm just wondering.

Contributor commented:

Brief: "fewer Imports is better"
Longer: {stringi} uses some beefy compiled code (even the precompiled binaries are large), and can sometimes fail to install on some systems. There are a handful of packages like this, {sf} is another, that I prefer to avoid, where possible.

(Resolved/outdated review threads on DESCRIPTION, tests/testthat/test-dist_quantiles.R, R/arx_forecaster.R, and R/utils-latency.R.)
Comment on lines 135 to 154
rlang::abort("the `time_value` column of `new_data` is empty")
}
as_of <- attributes(new_data)$metadata$as_of
max_time <- max(time_values)
# make sure the as_of is sane
if (!inherits(as_of, class(time_values))) {
rlang::abort(glue::glue(
"the data matrix `as_of` value is {as_of}, ",
"and not a valid `time_type` with type ",
"matching `time_value`'s type of ",
"{typeof(new_data$time_value)}."
))
Contributor commented:

Is there a reason for rlang versions here and cli elsewhere?

dsweber2 (Contributor Author) commented:

Nope, I'm not sure I put much thought into it. The package is also in a split state; is it alright if I just find-replace all instances of rlang::abort with cli::cli_abort, or is there some reason both are used generally?

Contributor commented:

At the beginning, we used {rlang}, but I've been trying to have all new code use {cli} (with a plan to find and replace all the old stuff eventually). I thought there was an issue, but I can't find it.

Contributor commented:

You can't generally just find-replace. See this migration info. [I threw a fix for various violations of this in epiprocess in the R6 refactor which will be merged shortly.]
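One concrete reason (my own illustration, not from the migration guide) is that cli::cli_abort() applies glue interpolation and inline markup to its message, while rlang::abort() prints it verbatim, so existing messages containing braces need rewriting:

msg <- "expected a column named {geo_value}"
try(rlang::abort(msg))   # message is shown literally
try(cli::cli_abort(msg)) # cli tries to evaluate {geo_value} as an expression and fails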

dsweber2 (Contributor Author) commented:

Ah, they have a guide! Thanks Logan. I'll go over the changed ones and make sure they're not broken. Thankfully none of them broke the tests, but I'm guessing we don't have super strict expect_error tests for every single one.

(Resolved/outdated review threads on R/utils-latency.R and R/utils-shift.R.)
@dsweber2 dsweber2 force-pushed the adjustAhead branch 2 times, most recently from b07656c to 03c2120, May 8, 2024 21:57
@dsweber2 (Contributor Author) commented May 8, 2024:

Overview of new changes:

  1. I dropped stringr and stringi, and added purrr.
  2. {zoo} is still present, waiting for us to decide whether we want to trim or not.
  3. I moved lag/ahead detection into the step creation, thanks to liberal use of recipes_eval_select. If the template is different from the training data this could cause problems, but I guess that's supposed to be true anyway.

@dajmcdon dajmcdon mentioned this pull request May 9, 2024
@dajmcdon (Contributor) left a review:

OK. I think everything looks potentially fine. It would help for the example to be a bit more elaborate.

Potential for merge conflicts with #331.

(Resolved/outdated review threads on R/utils-latency.R, NAMESPACE, DESCRIPTION, R/arx_classifier.R, R/step_growth_rate.R, R/step_lag_difference.R, and tests/testthat/test-epi_shift.R.)
Comment on lines +67 to +75
#' @family row operation steps
#' @rdname step_adjust_latency
#' @export
#' @examples
#' r <- epi_recipe(case_death_rate_subset) %>%
#'   step_epi_ahead(death_rate, ahead = 7) %>%
#'   # step_adjust_latency(method = "extend_ahead") %>%
#'   step_epi_lag(death_rate, lag = c(0, 7, 14))
#' r
Contributor commented:

It would help me if this were fleshed out a bit. Maybe something with latency, to see the results with and without?

dsweber2 (Contributor Author) commented:

For now, do the ones in the tests generally address that? I still need to update the layer functions to use dates correctly, but I was waiting until these steps are relatively stable. Once I have those, I'll definitely expand the @examples, and then write a vignette.

@brookslogan (Contributor) commented:

Wishlist item based on some dogfooding: if the minimum reporting latency across signals & geos is really large (say, a month or greater), that's worth a warning. E.g., FluSurv-NET has gaps in reporting between seasons; I tried to forecast a new season before the first report had actually been published, using ahead adjustment, and it happily tried to forecast the beginning of that season from the end of the preceding season, likely getting poor predictions.
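A minimal sketch of that wishlist item, per geo (illustrative function name and threshold; multiple signals would be handled similarly):

library(dplyr)

warn_excessive_latency <- function(epi_df, max_reasonable_latency = 30) {
  as_of <- attributes(epi_df)$metadata$as_of
  # smallest latency across geos: even the freshest geo is this many days stale
  min_latency <- epi_df %>%
    group_by(geo_value) %>%
    summarize(latency = as.integer(as_of - max(time_value)), .groups = "drop") %>%
    pull(latency) %>%
    min()
  if (min_latency > max_reasonable_latency) {
    cli::cli_warn(
      "the minimum reporting latency is {min_latency} days; latency adjustment may push the effective ahead much further out than intended"
    )
  }
  invisible(min_latency)
}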

This is purely before any work on latency adjusting, and is just about getting the tests to run without warnings or skipped tests. Getting this involved:

1. renaming some lingering examples of p to probs and `q` to `quantile_levels` (and some `quantile_level`s to the plural)
2. adding snapshots so the tests for printing in population_scaling work as intended (should probably be converted to cli_informs at some point)
3. removing the nearly empty `test-propagate_samples`, which seems like something intended that was never finished; probably want to add an issue if we actually want it done
4. adding a bunch of `edf`s in unhappy `prep` steps
Successfully merging this pull request may close these issues.

Add latency adjustment