Add latency adjustment #279

dajmcdon · 2023-12-16T18:22:52Z

See the implementation here: https://github.com/cmu-delphi/exploration-tooling/blob/77d7e5eb95e4b17f40567430b0233f6b24cf100a/R/latency_adjusting.R#L11.

One possibility is as an argument to step_epi_ahead().
- pros: easier to adjust the current canned methods: just add an argument to all the canned steps and proceed.
- cons: limits the flexibility to simply shifting based on explicit missingness (or a carefully specified target_date). If the target_date is autocalculated from the as_of of the epi_df, the ahead could shift massively (say, when using finalized data).
The other is as a separate (new) step.
- pros: much more flexibility, if needed (do we need it?)
- cons: harder to propagate downstream.
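To make the first con concrete, here is a minimal sketch of the ahead shift, assuming latency is measured as the number of days between the `as_of` date and the last observed `time_value`. The helper `adjusted_ahead()` is hypothetical (not part of {epipredict}), and the dates are made up for illustration.

```r
# Hypothetical helper: extend the requested ahead by the data latency,
# where latency = days between as_of and the most recent observation.
adjusted_ahead <- function(ahead, time_values, as_of) {
  latency <- as.integer(as.Date(as_of) - max(as.Date(time_values)))
  ahead + latency
}

obs_dates <- seq(as.Date("2023-11-01"), as.Date("2023-12-10"), by = "day")

# Live data: as_of is a few days after the last observation.
adjusted_ahead(7, obs_dates, as_of = "2023-12-14")
# → 11

# Finalized data: as_of is long after the last observation, so the
# shift is huge even though the forecasting task has not changed.
adjusted_ahead(7, obs_dates, as_of = "2024-06-01")
# → 181
```

This is why autocalculating the target_date from as_of is fragile when the data have long since been finalized.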

Deliverables:

- A new / adjusted step.
- Propagate throughout examples / vignettes where necessary.
- Check that all canned forecaster workflows operate appropriately.
- Add a vignette describing the behaviour and alternatives (LOCF implementation, imputing using steps in {recipes}).
- Adjust get_test_data() as needed.


dsweber2 · 2024-03-01T01:21:33Z

Thinking about this, it actually needs to tie the test and training data together, right? E.g., the amount by which we extend the ahead we're training on depends on the latency of the test data. If we're bundling both fit and predict into a forecast function, then we can probably do this. Outside that context, I think it ends up invalid?
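A minimal base-R sketch of why fit and predict are coupled: the latency is only known from the test side (forecast date versus last available observation), and that number then sets how far the training targets are shifted. It assumes a single daily series with no gaps; `make_training_pairs()` is a hypothetical name, not an existing function.

```r
# Hypothetical helper: pair each predictor value x[t] with the outcome
# y[t + ahead + latency], where latency comes from the *test* data.
make_training_pairs <- function(y, x, ahead, latency) {
  shift <- ahead + latency
  stopifnot(shift < length(y))  # need at least one training pair
  idx <- seq_len(length(y) - shift)
  data.frame(x = x[idx], y_target = y[idx + shift])
}

dates <- seq(as.Date("2024-01-01"), by = "day", length.out = 30)
y <- as.numeric(1:30)
x <- y  # use the outcome itself as the (lagged) predictor

# Latency is measured on the test side: forecast date vs. last observation.
forecast_date <- as.Date("2024-02-02")
latency <- as.integer(forecast_date - max(dates))  # 3 days
train <- make_training_pairs(y, x, ahead = 7, latency = latency)
nrow(train)
# → 20
```

If the training set were built without knowing `forecast_date`, the shift of 10 could not be computed, which is the invalidity described above.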

dsweber2 · 2024-03-18T20:30:29Z

Oh, rereading this, I guess you're thinking of doing LOCF as a separate step, rather than as an option for step_epi_ahead()?

dsweber2 · 2024-05-15T19:51:06Z

Thinking about LOCF, I'm not entirely sure there's an appropriate built-in in {recipes}. The most relevant one seems to be step_impute_roll(), which doesn't do what we're looking for, and none of the other {recipes} imputers seem like the right thing either.

Problem example for step_impute_roll

```r
library(lubridate)  # ymd(), days()

# No seed is set, so rerunning gives different values than the output below.
example_data <-
  data.frame(
    day = ymd("2012-06-07") + days(1:12),
    x1 = round(runif(12), 2),
    x2 = round(runif(12), 2),
    x3 = round(runif(12), 2)
  )
example_data$x1[c(1, 5, 6)] <- NA
example_data$x2[c(1:4, 10)] <- NA
example_data$x2[c(8:12, 10)] <- NA
example_data
```

```
          day   x1   x2   x3
1  2012-06-08   NA   NA 0.67
2  2012-06-09 0.08   NA 0.70
3  2012-06-10 0.55   NA 0.05
4  2012-06-11 0.46   NA 0.45
5  2012-06-12   NA 0.19 0.54
6  2012-06-13   NA 0.57 0.16
7  2012-06-14 0.56 0.51 0.98
8  2012-06-15 0.67   NA 0.21
9  2012-06-16 0.16   NA 0.06
10 2012-06-17 0.99   NA 0.51
11 2012-06-18 0.16   NA 0.34
12 2012-06-19 0.33   NA 0.45
```

Here x2 has a run of consecutive NAs (rows 8 to 12) as long as the 5-point window used below:

```r
library(recipes)  # also re-exports %>%

seven_pt <- recipe(~., data = example_data) %>%
  update_role(day, new_role = "time_index") %>%
  step_impute_roll(all_numeric_predictors(), window = 5) %>%
  prep(training = example_data)

# The training set:
bake(seven_pt, new_data = NULL)
```

```
# A tibble: 12 × 4
   day           x1    x2    x3
   <date>     <dbl> <dbl> <dbl>
 1 2012-06-08  0.46  0.19  0.67
 2 2012-06-09  0.08  0.19  0.7 
 3 2012-06-10  0.55  0.19  0.05
 4 2012-06-11  0.46  0.38  0.45
 5 2012-06-12  0.55  0.19  0.54
 6 2012-06-13  0.56  0.57  0.16
 7 2012-06-14  0.56  0.51  0.98
 8 2012-06-15  0.67  0.54  0.21
 9 2012-06-16  0.16  0.51  0.06
10 2012-06-17  0.99 NA     0.51
11 2012-06-18  0.16 NA     0.34
12 2012-06-19  0.33 NA     0.45
```

The result still contains NAs (rows 10 to 12 of x2). step_impute_roll() won't accept window = Inf, so even if we cook up a custom statistic, we can't set the window wide enough.
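A quick way to check whether a rolling window can cover every gap is to compute the longest run of consecutive NAs per column with base R's `rle()`. Roughly, any run at least as long as `window` leaves some rows unimputed, because some window then contains no observed values. The helper `longest_na_run()` is hypothetical; the vector below just reproduces the x2 missingness pattern from the example (rows 1 to 4 and 8 to 12).

```r
# Length of the longest run of consecutive NAs in a vector.
longest_na_run <- function(v) {
  runs <- rle(is.na(v))
  if (!any(runs$values)) return(0L)
  max(runs$lengths[runs$values])
}

x2_missing <- c(1:4, 8:12)
x2 <- replace(rep(1, 12), x2_missing, NA)
longest_na_run(x2)
# → 5
```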

So, as far as I can tell, either we add a step_locf (which would be quite easy with tidyr::fill()), or we build it into step_adjust_latency.
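For reference, LOCF with `tidyr::fill()` on a small frame shaped like `example_data` looks like the following. With `.direction = "down"` values are only carried forward, so leading NAs (before the first observation) stay NA; whether a step_locf should also fill those, or error, is left open here. The toy values are made up.

```r
library(tidyr)

toy <- data.frame(
  day = as.Date("2012-06-08") + 0:5,
  x2 = c(NA, NA, 0.19, 0.57, NA, NA)
)
# Carry the last observation forward down the (date-sorted) column.
filled <- fill(toy, x2, .direction = "down")
filled$x2
# → NA NA 0.19 0.57 0.57 0.57
```

Unlike step_impute_roll(), there is no window, so a gap of any length gets filled; that is exactly why a warning on long fills (discussed below) matters.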

dsweber2 · 2024-05-17T21:57:42Z

I recently learned that get_test_data() already has LOCF built in. @dshemetov @brookslogan @dajmcdon, we're thinking about dropping that function; do we want to integrate that feature into this step in some way?

brookslogan · 2024-05-20T18:31:06Z

I think the shifting (ahead and lag adjustment) approach will give better results than LOCF imputation for dealing with the latency shared by all signals and epikeys (ahead adjustment) and the per-signal, cross-epikey latency (lag adjustment). However, I'm assuming that it doesn't handle differences in latency between epikeys for the same signal. That part could be done (and probably should be done by default in canned forecasters) with LOCF imputation, just to enable getting some predictions while still requiring only a single fit. The LOCF step should also probably warn if it's filling very far, or maybe at all, similar to the warnings in these adjustment steps. E.g., if a location is more than a month behind the other locations, something's probably up: either they stopped reporting and you wouldn't want to forecast, or ingestion is failing and you want to fix that.
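A minimal sketch of per-epikey LOCF with such a warning, assuming a long data frame with `geo_value`, `time_value`, and one signal column, and using an arbitrary 28-day threshold as a stand-in for "more than a month behind". The function `locf_with_warning()` and its `max_gap` argument are hypothetical, not existing {epipredict} API.

```r
library(dplyr)
library(tidyr)

# Hypothetical: LOCF a signal within each geo_value, warning when a geo's
# last observation is more than `max_gap` days behind the most recent geo.
locf_with_warning <- function(df, signal, max_gap = 28) {
  # Last non-missing time_value for each geo.
  last_obs <- df %>%
    filter(!is.na(.data[[signal]])) %>%
    group_by(geo_value) %>%
    summarise(last_time = max(time_value), .groups = "drop")
  # Geos lagging the freshest geo by more than max_gap days.
  behind <- last_obs %>%
    mutate(gap = as.integer(max(last_time) - last_time)) %>%
    filter(gap > max_gap)
  if (nrow(behind) > 0) {
    warning("LOCF over more than ", max_gap, " days for: ",
            paste(behind$geo_value, collapse = ", "))
  }
  # Carry each geo's last observation forward in time order.
  df %>%
    arrange(geo_value, time_value) %>%
    group_by(geo_value) %>%
    fill(all_of(signal), .direction = "down") %>%
    ungroup()
}

dates <- as.Date("2024-01-01") + 0:59
df <- data.frame(
  geo_value = rep(c("ca", "vi"), each = 60),
  time_value = rep(dates, 2),
  cases = c(1:60, 1:20, rep(NA, 40))
)
# Warns: "vi" stopped reporting 40 days before "ca".
out <- locf_with_warning(df, "cases")
```

Warning rather than erroring keeps a single fit possible, while still surfacing locations that have probably stopped reporting or have broken ingestion.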

(If we wanted better results, there's an approach that probably performs better: fit, e.g., one model for the regular locations and another model for each distinct set of signal latencies. Normally it might just be VI having some extra latency, so we'd fit a separate model based on the lags it has available, while still geopooling across everything. But that's metamodeling, probably not achievable with a step.)
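The first part of that metamodeling idea, grouping locations by their set of per-signal latencies, can be sketched in base R. The latency table and signal names below are made up. The rest is not shown: one geopooled model per signature, trained on all geos using only the lags that signature allows, then used to predict for the geos in that group.

```r
# Made-up per-geo, per-signal latencies (days behind the forecast date).
latency <- data.frame(
  geo_value = c("ca", "ny", "tx", "vi"),
  cases     = c(2, 2, 2, 9),
  deaths    = c(3, 3, 3, 9)
)
# A geo's "signature" is its vector of latencies across signals.
signature <- apply(latency[, c("cases", "deaths")], 1, paste, collapse = "-")
groups <- split(latency$geo_value, signature)
groups
# → $`2-3`: "ca" "ny" "tx";  $`9-9`: "vi"
```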

dajmcdon assigned dshemetov, dsweber2 and brookslogan Dec 16, 2023

dsweber2 mentioned this issue Dec 19, 2023: Add the 7dav we talked about along with the std cmu-delphi/exploration-tooling#76 (Merged)

dsweber2 linked a pull request Mar 18, 2024 that will close this issue: Adjust ahead #296 (Open, 9 tasks)

dajmcdon mentioned this issue Apr 12, 2024: Release version 0.1.0 / 1.0.0 #318 (Open, 18 tasks)
