Further review on `cdc_baseline_forecaster` #250

brookslogan · 2023-10-10T18:46:36Z

`cdc_baseline_forecaster` can successfully reproduce historical Flusight-baseline forecasts from last season; see this script. However, some extra features in `cdc_baseline_forecaster` that aren't needed to produce Flusight-baseline forecasts may need some extra review.

[also review claims re. covid if they still exist]

Rough points to revisit; some probably don't actually apply

check if runs when doing geo pooling
.$ in epi_slide… prefer .x$? or fine?
locale-independent Saturday check?
styler
.data$ / .env$ to calm checks? or fn Nat used in epiprocess?
why !!outcome?
why predictor on keys? could this follow the "id variable" example in `?add_role`?
(`group_by(across(…))` vs. `group_by(pick(…))`? deprecation planned in future https://www.tidyverse.org/blog/2023/02/dplyr-1-1-0-pick-reframe-arrange/. but for compatibility better to keep around??)
missing a step_epi_naomit for the training window? (but what about test data selection?)
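On the locale-independent Saturday check raised above: one possible approach, sketched here with a hypothetical helper name `is_saturday` (not part of epipredict), is to compare the integer day of week rather than the weekday name, since `weekdays()` returns names in the current locale.

```r
# Hypothetical helper, for illustration only.
# weekdays() returns locale-dependent names, so comparing against
# "Saturday" can fail under non-English locales.
is_saturday <- function(dates) {
  # POSIXlt stores the day of week as an integer: 0 = Sunday, ..., 6 = Saturday
  as.POSIXlt(as.Date(dates))$wday == 6L
}

dates <- as.Date(c("2023-10-06", "2023-10-07", "2023-10-08"))
is_saturday(dates) # FALSE TRUE FALSE
```

`format(dates, "%u") == "6"` (ISO weekday number) should behave the same way.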
side
- `get_test_data` arg validation & fixup a little weird looking (might be able to combine some, `allow_null=TRUE` inside non-NULL branch, -Inf thing weird, class != class)
- ~ min(.x$lag %||% Inf) — why min? also, mapping across all steps…. need to remember this if doing archive-based recipes b/c we may not want this
- check… max lags & max horizon also might not be archive-backcast-compatible unless already doing transform to `epi_df`
- what??
  `if (is.null(n_recent)) n_recent <- min_required + 1 # one extra for filling`
  `if (n_recent <= min_required) n_recent <- min_required + n_recent`
appears to be flatline + iterated (as if independent) symmetrized 1-week
differences, separately for each geo (w/ no time window, no transformation)
no need for `if (args_list$nonneg) f <- layer_threshold(f, ".pred")`? or does it need to be before `cdc_flatline_quantiles`? or not?
what type of warning are we trying to suppress with suppressWarnings? be more selective?
incomplete propagate test
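To make the "flatline + iterated (as if independent) symmetrized 1-week differences" reading above concrete, here is a rough sketch for a single geo. It is only an interpretation of that description, not the epipredict implementation: the function name, arguments, and defaults are all hypothetical, the input is assumed to have no missing values, and no nonnegativity thresholding is applied.

```r
# Hypothetical sketch of the description above; not the package code.
flatline_symmetrized_quantiles <- function(y, max_ahead = 4L, nsims = 1000L,
                                           probs = c(0.25, 0.5, 0.75)) {
  d <- diff(y)       # observed 1-step differences
  sym <- c(d, -d)    # symmetrize so the increments are centred at 0
  # Index with sample.int() to avoid sample()'s length-1 numeric behaviour
  draw <- function() sym[sample.int(length(sym), nsims, replace = TRUE)]
  path <- rep(y[length(y)], nsims) # flatline: every path starts at the last value
  out <- vector("list", max_ahead)
  for (h in seq_len(max_ahead)) {
    path <- path + draw()          # add one more independent 1-step increment
    out[[h]] <- quantile(path, probs, names = FALSE)
  }
  do.call(rbind, out)              # rows = horizons, columns = quantile levels
}

set.seed(1)
flatline_symmetrized_quantiles(c(10, 12, 11, 15, 14, 18))
```

Each horizon reuses the simulated paths from the previous one, so the spread of the quantiles tends to widen with the horizon.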
major
- `data_frequency` not considered in layer?
- check on hhs… we don't want filling through forecast_date
- nsims much smaller?
- do we really want warning + something different rather than error when
  `by_key` cols aren't available? also, the warnings don't trigger?? probably
  just casualty of suppressWarnings
- no clue about the reasoning here

`nafill_buffer`: At predict time, recent values of the training data are
used to create a forecast. However, these can be 'NA' due to,
e.g., data latency issues. By default, any missing values
will get filled with less recent data. Setting this value to
'NULL' will result in 1 extra recent row (beyond those
required for lag creation) to be used. Note that we require
at least 'min(lags)' rows of recent data per 'geo_value' to
create a prediction. For this reason, setting 'nafill_buffer
< min(lags)' will be treated as additional allowed recent
data rather than the total amount of recent data to examine.
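The two `n_recent` lines quoted earlier under "what??" can be traced with a small sketch. `resolve_n_recent` is a hypothetical wrapper written only for this illustration, and `min_required = 3` is an arbitrary value; the body is the two quoted lines unchanged.

```r
# Hypothetical wrapper around the two quoted lines; not an epipredict function.
resolve_n_recent <- function(n_recent, min_required) {
  if (is.null(n_recent)) n_recent <- min_required + 1 # one extra for filling
  if (n_recent <= min_required) n_recent <- min_required + n_recent
  n_recent
}

min_required <- 3 # arbitrary value for illustration
resolve_n_recent(NULL, min_required) # 4
resolve_n_recent(2, min_required)    # 5
resolve_n_recent(3, min_required)    # 6
resolve_n_recent(4, min_required)    # 4
```

Under this reading the result is not monotone in the input: asking for 3 rows yields 6, while asking for 4 yields 4, which may be part of why the logic looks surprising.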

semimajor
- assume this doesn't actually work with `time_type` = week?
- data frequency was not 1 week for covid-19 forecasts
- need to do a `step_epi_lag` 1? data_frequency? to get the right training window selection?
- if there are gaps, are deltas appropriately NA?
- with hhs latency, `forecast_date` & `target_date` setting is awkward
- for non-flatline, had issue with contrasts on 1 geo and with residuals not matching size on mult geos. maybe missing `step_epi_naomit`? but adding `step_epi_naomit` gives ``Warning: Values from `q` are not uniquely identified; output will contain list-cols.``
awkward…:

if (max_ahead > 1L) {
for (iter in 2:max_ahead) {
filter to Saturdays & no time_type update…
`epiprocess::guess_period` useful?
`state_census$fips` should be chr
maybe avoid `sample` due to length-1-numeric case? unlikely to encounter but bad…
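The length-1-numeric case mentioned in the last point is the documented base R behaviour where `sample(x)` treats a single number `x >= 1` as `1:x`. A minimal sketch of one way around it, using a hypothetical helper `safe_sample` (not an existing function in the package), is to sample indices instead of values:

```r
# Hypothetical helper, for illustration only.
# sample(5, 3, replace = TRUE) draws from 1:5, not from the single value 5.
safe_sample <- function(x, size, replace = TRUE) {
  # Sampling indices keeps the draw within x even when length(x) == 1
  x[sample.int(length(x), size = size, replace = replace)]
}

safe_sample(5, 3) # 5 5 5
```

With a longer vector, `safe_sample` draws from its elements just as `sample` would.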


brookslogan self-assigned this Oct 10, 2023

dajmcdon mentioned this issue Oct 20, 2023

Vignette illustrating FluSight Forecaster #256

Open

3 tasks

dajmcdon mentioned this issue Apr 12, 2024

Release version 0.1.0 / 1.0.0 #318

Open

18 tasks
