Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Further review on cdc_baseline_forecaster #250

Open
Tracked by #318
brookslogan opened this issue Oct 10, 2023 · 0 comments
Open
Tracked by #318

Further review on cdc_baseline_forecaster #250

brookslogan opened this issue Oct 10, 2023 · 0 comments
Assignees

Comments

@brookslogan
Copy link
Contributor

brookslogan commented Oct 10, 2023

cdc_baseline_forcecaster can successfully reproduce historical Flusight-baseline forecasts from last season; see this script. However, some extra features in cdc_baseline_forecaster that aren't needed to produce Flusight-baseline forecasts may need some extra review.

[also review claims re. covid if they still exist]

Rough points to revisit; some probably don't actually apply
  • check if runs when doing geo pooling

  • .$ in epislide… prefer .x$? or fine?

  • locale-independent Saturday check?

  • styler

  • .data$ / .env$ to calm checks? or fn Nat used in epiprocess?

  • why !!outcome?

  • why predictor on keys? could this be follow "id variable" example in `?addrole`?

  • (1groupby(across(……))1 vs. 1groupby(pick(…….))1? deprecation planned in future https://www.tidyverse.org/blog/2023/02/dplyr-1-1-0-pick-reframe-arrange/. but for compatibility better to keep around??)

  • missing a stepepinaomit for the training window? (but what about test data selection?)

  • side

    • `gettestdata `arg validation & fixup a little weird looking (might be able to combine some, `allownull=TRUE` inside non-NULL branch, -Inf thing weird, class != class)
    • ~ min(.x$lag %||% Inf) — why min? also, mapping across all steps…. need to remember this if doing archive-based recipes b/c we may not want this
    • check… max lags & max horizon also might not be archive-backcast-compatible unless already doing transform to epi`df
    • what??`?

    if (is.null(n_recent)) n_recent <- min_required + 1 # one extra for filling
    if (n_recent <= min_required) n_recent <- min_required + n_recent

  • appears to be flatline + iterated (as if independent) symmetrized 1-week
    differences, separately for each geo (w/ no time window, no transformation)

  • no need for `if (argslist$nonneg) f <- layerthreshold(f, ".pred")`? or does it need to be before `cdcflatlinequantiles`? or not?

  • what type of warning are we trying to suppress with suppressWarnings? be more selective?

  • incomplete propagate test

  • major

    • `datafrequenc`y not considered in layer?
    • check on hhs… we don't want filling through forecastdate
    • nsims much smaller?
    • do we really want warning + something different rather than error when
      `bykey` cols aren't available? also, the warnings don't trigger?? probably
      just casualty of suppressWarnings
    • no clue about the reasoning here

`nafillbuffer`: At predict time, recent values of the training data are
used to create a forecast. However, these can be 'NA' due to,
e.g., data latency issues. By default, any missing values
will get filled with less recent data. Setting this value to
'NULL' will result in 1 extra recent row (beyond those
required for lag creation) to be used. Note that we require
at least 'min(lags)' rows of recent data per '`geovalue`' to
create a prediction. For this reason, setting 'nafillbuffer
< min(lags)' will be treated as additional allowed recent
data rather than the total amount of recent data to examine.

  • semimajor

    • assume this doesn't actually work with `timetype` = week?
    • data frequency was not 1 week for covid-19 forecasts
    • need to do a `stepepilag` 1? datafrequency? to get the right training window selection?
    • if there are gaps, are deltas appropriately NA?
    • with hhs latency, `forecastdate` & `targetdate` setting is awkward
    • for non-flatline, had issue with contrasts on 1 geo and with residuals not matching size on mult geos. maybe missing `stepepinaomit`? but adding `stepepinaomit` gives `Warning: Values from `q` are not uniquely identified; output will contain list-cols.`
  • awkward…:

    if (max_ahead > 1L) {
    for (iter in 2:max_ahead) {

  • filter to Saturdays & no timetype update…

  • `epiprocess::guessperiod` useful?

  • `statecensus$fips` should be chr

  • maybe avoid `sample` due to length-1-numeric case? unlikely to encounter but bad…

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant