Fix `predict_time_value` not matching `forecast_date` #121

brookslogan · 2021-10-12T17:25:40Z

Currently, `predict_time_value` is the max `time_value` in the wide df, which appears to be the max `time_value` for which a non-shifted signal is available. For every covidcast signal that I'm aware of, this differs from the forecast (as-of) date, but we want the two to match. The problem is worse for data sources such as "hhs", which can be missing several days before the as-of date rather than just one, and for signals where day-of-week (wday) effects may be important. See also cmu-delphi/covidcast#569.
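To make the mismatch concrete, here is a minimal base-R sketch using toy data. It is not the animalia/evalcast code: the data frame, the `latency_days` value, and the variable names are assumptions made up for illustration. It only shows that taking the max `time_value` of the available data lands before the forecast date whenever the signal has reporting latency.

```r
# Toy illustration only: the values and the latency below are made up,
# and this is not the animalia/evalcast implementation.
forecast_date <- as.Date("2021-03-10")

# Assume the signal is reported with a 1-day latency, so the newest
# observation available as of the forecast date is for the day before.
latency_days <- 1L
time_values <- seq(as.Date("2021-03-01"), forecast_date - latency_days, by = "day")
wide_df <- data.frame(time_value = time_values, value = seq_along(time_values))

# What the issue describes: the reference date is the max time_value in the data.
predict_time_value <- max(wide_df$time_value)

# Gap between the intended reference date and the one actually used.
gap_days <- as.integer(forecast_date - predict_time_value)
print(predict_time_value)
print(gap_days)
```

With a larger assumed latency (for example, several days, as can happen for "hhs"), the same computation yields a correspondingly larger gap.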


brookslogan · 2021-10-13T11:43:49Z

Reprex

library(dplyr)
library(animalia)
library(evalcast)

## A version of production_forecaster that will place debug info into an object in the global environment:
debug_production_forecaster = `body<-`(
  animalia::production_forecaster,
  value = expr({
    .GlobalEnv[["debug.production_forecaster.env"]] <- environment()
    !!body(animalia::production_forecaster)
  })
)

predictions =
  evalcast::get_predictions(
              debug_production_forecaster,
              "debug_production_forecaster",
              signals = tibble(
                data_source = "jhu-csse",
                signal = "confirmed_incidence_num",
                geo_type = "state",
                geo_values = "pa",
                start_day = "2021-01-01"
              ),
              forecast_dates = as.Date("2021-03-10"),
              incidence_period = "day",
              forecaster_args = list(
                incidence_period = "day",
                lags = c(0L, 7L, 14L)
              )
            )

debug.production_forecaster.env$mats$predict_time_value # expected to be forecast date
debug.production_forecaster.env$predict_params$newx # expected lag 0 to be NA (& trigger an error)

In this case, the resulting forecasts target times 1 day earlier than intended. Again, for data from the "hhs" data source, or for data that is less reliably near-real-time, the offset could be more than 1 day. The impact of such mistargeting would be larger when there are significant day-of-week (wday) effects.
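The shift in target dates can be sketched in base R as follows. This is an illustration under stated assumptions, not the forecaster's actual logic: the `aheads` values are hypothetical, and the 1-day offset is taken from the example above. It shows that each target date moves one day earlier and therefore falls on a different day of the week.

```r
# Illustration only: `aheads` is a hypothetical choice, and the 1-day
# offset is the one observed in the reprex above.
forecast_date <- as.Date("2021-03-10")
predict_time_value <- forecast_date - 1L
aheads <- c(7L, 14L)

intended_targets <- forecast_date + aheads
actual_targets <- predict_time_value + aheads

targets <- data.frame(
  ahead = aheads,
  intended = intended_targets,
  actual = actual_targets,
  # Day of week as an integer (0 = Sunday, ..., 6 = Saturday).
  intended_wday = as.POSIXlt(intended_targets)$wday,
  actual_wday = as.POSIXlt(actual_targets)$wday
)
print(targets)
```

If the signal has a strong weekly pattern, a model whose predictions are aligned to the wrong weekday would be scored against, or interpreted for, a day with systematically different reporting behavior.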

(Note: we shouldn't expect to have data for the forecast date for most/all covidcast signals, so including `0L` in the lags doesn't really make sense. But if we remove it and use `lags = c(7L, 14L)`, the problem remains: `predict_time_value` and the relevant `newx` entries are still the same as in the example above.)

brookslogan mentioned this issue Oct 25, 2021

Think about how to handle multiple signals cmu-delphi/epiprocess#10

Closed
