Fix predict_time_value not matching forecast_date #121

Open
brookslogan opened this issue Oct 12, 2021 · 1 comment

@brookslogan

Currently, predict_time_value is the max time_value in the wide data frame, which appears to be the latest time_value for which a non-shifted signal is available. This is not equal to the forecast (as-of) date for any covidcast signal that I'm aware of, but we want it to be. The problem is worse for data sources such as "hhs", which can be missing several days before the as-of date rather than just one, and for signals where day-of-week (wday) effects may be important. See also cmu-delphi/covidcast#569
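
A minimal sketch of the mismatch described above, using made-up data rather than real covidcast output. The `time_value` and `value` column names follow covidcast conventions, and the one-day reporting latency is an assumption for illustration. This is not the forecaster's actual code.

```r
forecast_date <- as.Date("2021-03-10")

# Hypothetical signal as of the forecast date: the latest available
# observation is one day behind the as-of date (assumed latency).
df <- data.frame(
  time_value = seq(as.Date("2021-03-01"), forecast_date - 1, by = "day"),
  value = seq_len(9)
)

# The behavior described in this issue: use the max time_value in the data.
predict_time_value <- max(df$time_value)

# The two dates differ by the data latency.
predict_time_value == forecast_date              # FALSE
as.integer(forecast_date - predict_time_value)   # 1
```

With a source that lags by several days, the same computation would return a correspondingly larger gap.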

@brookslogan

Reprex

library(dplyr)
library(animalia)
library(evalcast)

## A version of production_forecaster that will place debug info into an object in the global environment:
debug_production_forecaster = `body<-`(
  animalia::production_forecaster,
  value = expr({
    .GlobalEnv[["debug.production_forecaster.env"]] <- environment()
    !!body(animalia::production_forecaster)
  })
)

predictions =
  evalcast::get_predictions(
              debug_production_forecaster,
              "debug_production_forecaster",
              signals = tibble(
                data_source = "jhu-csse",
                signal = "confirmed_incidence_num",
                geo_type = "state",
                geo_values = "pa",
                start_day = "2021-01-01"
              ),
              forecast_dates = as.Date("2021-03-10"),
              incidence_period = "day",
              forecaster_args = list(
                incidence_period = "day",
                lags = c(0L, 7L, 14L)
              )
            )

debug.production_forecaster.env$mats$predict_time_value # expected to be forecast date
debug.production_forecaster.env$predict_params$newx # expected lag 0 to be NA (& trigger an error)

In this case, the resulting forecasts target times 1 day earlier than intended. Again, for "hhs" data, or for data that is less reliably available in near real time, the gap could be more than 1 day. The impact of this mistargeting would be larger when there are significant day-of-week (wday) effects.
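
To make the mistargeting concrete, here is a small sketch. It assumes that target dates are computed as predict_time_value plus the horizon, which is my reading of the behavior, not something confirmed from the forecaster source. The `ahead` value is hypothetical.

```r
forecast_date <- as.Date("2021-03-10")
predict_time_value <- forecast_date - 1  # assumed one-day data latency
ahead <- 7L                              # hypothetical forecast horizon

# Where the forecast is meant to land vs. where it lands under the
# assumption above.
intended_target <- forecast_date + ahead
actual_target <- predict_time_value + ahead

as.integer(intended_target - actual_target)  # 1

# ISO weekday numbers (Monday = 1): the two targets fall on different
# weekdays, which is why day-of-week effects amplify the error.
format(intended_target, "%u")  # "3" (Wednesday)
format(actual_target, "%u")    # "2" (Tuesday)
```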

(Note: we shouldn't expect to have data for the forecast date for most/all covidcast signals, so including 0L in the lags doesn't really make sense. But if we remove it and have lags=c(7L, 14L), the problem remains: the predict_time_value and relevant newx entries are still the same as in the example above.)
