Improve error handling when no data available on forecast_date #448

Open
huisaddison opened this issue Feb 9, 2021 · 0 comments

I believe this is an evalcast issue. When get_predictions() is called for a forecast_date with no data available anywhere in the training window, covidcast returns a data frame with 0 rows and 0 columns. evalcast then tries to munge that data and fails with a cryptic "Column `issue` not found in `.data`" error.

Suggestion:

  • Catch the empty dataframe case and return a more informative message.
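
A minimal sketch of what such a check could look like, using base R only. The helper name `assert_signal_data` and its arguments are hypothetical and are not part of evalcast or covidcast; where exactly the check would be called inside get_predictions() is left open.

```r
# Hypothetical helper (not an evalcast or covidcast function): fail early with
# a readable message when a fetched signal data frame is empty.
assert_signal_data <- function(df, forecast_date, data_source, signal) {
  if (is.null(df) || nrow(df) == 0 || ncol(df) == 0) {
    stop(sprintf(
      "No data available for %s:%s in the training window for forecast_date %s.",
      data_source, signal, format(as.Date(forecast_date))
    ), call. = FALSE)
  }
  invisible(df)
}

# Usage: an empty data frame (0 rows, 0 columns) triggers the informative error.
empty_df <- data.frame()
msg <- tryCatch(
  assert_signal_data(empty_df, "2020-07-01", "chng", "smoothed_adj_outpatient_cli"),
  error = function(e) conditionMessage(e)
)
print(msg)
```

Raising the error before any data munging keeps the message tied to the offending signal and forecast date, instead of surfacing deep inside dplyr.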

Backtrace:

arrange() failed at implicit mutate() step. 
* Problem with `mutate()` input `..1`.
✖ Column `issue` not found in `.data`
ℹ Input `..1` is `.data$issue`.
Backtrace:
     █
  1. ├─evalcast::get_predictions(...)
  2. │ └─`%>%`(...)
  3. ├─dplyr::bind_rows(.)
  4. │ └─rlang::list2(...)
  5. ├─purrr::map(...)
  6. │ └─evalcast:::.f(.x[[i]], ...)
  7. │   ├─base::do.call(...)
  8. │   └─(function (forecaster, name_of_forecaster, signals, forecast_date, ...
  9. │     └─modeltools:::forecaster(...)
 10. │       └─covidcast::aggregate_signals(df, dt = dt, format = "wide")
 11. │         └─covidcast:::apply_shifts(x, dt)
 12. │           └─base::mapply(apply_shifts_one, x, dt, SIMPLIFY = FALSE)
 13. │             └─(function (x, dt) ...
 14. │               └─covidcast::latest_issue(x)
 15. │                 └─covidcast:::first_or_last_issue(df, TRUE)
 16. │                   └─`%>%`(...)
 17. ├─dplyr::distinct(...)
 18. ├─covidcast:::issue_sort(.)
 19. │ ├─dplyr::arrange(df, dplyr::desc(.data$issue))
 20. │ └─dplyr:::arrange.data.frame(df, dplyr::desc(.data$issue))
 21. │   └─dplyr:::arrange_rows(.data, dots)
 22. │     ├─base::withCallingHandlers(...)
 23. │     ├─dplyr::transmute(new_data_frame(.data), !!!quosures)
 24. │     └─dplyr:::transmute.data.frame(new_data_frame(.data), !!!quosures)
 25. │       ├─dplyr::mutate(.data, ..., .keep = "none")
 26. │       └─dplyr:::mutate.data.frame(.data, ..., .keep = "none")
 27. │         └─dplyr:::mutate_cols(.data, ...)
 28. │           ├─base::withCallingHandlers(...)
 29. │           └─mask$eval_all_mutate(quo)
 30. ├─issue
 31. ├─rlang:::`$.rlang_data_pronoun`(.data, issue)
 32. │ └─rlang:::data_pronoun_get(x, nm)
 33. ├─rlang:::abort_data_pronoun(x)
 34. │ └─rlang::abort(msg, "rlang_error_data_pronoun_not_found")
 35. │   └─rlang:::signal_abort(cnd)
 36. │     └─base::signalCondition(cnd)
 37. ├─(function (e) ...
 38. │ └─rlang::abort(...)
 39. │   └─rlang:::signal_abort(cnd)
 40. │     └─base::signalCondition(cnd)
 41. └─(function (cnd) ...

Reproducing example:

library(covidcast) # branch main
library(evalcast) # branch evalcast-killcards
library(modeltools) # branch main
library(dplyr)

## Setup 

# What are we forecasting?
response_source <- "jhu-csse"
response_signal <- "confirmed_7dav_incidence_prop"
incidence_period <- "day"
ahead <- 1:21
geo_type <- "hrr" 
forecast_dates <- seq(as.Date("2020-07-01"), as.Date("2021-01-31"), by = "day")

# Some quantgen parameters 
n <- 21               # Training set size (in days) 
lags <- c(0, 7, 14)   # Lags (in days) for features
no_pen_vars <- 1      # Variables to leave unpenalized (latest response value)
nlambda <- 20         # Number of lambda values to consider in cross-validation
lp_solver <- "gurobi"
sort <- TRUE
nonneg <- TRUE

tau = c(0.05, 0.20, 0.50, 0.80, 0.95)

# Important: functions to considerably speed up data fetching steps. Only pull 
# recent data for each forecast date, depending on the training set size (and 
# other parameters for quantgen)
start_day_baseline <- function(forecast_date) {
  return(as.Date(forecast_date) - n - 4 + 1)
}

start_day_quantgen <- function(forecast_date) {
  return(as.Date(forecast_date) - max(ahead) - n - max(lags) + 1)
}

## Produce forecasts

# Quantile autoregression with 3 lags, or QAR3
t0 = Sys.time()
pred_quantgen1 <- get_predictions(
  forecaster = quantgen_forecaster, 
  name_of_forecaster = "QAR3 + CHNG_CLI3",
  signals = tibble::tibble(
                      data_source = c(response_source, "chng"),
                      signal = c(response_signal, "smoothed_adj_outpatient_cli"),
                      start_day = list(start_day_quantgen)),
  forecast_dates = forecast_dates, 
  incidence_period = incidence_period, 
  ahead = ahead, geo_type = geo_type, 
  tau=tau,
  signal_aggregation = "list",
  n = n, lags = lags, 
  nlambda = nlambda, no_pen_vars = no_pen_vars,
  debug = 'debug/QAR3+CHNG_CLI3',
  verbose=TRUE,
  lp_solver = lp_solver, sort = sort, nonneg = nonneg)
t1 = Sys.time()
print(t1-t0)