Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prevent object serialization in slide error messages #429

Open
brookslogan opened this issue Mar 12, 2024 · 3 comments
Open

Prevent object serialization in slide error messages #429

brookslogan opened this issue Mar 12, 2024 · 3 comments
Assignees

Comments

@brookslogan
Copy link
Contributor

brookslogan commented Mar 12, 2024

When I try outputting an unnamed, length-1 list result from a slide computation, I get an unintelligible, incomplete error message.

library(dplyr)
library(purrr)
library(epiprocess)
library(epipredict)
library(ggplot2)

# Mondays
forecast_dates <- c(as.Date("2020-11-02"))

horizons <- c(7L) # relative to forecast_date

example_forecaster <- function(snapshot_edf, forecast_date) {
  shared_reporting_latency <- forecast_date - max(snapshot_edf$time_value)
  aheads <- horizons + shared_reporting_latency # relative to max time_value
  aheads %>%
    map(function(ahead) {
      snapshot_edf %>%
        arx_forecaster(
          outcome = "case_rate_7d_av", # via JHU-CSSE
          predictors = "percent_cli", # from doctor-visits (claims)
          args_list = arx_args_list(
            ahead = ahead,
            quantile_levels = c(0.25, 0.5, 0.75),
            forecast_date = forecast_date
          ))
    }) %>%
    list()
}

pseudoprospective_forecasts <-
  archive_cases_dv_subset %>%
  epix_slide(
    ref_time_values = forecast_dates,
    before = 365000L, # 1000-year time window --> don't filter out any `time_value`s
    ~ example_forecaster(.x, .ref_time_value),
    as_list_col = TRUE
  ) %>%
  rename(forecast_date = time_value)
})

#> Error: Assertion on 'list(list(structure(list(predictions = structure(list(geo_value = c("ca", "fl", "ny", "tx"), .pred = c(13.9733229746794, 20.1862784685272, 21.6914779432801, 18.7492407321225), .pred_distn = structure(list(structure(list(values = c("25%" = 8.81690382631066, "50%" = 13.9733229746794, "75%" = 19.1297421230482), quantile_levels = c(0.25, 0.5, 0.75)), class = c("dist_quantiles", "dist_default", "vctrs_rcrd", "vctrs_vctr")), structure(list(values = c("25%" = 15.0298593201585, "50%" = 20.1862784685272, 
#> "75%" = 25.342697616896), quantile_levels = c(0.25, 0.5, 0.75)), class = c("dist_quantiles", "dist_default", "vctrs_rcrd", "vctrs_vctr")), structure(list(values = c("25%" = 16.5350587949114, "50%" = 21.6914779432801, "75%" = 26.8478970916489), quantile_levels = c(0.25, 0.5, 0.75)), class = c("dist_quantiles", "dist_default", "vctrs_rcrd", "vctrs_vctr")), structure(list(values = c("25%" = 13.5928215837537, "50%" = 18.7492407321225, "75%" = 23.9056598804913), quantile_levels

[In interactive mode the actual description of what's wrong seems to be cut off? Trying in reprex instead seems to output a whole lot more of the structure; I didn't look to see if it also cuts off the important part.]

The "problem" we're catching here is that an unnamed, length-1 list isn't atomic. (Aside: I think we could probably accept these, and maybe even unnamed lists of other lengths [though it might require some implementation tweaks, not just validation changes]. Named lists are likely a mistake or would need extra processing, though.) This was previously the job of

              if (! (is.atomic(comp_value) || is.data.frame(comp_value))) {
                Abort("The slide computation must return an atomic vector or a data frame.")
              }

but now is

        assert(check_atomic(comp_value, any.missing = TRUE), 
            check_data_frame(comp_value), combine = "or", .var.name = vname(comp_value))

and checkmate's not giving a great message.

@brookslogan
Copy link
Contributor Author

brookslogan commented Mar 13, 2024

I'm guessing the culprit here is vname(comp_value). This is just a local variable rather than an argument, so this is probably triggering the [serialization/reverse-parsing] capabilities of deparse within vname.

(function() {
  comp_value = list(letters)
  assert(check_atomic(comp_value, any.missing = TRUE), 
  check_data_frame(comp_value), combine = "or", .var.name = vname(comp_value))
})()
#> Error: Assertion on 'list(c("a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l", "m", "n", "o", "p", "q", "r", "s", "t", "u", "v", "w", "x", "y", "z"))' failed: One of the following must apply:
#>  * check_atomic(list(c("a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l",
#>  * "m", "n", "o", "p", "q", "r", "s", "t", "u", "v", "w", "x", "y", "z"))): Must be
#>  * of type 'atomic', not 'list'
#>  * check_data_frame(list(c("a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k",
#>  * "l", "m", "n", "o", "p", "q", "r", "s", "t", "u", "v", "w", "x", "y", "z"))):
#>  * Must be of type 'data.frame', not 'list'.

We can improve this a bit with a custom name

(function() {
  comp_value = list(letters)
  assert(check_atomic(comp_value, any.missing = TRUE), 
  check_data_frame(comp_value), combine = "or", .var.name = "slide computation output")
})()
#> Error: Assertion on 'slide computation output' failed: One of the following must apply:
#>  * check_atomic(slide computation output): Must be of type 'atomic', not 'list'
#>  * check_data_frame(slide computation output): Must be of type 'data.frame', not
#>  * 'list'.

though it still looks a little awkward, and might not look as good if we try to include some context about where the error occurred (see #351 point 2).

@dshemetov any ideas here to make error message output a bit friendlier?

@brookslogan brookslogan changed the title Improve error messaging on slide computation outputs Prevent object destructuring in slide error messages Mar 13, 2024
@brookslogan brookslogan changed the title Prevent object destructuring in slide error messages Prevent object serialization in slide error messages Mar 13, 2024
@dshemetov
Copy link
Contributor

So seems like we need to add check_list() in for one of the accepted types here? As for naming, your "slide computation output" seems like a good approach, but I'll see if I have any better ideas when I make a PR for this.

@dshemetov dshemetov self-assigned this Apr 11, 2024
@brookslogan
Copy link
Contributor Author

brookslogan commented Apr 12, 2024

Accepting [unnamed] lists [& rejecting named lists] would probably need some more effort verifying/tweaking things to work. If we want to just fix the error message we could focus on that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants