New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adjust ahead #296
base: dev
Are you sure you want to change the base?
Adjust ahead #296
Conversation
based on feedback, I'm making this a separate step and making sure full-pipeline tests work. It will be a PR until both of those are done. |
So I'm still putting the tests in a format that isn't a pile of spaghetti, but I wanted to get this out so we could maybe talk about it. To implement a separate step, I couldn't find a way to get previous steps, so I basically had to communicate everything relevant through the headers of While I was looking around I noticed that both There's still some downstream implications for how |
R/utils-latency.R
Outdated
key_cols = keys | ||
) | ||
}) %>% | ||
map(\(x) zoo::na.trim(x)) %>% # TODO need to talk about this |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Two things about this:
- Zoo dependency. Seems we don't currently have this. are we trying to minimize package dependencies? This is the only function I use from it
- This shears
NA
's off the end of the column. We have a step that explicitly dropsNA
's, so I'm not sure how much we're trying to avoid removingNA
's otherwise. Not sure how to do this correctly otherwise though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- by default,
na.trim
removes from both the beginning and the end - [oops just repeating Daniel here] side note:
\(
is R >= 4.1. We were recently talking about not trying to support versions 3.* but not sure about 4.0. - this function from latency processing code lying around may be helpful:
n_true_at_tail = function(x) {
match(FALSE, rev(x), nomatch = length(x) + 1L) - 1L
}
# So then to remove NAs at tail you could do the following (or maybe an S3 version of this):
remove_na_tail <- function(x) {
x[seq_len(length(x) - n_true_at_tail(is.na(x)))]
}
# (don't use `tail` with "negative" `n` unless you know it's actually negative and not zero, which isn't the case here)
remove_na_tail(c(NA, NA, 1:10, NA, 1:10, NA, NA))
remove_na_tail(1:100) # ALTREP
remove_na_tail(1:100 + 1) # non-ALTREP
remove_na_tail(integer(0L))
remove_na_tail(rep(NA_integer_, 5))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[oops just repeating Daniel here]
yeah, threading in github issues can be hard to follow, especially when there are a lot of comments >.<
could you add some docs to those functions?
Getting a peculiar error where the local snapshot test for population scaling has 4 lines like this Code
p <- predict(wf, latest)
Message
Joining with `by = join_by(geo_value)`
Joining with `by = join_by(geo_value)`
Joining with `by = join_by(geo_value)`
Joining with `by = join_by(geo_value)` whereas the CI is apparently only generating 2. And this apparently changed within the past 2 weeks. |
This was a change in |
A few other notes (we can discuss synchronously):
|
expect_snapshot(prep <- prep(r, jhu)) | ||
|
||
expect_snapshot(b <- bake(prep, jhu)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So these are the tests generating the snapshots. Looking at my loca gitl timemachine, this was a set of tests that was previously being skipped, and I added these as snapshot tests. I guess I'll just put this back to skipped for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But yes, it does look like there are some lingering Joining with
by = join_by(geo_value)` messages
@@ -1,6 +1,9 @@ | |||
# epipredict (development) | |||
|
|||
Pre-1.0.0 numbering scheme: 0.x will indicate releases, while 0.0.x will indicate PR's. | |||
# epipredict 0.2 | |||
|
|||
- add `latency_adjustment` as an option for `add_epi_ahead`, which adjusts the `ahead` so that the prediction is `ahead` relative to the `as_of` date for the `epi_data`, rather than relative to the last day of data. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
next "release" version is 0.1, so this should just go below
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left a number of questions and comments through out. It may be easiest to schedule a check in or maybe on Slack?
DESCRIPTION
Outdated
stringr, | ||
stringi, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would strongly prefer to avoid these if possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
brief reason? I'll try to remove them regardless, I'm just wondering
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Brief: "fewer Imports is better"
Longer: {stringi}
uses some beefy compiled code (even the precompiled binaries are large), and can sometimes fail to install on some systems. There are a handful of packages like this, {sf}
is another, that I prefer to avoid, where possible.
R/utils-latency.R
Outdated
rlang::abort("the `time_value` column of `new_data` is empty") | ||
} | ||
as_of <- attributes(new_data)$metadata$as_of | ||
max_time <- max(time_values) | ||
# make sure the as_of is sane | ||
if (!inherits(as_of, class(time_values))) { | ||
rlang::abort(glue::glue( | ||
"the data matrix `as_of` value is {as_of}, ", | ||
"and not a valid `time_type` with type ", | ||
"matching `time_value`'s type of ", | ||
"{typeof(new_data$time_value)}." | ||
)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason for rlang versions here and cli elsewhere?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nope, not sure I put much thought into it. The package is also in a split state; is it alright if I just find-replace all instances of rlang::abort
with cli::cli_abort
, or is there some reason there are both generally?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At the beginning, we used {rlang}
, but I've been trying to have all new code use {cli}
(with a plan to find and replace all the old stuff eventually). I thought there was an issue, but I can't find it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can't generally just find-replace. See this migration info. [I threw a fix for various violations of this in epiprocess in the R6 refactor which will be merged shortly.]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah, they have a guide! Thanks Logan. I'll go over the changed ones and make sure they're not broken. Thankfully none of them broke the tests, but I'm guessing we don't have super strict expect_error
tests for every single one
b07656c
to
03c2120
Compare
Overview of new changes:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok. I think everything looks potentially fine. It would help for the example to be a bit more elaborate.
Potential for merge conflicts with #331.
#' @family row operation steps | ||
#' @rdname step_adjust_latency | ||
#' @export | ||
#' @examples | ||
#' r <- epi_recipe(case_death_rate_subset) %>% | ||
#' step_epi_ahead(death_rate, ahead = 7) %>% | ||
#' # step_adjust_latency(method = "extend_ahead") %>% | ||
#' step_epi_lag(death_rate, lag = c(0, 7, 14)) | ||
#' r |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would help me to flesh this out a bit. Maybe something with latency and see the results with and without?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For now, do the ones in the tests generally address that? I still need to update the layer functions to use dates correctly, but was waiting until these steps are relatively stable. Once I have those, I'll definitely expand the @examples
, and then write a vignette
Wishlist item based on some dogfooding: if the minimum reporting latency across signals&geos is really large, say, a month or greater, that's worth a warning. E.g., FluSurv-NET has gaps in reporting between seasons, and I tried to forecast a new season before the first report actually was published using ahead adjustment, and happily tried to forecast the beginning of that season from the end of the preceding season, likely getting poor predictions. |
This is purely before any work on latency adjusting, and is just about geting the tests to run without warnings or skipped tests. Getting this involved: 1. renaming some lingering examples of p to probs and `q` to `quantile_levels` (and some `quantile_level`s to the plural) 2. Adding snapshots so the tests for printing in population_scaling work as intended (should probably be converted to cli_informs at some point) 3. removing the nearly empty `test-propagate_samples` which seems like something intended that was never finished. Probably want to add an issue if we actually want it done. 4. added a bunch of `edf`'s in unhappy `prep` steps
Checklist
Please:
dajmcdon.
DESCRIPTION
andNEWS.md
.Always increment the patch version number (the third number), unless you are
making a release PR from dev to main, in which case increment the minor
version number (the second number).
(backwards-incompatible changes to the documented interface) are noted.
Collect the changes under the next release number (e.g. if you are on
0.7.2, then write your changes under the 0.8 heading).
Change explanations for reviewer
Implementation of latency adjustments as an argument to
step_epi_ahead
. The rough idea is to makeahead
relative to theas_of
date, rather than the last day of data. It throws a warning if this is too far in the future. Onlyextend_ahead
andNone
are currently implemented. Biggest downside is it only works for theepi_df
that the pipeline was created on.Still working on:
epi_df
and the end-to-end application works, rather than just the specialized functionextend_ahead
).step_adjust_ahead
locf
and associated testsextend_lags
we should talk about this one, it's a new-ish @lcbrooks idea. Sort of the opposite ofextend_aheads
. Setting the forecast date to beas_of
, per predictor, adjust the lags so that they're relative to the last day of data. E.g. if predictorA
has 3 days of latency, and the lags are given as(0,7,14)
, set the lags to(3,10,17)
, while ifB
has 5 days of latency, set the lags forB
to(5,12,19)
. Feel free to move discussion to the issueMagic GitHub syntax to mark associated Issue(s) as resolved when this is merged into the default branch