This repo explores forecasting methods and tools for both COVID and flu. It is structured as a `targets` project, which makes it easy to run things in parallel and to cache results, and as an R package, which makes it easy to share code between different targets.
Define run parameters:
```sh
# Save to your `.Renviron` file:
EPIDATR_USE_CACHE=true
# Not strictly necessary, but you probably want a long cache time,
# since this is for historical data
EPIDATR_CACHE_MAX_AGE_DAYS=42
DEBUG_MODE=false
USE_SHINY=false
TAR_PROJECT=covid_hosp_explore
EXTERNAL_SCORES_PATH=legacy-exploration-scorecards.qs
AWS_S3_PREFIX=exploration
```
- `EPIDATR_USE_CACHE` controls whether `epidatr` functions use the cache.
- `DEBUG_MODE` controls whether `targets::tar_make` is run with `callr_function = NULL`, which allows for debugging. This only works if parallelization has been turned off in `scripts/targets-common.R` by setting the default controller to serial on line 51.
- `USE_SHINY` controls whether we start a Shiny server after producing the targets.
- `TAR_PROJECT` controls which `targets` project is run by `run.R`. Likely either `covid_hosp_explore` or `flu_hosp_explore`.
- `EXTERNAL_SCORES_PATH` controls where external scores are loaded from. If not set, external scores are not used.
- `AWS_S3_PREFIX` controls the prefix to use in the AWS S3 bucket (a prefix is a pseudo-directory in a bucket).
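As a sketch of how `DEBUG_MODE` interacts with `targets` (the actual logic lives in `run.R` and `scripts/targets-common.R`; the default value here is an assumption):

```r
library(targets)

# Read the flag from the environment; "false" is an assumed default.
debug_mode <- as.logical(Sys.getenv("DEBUG_MODE", "false"))

# callr_function = NULL runs the pipeline in the current R session,
# so browser() calls actually stop execution instead of being ignored
# by a callr subprocess.
tar_make(callr_function = if (debug_mode) NULL else callr::r)
```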
Run the pipeline using:
```sh
# Install renv and R dependencies
make install
# Pull pre-scored forecasts from the AWS bucket
make pull
# Run only the dashboard, to display results run on other machines
make dashboard
# Run the pipeline using the helper script `run.R`
make run
# or in the background
make run-nohup
# Push complete or partial results to the AWS bucket
make push
```
- `run.R` and `Makefile`: the main entrypoints for all pipelines
- `R/`: R package code to be reused
- `scripts/`: plotting, code, and misc.
- `tests/`: package tests
- `covid_hosp_explore/` and `scripts/covid_hosp_explore.R`: a `targets` project for exploring covid hospitalization forecasters
- `flu_hosp_explore/` and `scripts/flu_hosp_explore.R`: a `targets` project for exploring flu hospitalization forecasters
- `covid_hosp_prod/` and `scripts/covid_hosp_prod.R`: a `targets` project for predicting covid hospitalizations
- `flu_hosp_prod/` and `scripts/flu_hosp_prod.R`: a `targets` project for predicting flu hospitalizations
- `forecaster_testing/` and `scripts/forecaster_testing.R`: a `targets` project for testing forecasters
When running a pipeline with parallelization, make sure to install the package via `renv::install(".")` and not just via `devtools::load_all()`.

It is safest to develop with parallelism disabled. Running `targets` in parallel mode has two problems when it comes to debugging: (1) it ignores browsers, so you can't step through functions, and (2) picking up any code changes requires both `renv::install(".")` and restarting R.
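The edit-and-rerun loop under parallelism therefore looks roughly like this (a sketch, not repo-specific tooling):

```r
# After editing code in R/, reinstall the package so parallel workers
# (which load the installed copy, not your in-session one) see the changes:
renv::install(".")
# ...then restart R and re-run the pipeline:
targets::tar_make()
```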
To debug a target named `yourTarget`:

- set `DEBUG_MODE=true`
- insert a `browser()` call in the relevant function
- start an R session and call `tar_make(yourTarget)`
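The steps above, as a single R session (a sketch; `yourTarget` is a placeholder and the env var could equally be set in `.Renviron`):

```r
Sys.setenv(DEBUG_MODE = "true")

# After inserting browser() into the function that builds yourTarget:
targets::tar_make(yourTarget)  # execution stops at the browser() call
```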
See this diagram. Double diamond objects represent "plates" (to evoke plate notation, but don't take the comparison too literally), which are used to represent multiple objects of the same type (e.g. different forecasters).
The basic forecaster takes in an `epi_df`, does some pre-processing, runs an `epipredict` workflow, and then does some post-processing.
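As a hedged illustration of that shape using `epipredict` directly (the outcome/predictor column names and the `ahead` value are made up; the repo's own wrappers may differ):

```r
library(epipredict)

# epi_df: an epiprocess::epi_df with (at least) a hospitalization column.
# arx_forecaster() bundles the preprocessing, a lagged-regression
# workflow, and the postprocessing into a single call.
result <- arx_forecaster(
  epi_df,
  outcome    = "hosp_count",            # assumed column name
  predictors = c("hosp_count"),
  args_list  = arx_args_list(ahead = 7L)  # forecast 7 days ahead
)
result$predictions
```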
This kind of forecaster has two components: a list of existing forecasters it depends on, and a function that aggregates those forecasters' outputs.
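A minimal sketch of that two-component structure (the component names and mean-aggregation are assumptions, not the repo's implementation):

```r
# An aggregating forecaster: component forecasters plus a combiner.
ensemble_forecaster <- list(
  components = list(forecaster_a, forecaster_b),  # hypothetical forecasters
  aggregate  = function(preds) Reduce(`+`, preds) / length(preds)  # simple mean
)
```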
Any forecaster that requires a pre-trained component, for example a forecaster with a sophisticated imputation method. Evaluating these has some thorns around train/test splitting, though it may be possible to fold them into the basic variety.