Dev (#365)

* Add an environment variable that can be used for writing uniquely named output files across blocks of jobs from AWS batch
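
The intent, roughly: the Batch launcher exports a per-block index, and each job combines it with its own array index when naming outputs. A minimal sketch in Python (the `COVID_BLOCK_INDEX` name and the path layout are illustrative assumptions, not necessarily what the pipeline uses; only `AWS_BATCH_JOB_ARRAY_INDEX` is a standard Batch variable):

```python
# Sketch only: COVID_BLOCK_INDEX and the filename pattern are assumptions,
# not the pipeline's actual names. AWS_BATCH_JOB_ARRAY_INDEX is the index
# AWS Batch sets for array jobs.
import os

block = int(os.environ.get("COVID_BLOCK_INDEX", "0"))
slot = int(os.environ.get("AWS_BATCH_JOB_ARRAY_INDEX", "0"))

out_path = f"model_output/seir/sim_block{block:03d}_slot{slot:03d}.parquet"
```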

* RStudio in the Docker container

RStudio is now available in the Docker container, which allows development and EDA with the same set of packages as is run in production.

* Update covidImportation package to v1.6 (#10)

* Update covidImportation package to v1.6 (#250)

* Adding final form of previous packrat + docker setup after merging weirdness

* Switching .so to git lfs

* Updated indexing in simulations and hospitalization

* Added better indexing for hospitalization

* Add ability to reduce alpha, sigma, and gamma (#241)

* Add the ability to reduce multiple parameters

* Add Reduce scenario template to test_simple and documentation

* minor bug test fix

* Minor bugs

Co-authored-by: Joseph Lemaitre <joseph.lemaitre@epfl.ch>

* Move the spatial setup outside of the scenarios loop since it's expensive to load and doesn't change per scenario.

* Removing source for packages installable from cran

* Updated the python rules for reticulate (tests still pass)

* Removing source for packages installable from cran

* Updated the python rules for reticulate (tests still pass)

* Updated based on review

* Fixed filter issues with makefile setup in case dynfilter isn't provided in config

* Updated makefile

* Reduce hospitalization memory pressure

Switch a critical split-apply-combine away from `do.call()`, which results in a 45% reduction in memory usage and a 35% speedup in execution time in my testing.

* Packrat (#253)

* Adding final form of previous packrat + docker setup after merging weirdness

* Switching .so to git lfs

* Removing source for packages installable from cran

* Updated the python rules for reticulate (tests still pass)

* Updated based on review

* Updated to use dev's docker instead of dataseed's

* Added reticulate zoo and xts

* Updated docker with git-lfs

* Packrat (#267)

* Adding final form of previous packrat + docker setup after merging weirdness

* Switching .so to git lfs

* Removing source for packages installable from cran

* Updated the python rules for reticulate (tests still pass)

* Updated based on review

* Updated to use dev's docker instead of dataseed's

* Added reticulate zoo and xts

* Updated docker with git-lfs

* Updating docker to install current versions of local packages

* Update .Rprofile

* Update dockerhub.yaml

* Update aws.yaml

* Yet another packrat attempt

* Update ci.yml

* Generic version of the batch job launcher/runner (#257)

* Generic version of batch from the union of jwills_dfU_run and dataseed_batch2

* Fixes from running stuff on some test jobs

* Add a vcpu CLI option and update sims_per_job to refer to slots per job

Co-authored-by: jkamins7 <jkaminsky@jhu.edu>

* Reduce SEIR startup costs (#273)

* 60% speedup in one run SEIR performance

The biggest cost in a single-sim SEIR run was importing Numba and the JIT compilation it triggers. Change this to compile ahead of time, which gives a nice 60% lift in single-run SEIR performance by avoiding those startup costs, something that will be valuable for our large inference runs.

Minor performance benefit when running many simulations as JIT costs are amortized away.

```
Benchmark #1: single sim JIT compilation (current)
  Time (mean ± σ):     13.429 s ±  0.537 s
  Range (min … max):   12.973 s … 14.867 s    100 runs

Benchmark #2: single sim AOT compilation (new)
  Time (mean ± σ):      5.129 s ±  0.125 s
  Range (min … max):    4.901 s …  5.364 s    100 runs
```
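
The mechanics of the change, roughly: compile the hot kernels into an extension module ahead of time (at image build), then import the compiled module at run time instead of JIT-compiling on every invocation. A minimal sketch using `numba.pycc`, Numba's AOT compiler; the module and function names below are placeholders, not the pipeline's actual kernels:

```python
# build_aot.py -- illustrative sketch; module and function names are placeholders.
import math
from numba.pycc import CC

cc = CC("seir_kernels")          # name of the compiled extension module

@cc.export("expit", "f8(f8)")    # exported symbol and its Numba type signature
def expit(x):
    # stands in for a hot inner function that previously forced JIT at import time
    return 1.0 / (1.0 + math.exp(-x))

if __name__ == "__main__":
    cc.compile()                 # writes seir_kernels.<abi>.so next to this file
```

At run time the simulator would `import seir_kernels` and call the compiled functions directly, so the compilation cost is paid once during the build rather than on every single-sim invocation.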

* Add Python build directory to .gitignore

* Integrate build_US_setup into pipeline and... (#271)

* Add hard-coded territory data to build_US_setup

* Create csv of island area census data since it cannot be accessed by API

* Change the report targets to follow the conventions of make_makefile

* Integrate build_US_setup into pipeline

* Some bug fixes

* git lfs pull of commute_data.csv and switch docker image

* Update ci.yml

* Update ci.yml

* Remove generated files

* Update make_makefile.R

* Update run_tests.py

* pull census year from config

* Use census year from config to build_US_setup

* Update build_US_setup.R

Co-authored-by: eclee25 <eclee25@gmail.com>

* Add check to hospitalization that geodata geoids are in geoid-params.csv (#283)

* added state level script for creating csv reporting out quantiles

* Fixed a slight bug with static dates and added full geographic extent version of the quantile generation script

* Added countylevel script

* Various fixes and updates to post-run summarization scripts.

* Integrate QuantileSummarizeGeoExtent.R into pipeline (untested)

* Integrate QuantileSummarizeGeoExtent.R into pipeline

* Create QuantileSummarizeGeoidLevel.py

* Working on the python script

* Integrate quantile scripts into Makefile

* Delete QuantileSummarizeGeoidLevel.py

* perf fix for quantile_report_script

* QuantileSummarizeGeoidLevel on Apache Spark

This commit includes a Python implementation of `QuantileSummarizeGeoidLevel.R` running on Apache Spark. The job essentially computes quantiles grouped by geoid and time, with Spark providing the shuffle and quantile-estimation machinery needed to perform the aggregation efficiently. The job can be run locally within the container (fine for a USA run, but it takes ~45 min on an r5.24xlarge) or distributed on Amazon EMR. This commit adds Spark, and consequently Java, inside the container.
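
For orientation, the core of such a job can be sketched in PySpark roughly as follows; the column names, paths, and probability list here are illustrative assumptions, not the script's actual interface:

```python
# Illustrative sketch of grouped quantile summarization on Spark; column names,
# paths, and probabilities are assumptions, not the real script's interface.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("quantile_summarize_geoid_level").getOrCreate()

# one row per (simulation, geoid, time) with the outcome to summarize
sims = spark.read.parquet("model_output/hosp/")

quantiles = sims.groupBy("geoid", "time").agg(
    F.expr("percentile_approx(incidH, array(0.025, 0.25, 0.5, 0.75, 0.975), 1000)")
     .alias("incidH_quantiles")
)

quantiles.write.mode("overwrite").parquet("model_output/quantiles/")
```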

* add `--name_filter` to quantile_summarize_geoid_level as per feedback

* Adjust quantile scripts so they all have the same interface

- Fixed bug in both R scripts where `num_files` was set incorrectly
- Adjust quantile_summarize_geoid_level.py to take scenarios (+ config file) versus path names as input to mimic the interface of the other scripts

* Revert make_makefile.R to dev branch version

* setup file for international countries

* Fatiguing NPI

* tested MVP

* other implementation, maybe cleaner

* update to hosp_run to take specified geoid-params

* Added mild infections as output of hospitalization

* minor

* Hospitalization package update

* dev setup

* fixed rate

* adding apl deployment to ecr

* international seeding and setup files created

* Update to report template docs for country reports

* update to non-US scripts

* update to international branch country setup

* non-US setup Rmd and other scripts finished.

* update

* minor print edit

* updates to script to make international functional with master

* minor update to report and setup scripts

* setup fix

* non-us update

* dev setup relative min

* relative min ready

* 1. Added integration tests for US and non-US create_seeding.R and build_US_setup.R/build_nonUS_setup.R

2. create_seeding.R now has the option to choose "CSSE" or "USAFacts" for a US run.

* Delete jhucsse_case_data_crude.csv

accidental data commit

* vignette fix

* Removed man folders from packages

* fixes in the international branch before the merge

* Do not update packages

* Update covidImportation to v1.6.1

* minor fix

* fix non-US setup

* Update local_install.R

* Fix merge error

* Reload covidImportation v1.6.1 to fix tidyverse dependency

* seeding update with inputted incidence multiplier

* minor names fix

* Minor fixes to build_US and build_nonUS integration tests

* deleted a comma

* minor bug fix

* Fix reversed international tag

* fixed error message

* fixed python error

* minor

* Adding updated severity parameters

* fixing US seeding

* adding print message

* Update covidImportation with bug fix

* minor update

* Fix filter issue

* integration testing fixes

* Non-US makefile added. This should actually work fine for US as well. It also adds the ability to use the setup_name from the config to add a file prefix to model outputs, and then only clean those model outputs when running "make clean".

* make_makefile.R now includes both US and non-US functionality

* make_makefile white space fix

* Add tictoc package to dev docker

* Updated to fix a docker bug

* Report devel2 into dev (#352)

* updates to state template

* fix load_cum_inf_geounit_dates to use hosp only

* add hosp method chunks from report_devel

* adding generic mapping function

* removing grouping by time for appropriate cumsum in load_cum_inf

* fixing error in load_cum_inf

* add ventilator to scenario tbl

* add warning about loading infections from hosp data

* deprecate old functions, integration testing temp

* recreating a clean NAMESPACE to remove the export of setup_testing_environment, which was preventing pkg install

* adding sim_num before post_process in load_hosp_sims_filtered for output that does not contain sim_num but requires it for post-processing

* adding warning about variable name to load_hosp_geounit_threshold

* moving make_excess_heatmap to deprecated functions

* prep report_devel2 for dev merge (#351)

* Version with pyarrow included

* Dependencies for arrow in R as well

* Fixed check_model script

* Updated for feather integration

* Updated test cases since `n` is reserved in yml

* adding make_excess_heatmap function for hosp outcomes

* Fixing parallelization mistake

* Minor fixes

- Use the "optimize" covidImportation version
- Always upgrade local packages if upgrade available (vs silently ignore)
- check_model_reports should ensure axes are dates

* new figure relative to threshold heatmap

* Update importation.R to match covidImportation package updates

* Updated model code to use the new covidImportation package, and also seed to E instead of I (and keep the population fixed)

* Fixed typo

* Final fix to avoid numba

* Fixed path to install_local script

* Added package

* Fixed seeding creation

* rm NAs and fix create_seeding.R

* add new cum hosp/deaths check to check_models scr

* update indexes in check model script

* long form mobility

* Update reference to geoid-params.csv inside of hosp_run.R

* 10x seeding file

* Write the npi when writing parquet output

* template

* report after simulation

* Removed geodata read from hosp_run.R since it's not being used

* Updated things that feed into mobility

* Updated build_US_setup.R to account for the move

* These files got removed in a previous commit

* Removing unused (as far as I can tell anyway) data

* Fix bug when the places are also a number

* Changing back test cases to use size/prob instead of n/p

* Updated name to pass checks on case sensitive OS

* Updated to use the file_extension argument

* Fix broken tests, though I recommend we eliminate the mean and var checks since they'll be flaky

* Updated build_US_setup.R to work with the current setup

* Renamed parameters to avoid confusion; print out simid as 9 digits

The SEIR and hospitalization phases now use a more standardized file format.

* read parquet file times correctly

* Revert "read parquet file times correctly"

This reverts commit 521dd25.

* parquet date fixes (#207)

Co-authored-by: hrmeredith12 <hrmeredith12@gmail.com>

* Report devel (#208)

* fix unit test code

* fix unit test for real

* fix unit tests

* adding ability to filter geoids in relative heatmap function

* adding template for county-specific report for a given state

* lower tolerance for distribution tests

* planning_models chunk

* planning scenario chunk

* add names to dev team

Co-authored-by: eclee25 <eclee25@gmail.com>
Co-authored-by: Kyra Grantz <kyragrantz@gmail.com>
Co-authored-by: hrmeredith12 <hrmeredith12@gmail.com>

* Adding Javier (#210)

Co-authored-by: hrmeredith12 <hrmeredith12@gmail.com>
Co-authored-by: Elizabeth Lee <eclee25@gmail.com>

* Delete build-model-input.R (#217)

* Dataseed merge (#215)

* Adding Javier

* Adding commute data back in

* rm fixed param and comment out bad plot

* commit NAMESPACE for report gen

* fix NVentCurr name

* formatting changes to county report template, removing defaults that should be modified for each report

* adding references for county report template

* change importation seeding

* table formatting

* limitations chunk considering age specific hosp calculations

* removing build_hospdeath_geoid_par - old version not used in hosp_run.R

* removing legacy hospitalization scripts. everything runs through hosp_run.R now

* using current default durations to minimize confusion

Co-authored-by: hrmeredith12 <hrmeredith12@gmail.com>
Co-authored-by: Elizabeth Lee <eclee25@gmail.com>
Co-authored-by: Kyra Grantz <kyragrantz@gmail.com>

* Removing config.yml and changing the variable name in create_seeding to be truthful. (#219)

* Fixed the low in followup issue (#224)

* Fixed the low in followup issue

* Adding initial ^

* adding county report template yaml (#221)

Co-authored-by: jkamins7 <jkaminsky@jhu.edu>

* Fix load-bearing typo (#225)

* Fix load-bearing typo

* pretty sure it's supposed to be this

Co-authored-by: Josh Wills <jwills@apache.org>
Co-authored-by: kkintaro <katkintaro@gmail.com>

* Add an environment variable that can be used for writing uniquely named output files across blocks of jobs from AWS batch

* fix for 1 scenario (#230)

Co-authored-by: Elizabeth Lee <eclee25@gmail.com>

* RStudio in the Docker container

RStudio is now available in the Docker container, which allows development and EDA with the same set of packages as is run in production.

* Update covidImportation package to v1.6 (#10)

* Update covidImportation package to v1.6 (#250)

* Adding final form of previous packrat + docker setup after merging weirdness

* Switching .so to git lfs

* Updated indexing in simulations and hospitalization

* Added better indexing for hospitalization

* Add ability to reduce alpha, sigma, and gamma (#241)

* Add the ability to reduce multiple parameters

* Add Reduce scenario template to test_simple and documentation

* minor bug test fix

* Minor bugs

Co-authored-by: Joseph Lemaitre <joseph.lemaitre@epfl.ch>

* Move the spatial setup outside of the scenarios loop since it's expensive to load and doesn't change per scenario.

* Removing source for packages installable from cran

* Updated the python rules for reticulate (tests still pass)

* Removing source for packages installable from cran

* Updated the python rules for reticulate (tests still pass)

* Updated based on review

* Fixed filter issues with makefile setup in case dynfilter isn't provided in config

* Updated makefile

* Reduce hospitalization memory pressure

Switch a critical split-apply-combine away from `do.call()`, which results in a 45% reduction in memory usage and a 35% speedup in execution time in my testing.

* Packrat (#253)

* Adding final form of previous packrat + docker setup after merging weirdness

* Switching .so to git lfs

* Removing source for packages installable from cran

* Updated the python rules for reticulate (tests still pass)

* Updated based on review

* Updated to use dev's docker instead of dataseed's

* Added reticulate zoo and xts

* Updated docker with git-lfs

* Packrat (#267)

* Adding final form of previous packrat + docker setup after merging weirdness

* Switching .so to git lfs

* Removing source for packages installable from cran

* Updated the python rules for reticulate (tests still pass)

* Updated based on review

* Updated to use dev's docker instead of dataseed's

* Added reticulate zoo and xts

* Updated docker with git-lfs

* Updating docker to install current versions of local packages

* Update .Rprofile

* Update dockerhub.yaml

* Update aws.yaml

* Yet another packrat attempt

* Update ci.yml

* Generic version of the batch job launcher/runner (#257)

* Generic version of batch from the union of jwills_dfU_run and dataseed_batch2

* Fixes from running stuff on some test jobs

* Add a vcpu CLI option and update sims_per_job to refer to slots per job

Co-authored-by: jkamins7 <jkaminsky@jhu.edu>

* changing covidImportation tag to 1.6.1

* Reduce SEIR startup costs (#273)

* 60% speedup in one run SEIR performance

The biggest cost in a single-sim SEIR run was importing Numba and the JIT compilation it triggers. Change this to compile ahead of time, which gives a nice 60% lift in single-run SEIR performance by avoiding those startup costs, something that will be valuable for our large inference runs.

Minor performance benefit when running many simulations as JIT costs are amortized away.

```
Benchmark #1: single sim JIT compilation (current)
  Time (mean ± σ):     13.429 s ±  0.537 s
  Range (min … max):   12.973 s … 14.867 s    100 runs

Benchmark #2: single sim AOT compilation (new)
  Time (mean ± σ):      5.129 s ±  0.125 s
  Range (min … max):    4.901 s …  5.364 s    100 runs
```

* Add Python build directory to .gitignore

* Integrate build_US_setup into pipeline and... (#271)

* Add hard-coded territory data to build_US_setup

* Create csv of island area census data since it cannot be accessed by API

* Change the report targets to follow the conventions of make_makefile

* Integrate build_US_setup into pipeline

* Some bug fixes

* git lfs pull of commute_data.csv and switch docker image

* Update ci.yml

* Update ci.yml

* Remove generated files

* Update make_makefile.R

* Update run_tests.py

* pull census year from config

* Use census year from config to build_US_setup

* Update build_US_setup.R

Co-authored-by: eclee25 <eclee25@gmail.com>

* Add check to hospitalization that geodata geoids are in geoid-params.csv (#283)

* added state level script for creating csv reporting out quantiles

* Fixed a slight bug with static dates and added full geographic extent version of the quantile generation script

* Added countylevel script

* Various fixes and updates to post-run summarization scripts.

* Integrate QuantileSummarizeGeoExtent.R into pipeline (untested)

* Integrate QuantileSummarizeGeoExtent.R into pipeline

* Create QuantileSummarizeGeoidLevel.py

* Working on the python script

* Integrate quantile scripts into Makefile

* Delete QuantileSummarizeGeoidLevel.py

* perf fix for quantile_report_script

* QuantileSummarizeGeoidLevel on Apache Spark

This commit includes a Python implementation of `QuantileSummarizeGeoidLevel.R` running on Apache Spark. The job essentially computes quantiles grouped by geoid and time, with Spark providing the shuffle and quantile-estimation machinery needed to perform the aggregation efficiently. The job can be run locally within the container (fine for a USA run, but it takes ~45 min on an r5.24xlarge) or distributed on Amazon EMR. This commit adds Spark, and consequently Java, inside the container.

* add `--name_filter` to quantile_summarize_geoid_level as per feedback

* Adjust quantile scripts so they all have the same interface

- Fixed bug in both R scripts where `num_files` was set incorrectly
- Adjust quantile_summarize_geoid_level.py to take scenarios (+ config file) versus path names as input to mimic the interface of the other scripts

* Revert make_makefile.R to dev branch version

* setup file for international countries

* Fatiguing NPI

* tested MVP

* other implementation, maybe cleaner

* update to hosp_run to take specified geoid-params

* Added mild infections as output of hospitalization

* minor

* Hospitalization package update

* dev setup

* fixed rate

* adding apl deployment to ecr

* international seeding and setup files created

* Update to report template docs for country reports

* update to non-US scripts

* update to international branch country setup

* non-US setup Rmd and other scripts finished.

* update

* minor print edit

* updates to script to make international functional with master

* minor update to report and setup scripts

* setup fix

* non-us update

* dev setup relative min

* relative min ready

* 1. Added integration tests for US and non-US create_seeding.R and build_US_setup.R/build_nonUS_setup.R

2. create_seeding.R now has the option to choose "CSSE" or "USAFacts" for a US run.

* Delete jhucsse_case_data_crude.csv

accidental data commit

* vignette fix

* Removed man folders from packages

* fixes in the international branch before the merge

* Do not update packages

* Update covidImportation to v1.6.1

* minor fix

* fix non-US setup

* Update local_install.R

* Fix merge error

* Reload covidImportation v1.6.1 to fix tidyverse dependency

* seeding update with inputted incidence multiplier

* minor names fix

* Minor fixes to build_US and build_nonUS integration tests

* deleted a comma

* minor bug fix

* Fix reversed international tag

* fixed error message

* fixed python error

* minor

* Adding updated severity parameters

* fixing US seeding

* adding print message

* Update covidImportation with bug fix

* minor update

* Fix filter issue

* integration testing fixes

* Non-US makefile added. This should actually work fine for US as well. It also adds the ability to use the setup_name from the config to add a file prefix to model outputs, and then only clean those model outputs when running "make clean".

* make_makefile.R now includes both US and non-US functionality

* make_makefile white space fix

* Add tictoc package to dev docker

* Updated to fix a docker bug

Co-authored-by: Josh Wills <jwills@apache.org>
Co-authored-by: jkamins7 <jkaminsky@jhu.edu>
Co-authored-by: kkintaro <katkintaro@gmail.com>
Co-authored-by: Kyra Grantz <kyragrantz@gmail.com>
Co-authored-by: Sam Shah <sam@skipflag.com>
Co-authored-by: shauntruelove <satruelove@gmail.com>
Co-authored-by: chadi <joseph.lemaitre@epfl.ch>
Co-authored-by: hrmeredith12 <hrmeredith12@gmail.com>
Co-authored-by: Josh Wills <josh.wills@gmail.com>
Co-authored-by: Sam Shah <shahsam@umich.edu>
Co-authored-by: Dave <David.Witman@jhuapl.edu>
Co-authored-by: Shaun Truelove <shauntruelove@users.noreply.github.com>

* rename report.generation folder

* update report.generation path in workflow test

Co-authored-by: Kyra Grantz <kyragrantz@gmail.com>
Co-authored-by: juanderone <57634493+juanderone@users.noreply.github.com>
Co-authored-by: Josh Wills <jwills@apache.org>
Co-authored-by: jkamins7 <jkaminsky@jhu.edu>
Co-authored-by: kkintaro <katkintaro@gmail.com>
Co-authored-by: Sam Shah <sam@skipflag.com>
Co-authored-by: shauntruelove <satruelove@gmail.com>
Co-authored-by: chadi <joseph.lemaitre@epfl.ch>
Co-authored-by: hrmeredith12 <hrmeredith12@gmail.com>
Co-authored-by: Josh Wills <josh.wills@gmail.com>
Co-authored-by: Sam Shah <shahsam@umich.edu>
Co-authored-by: Dave <David.Witman@jhuapl.edu>
Co-authored-by: Shaun Truelove <shauntruelove@users.noreply.github.com>

* configurable delay and ratio for seeding

* seeding file extra comma

* change path to report.generation

* rm double parens

* Dev make (#358)

* make_makefile - rm filter and add seeding & intl

* add parens

* typo

* Removed filter from tests

* fix parens issue

* fixes #338 by raising an error

* fixes #339 by raising an error

* better and correct message

* bugfixes

* better presentation

* consistency across messages

* accidentally deleted some tests, putting them back

* newlines

* Updated make_makefile.R to pass tests multiple times in a row

* integ test 2x, update local install

* try to fix 2x integ test

* rm unnecessary chdir

* fix typo in aws apl workflow

Co-authored-by: jkamins7 <jkaminsky@jhu.edu>
Co-authored-by: chadi <joseph.lemaitre@epfl.ch>

* readme file changes

* change to latest docker image

* dev image

* make sensible load_config err + test

* Updated docker file

* Removed failing workflow

* Removed more rstudio config from docker file

* Removed more rstudio config from docker file

* Removed outdated vignettes

* Updated covidImportation version in docker

* Updated packrat

Co-authored-by: Josh Wills <jwills@apache.org>
Co-authored-by: kkintaro <katkintaro@gmail.com>
Co-authored-by: Sam Shah <sam@skipflag.com>
Co-authored-by: jkamins7 <jkaminsky@jhu.edu>
Co-authored-by: Joseph Lemaitre <joseph.lemaitre@epfl.ch>
Co-authored-by: Josh Wills <josh.wills@gmail.com>
Co-authored-by: Sam Shah <shahsam@umich.edu>
Co-authored-by: shauntruelove <satruelove@gmail.com>
Co-authored-by: Dave <David.Witman@jhuapl.edu>
Co-authored-by: Shaun Truelove <shauntruelove@users.noreply.github.com>
Co-authored-by: Kyra Grantz <kyragrantz@gmail.com>
Co-authored-by: juanderone <57634493+juanderone@users.noreply.github.com>
Co-authored-by: hrmeredith12 <hrmeredith12@gmail.com>
14 people committed Sep 9, 2020
1 parent 4b65fac commit a9a3175
Showing 395 changed files with 21,320 additions and 6,071 deletions.
1 change: 1 addition & 0 deletions .gitattributes
@@ -1,2 +1,3 @@
data/united-states-commutes/census_tracts_2010.csv filter=lfs diff=lfs merge=lfs -text
data/united-states-commutes/commute_data.csv filter=lfs diff=lfs merge=lfs -text
packrat/lib/x86_64-pc-linux-gnu/3.6.3/arrow/libs/arrow.so filter=lfs diff=lfs merge=lfs -text
2 changes: 2 additions & 0 deletions .github/workflows/aws.yaml
@@ -17,6 +17,8 @@ on:
- 'local_install.R'
- 'Dockerfile'
- 'R/pkgs/**'
- '.Rprofile'
- 'packrat/**'

name: Deploy to Amazon ECR

13 changes: 10 additions & 3 deletions .github/workflows/ci.yml
@@ -16,11 +16,17 @@ jobs:
unit-tests:
runs-on: ubuntu-18.04
container:
image: docker://hopkinsidd/covidscenariopipeline:latest-dataseed
image: docker://hopkinsidd/covidscenariopipeline:latest-dev
options: --user root
steps:
- name: Checkout
uses: actions/checkout@v2
- name: Set up Rprofile
run: cp Docker.Rprofile $HOME/.Rprofile
shell: bash
- name: Install SEIR package
run: /home/app/python_venv/bin/python setup.py install
shell: bash
- name: Run pytest SEIR
run: |
/home/app/python_venv/bin/pytest SEIR/
@@ -35,15 +41,16 @@ jobs:
setwd("R/pkgs/hospitalization")
devtools::test(stop_on_failure=TRUE)
shell: Rscript {0}
- name: Run report_generation tests
- name: Run report.generation tests
run: |
setwd("R/pkgs/report_generation")
setwd("R/pkgs/report.generation")
devtools::test(stop_on_failure=TRUE)
shell: Rscript {0}
- name: Run integration tests
env:
CENSUS_API_KEY: ${{ secrets.CENSUS_API_KEY }}
run: |
git lfs pull
Rscript local_install.R
cd test
/home/app/python_venv/bin/pytest run_tests.py
2 changes: 2 additions & 0 deletions .github/workflows/dockerhub.yaml
@@ -15,6 +15,8 @@ on:
- 'local_install.R'
- 'Dockerfile'
- 'R/pkgs/**'
- '.Rprofile'
- 'packrat/**'

name: Deploy to DockerHub

6 changes: 6 additions & 0 deletions .gitignore
@@ -54,3 +54,9 @@ vignettes/*.pdf

# R Environment Variables
.Renviron
packrat/lib*/

# Python build dirs
build/
dist/
SEIR.egg-info/
6 changes: 6 additions & 0 deletions Docker.Rprofile
@@ -0,0 +1,6 @@
#### -- Packrat Autoloader (version 0.5.0) -- ####
working_directory <- getwd()
setwd("/home/app/")
source("/home/app/packrat/init.R")
setwd(working_directory)
#### -- End Packrat Autoloader -- ####
21 changes: 12 additions & 9 deletions Dockerfile
@@ -1,7 +1,7 @@
FROM ubuntu:18.04

USER root
ENV TERM dumb
ENV TERM linux

# set locale info
RUN apt-get update && apt-get install -y locales && locale-gen en_US.UTF-8
@@ -31,6 +31,7 @@ RUN apt-get update && \
less \
build-essential \
git-core \
git-lfs \
curl \
pandoc \
pandoc-citeproc \
@@ -87,13 +88,14 @@ ENV HOME /home/app
#####

# TODO: use packrat (or something else) for R package management
COPY packages.R $HOME
RUN Rscript packages.R

# install custom packages from R/pkgs/**
COPY local_install.R $HOME
COPY R/pkgs $HOME/R/pkgs
RUN Rscript local_install.R
RUN Rscript -e "install.packages('packrat',repos='https://cloud.r-project.org/')" \
&& Rscript -e "install.packages('arrow',repos='https://cloud.r-project.org/')" \
&& Rscript -e 'arrow::install_arrow()'
COPY --chown=app:app packrat $HOME/packrat
COPY --chown=app:app Docker.Rprofile $HOME/.Rprofile
COPY --chown=app:app R/pkgs $HOME/R/pkgs
RUN Rscript -e 'packrat::restore()'
RUN Rscript -e 'install.packages(list.files("R/pkgs",full.names=TRUE),type="source",repos=NULL)'


#####
@@ -105,9 +107,10 @@ ENV PYTHON_VERSION 3.7.6
ENV PYTHON_VENV_DIR $HOME/python_venv
ENV PATH $PYENV_ROOT/shims:$PYENV_ROOT/bin:$PATH


RUN git clone git://github.com/yyuu/pyenv.git $HOME/.pyenv \
&& rm -rf $HOME/.pyenv/.git \
&& pyenv install -s $PYTHON_VERSION --verbose \
&& env PYTHON_CONFIGURE_OPTS="--enable-shared" pyenv install -s $PYTHON_VERSION --verbose \
&& pyenv rehash \
&& echo 'eval "$(pyenv init -)"' >> ~/.bashrc \
&& echo "PS1=\"\[\e]0;\u@\h: \w\a\] \h:\w\$ \"" >> ~/.bashrc
11 changes: 9 additions & 2 deletions R/pkgs/covidcommon/R/config.R
@@ -27,8 +27,15 @@ load_config <- function(fname) {
fname <- Sys.getenv("CONFIG_PATH")
}
if (!missing(fname)) {
handlers <- list(map=function(x) { class(x) <- "config"; return(x) })
return(tryCatch(yaml.load_file(fname, handlers=handlers), error = function(e) { stop(paste("Could not find file: ", fname)) }))

if(!file.exists(fname)){
stop(paste("Could not find file:", fname))
} else{
handlers <- list(map=function(x) { class(x) <- "config"; return(x) })
return(tryCatch(yaml.load_file(fname, handlers=handlers), error = function(e) { stop(paste("The config", fname, "has an error. Run `yaml::read_yaml(", fname, ")` to identify the line where the error exists.")) }))
}


} else {
return(NA)
}
9 changes: 8 additions & 1 deletion R/pkgs/covidcommon/tests/testthat/test-load_config.R
@@ -1,6 +1,8 @@
test_that("load_config works", {
fname <- tempfile()
cat("yaml: TRUE\n",file=fname)
fname_bad <- tempfile()
cat("yaml: TRUE\n yaml2: FALSE\n",file=fname_bad)

expect_equal(
load_config(fname)$yaml,
@@ -9,7 +11,7 @@ test_that("load_config works", {

expect_error(
load_config(";lkdjaoijdsfjoasidjfaoiwerfj q2fu8ja8erfasdiofj aewr;fj aff409a urfa8rf a';j 38i a0fuadf "),
"file"
"Could not find"
)

expect_error(
Expand All @@ -21,6 +23,11 @@ test_that("load_config works", {
load_config(fname)$missing$badkey,
"missing"
)

expect_error(
load_config(fname_bad),
"yaml::read_yaml"
)
})

test_that("as_evaled_expression works", {
1 change: 1 addition & 0 deletions R/pkgs/hospitalization/DESCRIPTION
@@ -23,3 +23,4 @@ Description: Generate hospitalization scenarios corresponding to the infection s
License: What license it uses
Encoding: UTF-8
LazyData: true
RoxygenNote: 7.1.0
7 changes: 4 additions & 3 deletions R/pkgs/hospitalization/NAMESPACE
@@ -1,4 +1,5 @@
# Generated by roxygen2: fake comment so roxygen2 overwrites silently.
exportPattern("^[^\\.]")
# Generated by roxygen2: do not edit by hand

importFrom(foreach,"%dopar%")
export(build_hospdeath_geoid_fixedIFR_par)
export(build_hospdeath_par)
export(create_delay_frame)
49 changes: 30 additions & 19 deletions R/pkgs/hospitalization/R/hospdeath.R
@@ -108,7 +108,7 @@ create_delay_frame <- function(data, name, local_config){
geoid %in% all_geoids
) %>%
dplyr::arrange(geoid,time) %>%
ungroup()
dplyr::ungroup()

data <- dplyr::arrange(data,geoid,time)

Expand All @@ -120,6 +120,9 @@ create_delay_frame <- function(data, name, local_config){
return(data)
}




hosp_create_delay_frame <- function(X, p_X, data_, X_pars, varname) {
X_ <- rbinom(length(data_[[X]]),data_[[X]],p_X)
rc <- data.table::data.table(
@@ -237,9 +240,11 @@ build_hospdeath_par <- function(p_hosp,
time_ventdur_pars = log(17),
cores=8,
root_out_dir='hospitalization',
use_parquet = FALSE) {
use_parquet = FALSE,
start_sim = 1,
num_sims = -1) {

n_sim <- length(list.files(data_dir))
n_sim <- ifelse(num_sims < 0, length(list.files(data_dir)), num_sims)
print(paste("Creating cluster with",cores,"cores"))
doParallel::registerDoParallel(cores)

@@ -251,7 +256,8 @@

pkgs <- c("dplyr", "readr", "data.table", "tidyr", "hospitalization")
foreach::foreach(s=seq_len(n_sim), .packages=pkgs) %dopar% {
dat_ <- hosp_load_scenario_sim(data_dir,s,
sim_id <- start_sim + s - 1
dat_ <- hosp_load_scenario_sim(data_dir,sim_id,
keep_compartments = "diffI",
geoid_len = 5,
use_parquet = use_parquet) %>%
@@ -293,24 +299,22 @@
mutate(date_inds = as.integer(time - min(time) + 1),
geo_ind = as.numeric(as.factor(geoid))) %>%
arrange(geo_ind, date_inds) %>%
group_by(geo_ind) %>%
group_map(function(.x,.y){
split(.$geo_ind) %>%
purrr::map_dfr(function(.x){
.x$hosp_curr <- cumsum(.x$incidH) - lag(cumsum(.x$incidH),
n=R_delay_,default=0)
.x$icu_curr <- cumsum(.x$incidICU) - lag(cumsum(.x$incidICU),
n=ICU_dur_,default=0)
.x$vent_curr <- cumsum(.x$incidVent) - lag(cumsum(.x$incidVent),
n=Vent_dur_)
.x$geo_ind <- .y$geo_ind
return(.x)
}) %>%
do.call(what=rbind) %>%
replace_na(
list(vent_curr = 0,
icu_curr = 0,
hosp_curr = 0)) %>%
arrange(date_inds, geo_ind)
write_hosp_output(root_out_dir, data_dir, dscenario_name, s, res, use_parquet)
write_hosp_output(root_out_dir, data_dir, dscenario_name, sim_id, res, use_parquet)
NULL
}
doParallel::stopImplicitCluster()
@@ -351,9 +355,11 @@
time_ventdur_pars = log(17),
cores=8,
root_out_dir='hospitalization',
use_parquet = FALSE
use_parquet = FALSE,
start_sim = 1,
num_sims = -1
) {
n_sim <- length(list.files(data_dir))
n_sim <- ifelse(num_sims < 0, length(list.files(data_dir)), num_sims)
print(paste("Creating cluster with",cores,"cores"))
doParallel::registerDoParallel(cores)

@@ -370,7 +376,8 @@

pkgs <- c("dplyr", "readr", "data.table", "tidyr", "hospitalization")
foreach::foreach(s=seq_len(n_sim), .packages=pkgs) %dopar% {
dat_I <- hosp_load_scenario_sim(data_dir,s,
sim_id <- start_sim + s - 1
dat_I <- hosp_load_scenario_sim(data_dir, sim_id,
keep_compartments = "diffI",
geoid_len=5,
use_parquet = use_parquet) %>%
@@ -384,6 +391,11 @@
left_join(prob_dat, by="geoid")

# Add time things
dat_Mild <- hosp_create_delay_frame('incidI',
dat_$p_mild_inf,
dat_,
c(-Inf, 0), # we dont want a delay here, so this is the easiest way
"Mild")
dat_H <- hosp_create_delay_frame('incidI',
dat_$p_hosp_inf_scaled,
dat_,
@@ -404,13 +416,14 @@
ICU_dur_ <- round(exp(time_ICUdur_pars[1]))
Vent_dur_ <- round(exp(time_ventdur_pars[1]))

stopifnot(is.data.table(dat_I) && is.data.table(dat_H) && is.data.table(data_ICU) && is.data.table(data_Vent) && is.data.table(data_D))
stopifnot(is.data.table(dat_I) && is.data.table(dat_Mild) && is.data.table(dat_H) && is.data.table(data_ICU) && is.data.table(data_Vent) && is.data.table(data_D))

# Using `merge` instead of full_join for performance reasons
res <- Reduce(function(x, y, ...) merge(x, y, all = TRUE, ...),
list(dat_I, dat_H, data_ICU, data_Vent, data_D)) %>%
list(dat_I, dat_Mild, dat_H, data_ICU, data_Vent, data_D)) %>%
replace_na(
list(incidI = 0,
incidMild = 0,
incidH = 0,
incidICU = 0,
incidVent = 0,
@@ -423,25 +436,23 @@
mutate(date_inds = as.integer(time - min(time) + 1),
geo_ind = as.numeric(as.factor(geoid))) %>%
arrange(geo_ind, date_inds) %>%
group_by(geo_ind) %>%
group_map(function(.x,.y){
split(.$geo_ind) %>%
purrr::map_dfr(function(.x){
.x$hosp_curr <- cumsum(.x$incidH) - lag(cumsum(.x$incidH),
n=R_delay_,default=0)
.x$icu_curr <- cumsum(.x$incidICU) - lag(cumsum(.x$incidICU),
n=ICU_dur_,default=0)
.x$vent_curr <- cumsum(.x$incidVent) - lag(cumsum(.x$incidVent),
n=Vent_dur_)
.x$geo_ind <- .y$geo_ind
return(.x)
}) %>%
do.call(what=rbind) %>%
replace_na(
list(vent_curr = 0,
icu_curr = 0,
hosp_curr = 0)) %>%
arrange(date_inds, geo_ind)

write_hosp_output(root_out_dir, data_dir, dscenario_name, s, res, use_parquet)
write_hosp_output(root_out_dir, data_dir, dscenario_name, sim_id, res, use_parquet)
NULL
}
doParallel::stopImplicitCluster()
File renamed without changes.
@@ -24,6 +24,7 @@ export(make_scn_time_summary_table)
export(make_scn_time_summary_table_withVent)
export(plot_event_time_by_geoid)
export(plot_geounit_attack_rate_map)
export(plot_geounit_map)
export(plot_hist_incidHosp_state)
export(plot_line_hospPeak_time_county)
export(plot_model_vs_obs)
Expand All @@ -35,4 +36,3 @@ export(plot_ts_incid_inf_state_sample)
export(print_pretty_date)
export(print_pretty_date_short)
export(reference_chunk)
export(setup_testing_environment)
@@ -118,7 +118,7 @@ load_scenario_sims_filtered <- function(scenario_dir,
##' with pre and post filters
##'
##' @param scenario_dir the subdirectory containing this scenario
##' @param name_filter function that
##' @param name_filter string that indicates which pdeath level to import (from the hosp filename)
##' @param post_process function that does processing after
##' @param geoid_len in defined, this we want to make geoids all the same length
##' @param padding_char character to add to the front of geoids if fixed length
Expand Down Expand Up @@ -172,8 +172,8 @@ load_hosp_sims_filtered <- function(scenario_dir,

read_file(files[i]) %>%
padfn %>%
post_process(...) %>%
mutate(sim_num = i)
mutate(sim_num = i) %>%
post_process(...)
}

rc<- dplyr::bind_rows(rc)
