
jagstargets: reproducible JAGS pipelines at scale #425

Closed · 11 of 27 tasks
wlandau opened this issue Feb 2, 2021 · 30 comments

@wlandau commented Feb 2, 2021

Date accepted: 2021-09-27
Submitting Author: Will Landau (@wlandau)
Repository: https://github.com/wlandau/jagstargets
Version submitted: 0.0.0.9000
Editor: @noamross
Reviewers: @dill, @ernestguevarra

Due date for @dill: 2021-05-11

Due date for @ernestguevarra: 2021-06-15
Archive: TBD
Version accepted: TBD


  • Paste the full DESCRIPTION file inside a code block below:
Package: jagstargets
Title: Targets for JAGS Workflows
Description: Bayesian data analysis usually incurs long runtimes
  and cumbersome custom code. A pipeline toolkit tailored to Bayesians,
  the 'jagstargets' R package leverages
  'targets' and 'R2jags' to ease this burden.
  'jagstargets' makes it super easy to set up useful scalable
  JAGS pipelines that automatically parallelize the computation
  and skip expensive steps when the results are already up to date.
  Minimal custom code is required, and there is no need to manually
  configure branching, so usage is much easier than 'targets' alone.
  For the underlying methodology, please refer
  to the documentation of 'targets' <doi:10.21105/joss.02959> and 'JAGS'
  (Plummer 2003) <https://www.r-project.org/conferences/DSC-2003/Proceedings/Plummer.pdf>.
Version: 0.0.0.9000
License: MIT + file LICENSE
URL: https://wlandau.github.io/jagstargets/, https://github.com/wlandau/jagstargets
BugReports: https://github.com/wlandau/jagstargets/issues
Authors@R: c(
  person(
    given = c("William", "Michael"),
    family = "Landau",
    role = c("aut", "cre"),
    email = "will.landau@gmail.com",
    comment = c(ORCID = "0000-0003-1878-3253")
  ),
  person(
    family = "Eli Lilly and Company",
    role = "cph"
  ))
Depends:
  R (>= 3.5.0)
Imports:
  coda (>= 0.19.4),
  digest (>= 0.6.27),
  fst (>= 0.9.4),
  posterior (>= 0.1.3),
  purrr (>= 0.3.4),
  qs (>= 0.14.1),
  R2jags (>= 0.6.1),
  rjags (>= 4.10),
  rlang (>= 0.4.8),
  stats,
  targets (>= 0.0.1),
  tarchetypes (>= 0.0.1),
  tibble (>= 3.0.4),
  tools,
  utils,
  withr (>= 2.3.0)
Suggests:
  dplyr (>= 1.0.2),
  fs (>= 1.5.0),
  knitr (>= 1.28),
  R.utils (>= 2.10.1),
  rmarkdown (>= 2.1),
  testthat (>= 3.0.0),
  tidyr (>= 1.1.2),
  visNetwork (>= 2.0.9)
Remotes:
  stan-dev/posterior
SystemRequirements: JAGS 4.x.y (http://mcmc-jags.sourceforge.net)
Encoding: UTF-8
Language: en-US
Roxygen: list(markdown = TRUE)
RoxygenNote: 7.1.1
VignetteBuilder: knitr
Config/testthat/edition: 3

Scope

  • Please indicate which category or categories from our package fit policies this package falls under: (Please check an appropriate box below. If you are unsure, we suggest you make a pre-submission inquiry.):

    • data retrieval
    • data extraction
    • data munging
    • data deposition
    • workflow automation
    • version control
    • citation management and bibliometrics
    • scientific software wrappers
    • field and lab reproducibility tools
    • database software bindings
    • geospatial data
    • text analysis
  • Explain how and why the package falls under these categories (briefly, 1-2 sentences):

jagstargets leverages the existing workflow automation capabilities of targets to orchestrate computation and skip up-to-date tasks in Bayesian data analysis pipelines. jagstargets reduces the burden of user-side custom code that targets would otherwise require, which helps free statisticians to focus more on the models and less on the software engineering.

  • Who is the target audience and what are scientific applications of this package?

jagstargets is for Bayesian statisticians who develop and run JAGS models. Example workflows range from individual analyses of clinical data to large-scale simulation-based calibration studies for validation.

  • Are there other R packages that accomplish the same thing? If so, how does yours differ or meet our criteria for best-in-category?

targets already provides the same kind of workflow automation, but it requires more custom code to set up a workflow. jagstargets uses specialized domain knowledge to make this process easier. The rjags and R2jags packages interface with JAGS but provide little workflow automation.

  • (If applicable) Does your package comply with our guidance around Ethics, Data Privacy and Human Subjects Research?

N/A

  • If you made a pre-submission enquiry, please paste the link to the corresponding issue, forum post, or other discussion, or @tag the editor you contacted.

N/A

Technical checks

Confirm each of the following by checking the box.

This package:

  • does not violate the Terms of Service of any service it interacts with.
  • has a CRAN and OSI accepted license.
  • contains a README with instructions for installing the development version.
  • includes documentation with examples for all functions, created with roxygen2.
  • contains a vignette with examples of its essential functions and uses.
  • has a test suite.
  • has continuous integration, such as GitHub Actions.

The continuous integration is temporarily having some issues because tarchetypes was just accepted to CRAN and the binaries are not yet available. (Should be fixed now.)

Publication options

  • Do you intend for this package to go on CRAN?

This package depends on posterior, which is not yet on CRAN.

  • Do you intend for this package to go on Bioconductor?

  • Do you wish to submit an Applications Article about your package to Methods in Ecology and Evolution? If so:

MEE Options
  • The package is novel and will be of interest to the broad readership of the journal.
  • The manuscript describing the package is no longer than 3000 words.
  • You intend to archive the code for the package in a long-term repository which meets the requirements of the journal (see MEE's Policy on Publishing Code)
  • (Scope: Do consider MEE's Aims and Scope for your manuscript. We make no guarantee that your manuscript will be within MEE scope.)
  • (Although not required, we strongly recommend having a full manuscript prepared when you submit here.)
  • (Please do not submit your package separately to Methods in Ecology and Evolution)

I have prepared a JOSS manuscript, and I intend to submit it if jagstargets clears rOpenSci software review.

Code of conduct

@noamross (Contributor)

@ropensci-review-bot assign @noamross as editor

@ropensci-review-bot (Collaborator)

Assigned! @noamross is now the editor

@noamross (Contributor)

Hi @wlandau! Running editor checks, I am able to fully test the package, but I hit the following issue when checking it with devtools::check():

✓  checking for file ‘/home/noamross/reviews/jagstargets/DESCRIPTION’ ...
─  preparing ‘jagstargets’:
✓  checking DESCRIPTION meta-information ...
─  installing the package to build vignettes
E  creating vignettes (30.7s)
   --- re-building ‘mcmc.Rmd’ using rmarkdown
   Quitting from lines 284-285 (mcmc.Rmd) 
   Error: processing vignette 'mcmc.Rmd' failed with diagnostics:
   callr subprocess failed: jags_file is not an existing file.
   Visit https://books.ropensci.org/targets/debugging.html for debugging advice.
   --- failed re-building ‘mcmc.Rmd’
   
   --- re-building ‘mcmc_rep.Rmd’ using rmarkdown
   Quitting from lines 354-355 (mcmc_rep.Rmd) 
   Error: processing vignette 'mcmc_rep.Rmd' failed with diagnostics:
   callr subprocess failed: jags_file is not an existing file.
   Visit https://books.ropensci.org/targets/debugging.html for debugging advice.
   --- failed re-building ‘mcmc_rep.Rmd’
   
   SUMMARY: processing the following files failed:
     ‘mcmc.Rmd’ ‘mcmc_rep.Rmd’

This appears to occur in the visNetwork() calls in the vignettes (commenting them out allows the vignettes to build). Otherwise all tests pass, and I am running JAGS 4.3.0.

@wlandau (Author) commented Apr 14, 2021

Sorry about that, should be fixed now.

@noamross (Contributor)

Editor checks:

  • Documentation: The package has sufficient documentation available online (README, pkgdown docs) to allow for an assessment of functionality and scope without installing the package.
  • Fit: The package meets criteria for fit and overlap
  • Automated tests: Package has a testing suite and is tested via a CI service.
  • License: The package has a CRAN or OSI accepted license
  • Repository: The repository link resolves correctly

Thank you, Will. All tests are running fine now, and your goodpractice::gp() diagnostics are all clear! I am seeking reviewers.

@noamross (Contributor)

@ropensci-review-bot add @dill to reviewers

@ropensci-review-bot (Collaborator)

@dill added to the reviewers list. Review due date is 2021-05-11. Thanks @dill for accepting to review! Please refer to our reviewer guide.

@dill commented Apr 29, 2021

Hi @wlandau, I'm just getting started here, and during installation I see that targets has currently been archived on CRAN (which I assume you're already aware of), so I have installed the version from ropensci/targets. Is that the most appropriate version to work from?

@wlandau (Author) commented Apr 29, 2021

Thanks for reviewing, @dill. Development targets (from https://github.com/ropensci/targets) should be fine while we wait for CRAN to review targets version 0.4.2, which I submitted last weekend.

@dill commented May 6, 2021

Package Review

  • Briefly describe any working relationship you have (had) with the package authors.
  • As the reviewer I confirm that there are no conflicts of interest for me to review this work (If you are unsure whether you are in conflict, please speak to your editor before starting your review).

Documentation

The package includes all the following forms of documentation:

  • A statement of need clearly stating problems the software is designed to solve and its target audience in README
  • Installation instructions: for the development version of package and any non-standard dependencies in README
  • Vignette(s) demonstrating major functionality that runs successfully locally
  • Function Documentation: for all exported functions
  • Examples (that run successfully locally) for all exported functions
    • devtools::run_examples()
    • devtools::run_examples(run_donttest=TRUE)
    • devtools::run_examples(run_donttest=TRUE, run_dontrun=TRUE)
  • Community guidelines including contribution guidelines in the README or CONTRIBUTING, and DESCRIPTION with URL, BugReports and Maintainer (which may be autogenerated via Authors@R).

Functionality

  • Installation: Installation succeeds as documented.

I was using a 2020 MacBook Air (Intel) running macOS 10.15.7 (Catalina). Here is the output of sessionInfo():

> sessionInfo()
R version 4.0.4 (2021-02-15)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Catalina 10.15.7

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRblas.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib

locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

loaded via a namespace (and not attached):
[1] compiler_4.0.4

I tried:

remotes::install_github("wlandau/jagstargets", dependencies=TRUE)

but (2021/04/29) targets had been removed from CRAN, so this failed with error:

ERROR: dependencies 'targets', 'tarchetypes' are not available for package 'jagstargets'

So instead I installed with

remotes::install_github("wlandau/targets", dependencies=TRUE)
remotes::install_github("ropensci/tarchetypes", dependencies=TRUE)
remotes::install_github("wlandau/jagstargets", dependencies=TRUE)

This worked fine.

2021-05-06 I tried using install_github() as above again and had no problems.

  • Functionality: Any functional claims of the software been confirmed.
  • Performance: Any performance claims of the software been confirmed.
  • Automated tests:
    • Unit tests cover essential functions of the package and a reasonable range of inputs and conditions.
      • goodpractice::gp reports: 17% of code lines are covered by test cases (this excludes running vignettes).
      • covr::package_coverage(type="all") reports: 99.57% coverage 👍
    • All tests pass on the local machine.
  • Packaging guidelines: The package conforms to the rOpenSci packaging guidelines

Estimated hours spent reviewing: 9 (this is high as it includes reading the rOpenSci review documentation)

  • Should the author(s) deem it appropriate, I agree to be acknowledged as a package reviewer ("rev" role) in the package DESCRIPTION file.

Review Comments

For some background on where I'm coming from with this review: I am a statistician, but I work primarily with ecologists (who tend to be pretty keen on JAGS and JAGS-like programming), so I have some experience with these kinds of workflows. I don't have experience with targets, so that is new to me. I'm not an RStudio user, so this is based on using R from the command line or within Nvim-R, I doubt this makes any difference but just an FYI.

General comments

I realise this is probably a daunting number of comments. In general jagstargets is an awesome piece of work! Most of my comments are aimed at making the package more accessible for people like me who want to get started using jagstargets. This was a fun package to review, and I learnt a lot. I hope the comments below are useful for you!

README.md

The first two paragraphs give a summary of the package but they feel like they are written for people who already know what targets does (e.g., "factories", "targets", "pipelines" are all jargon to begin with; though I appreciated the links to the blog post, I can see this turning people off right away). Perhaps there could be some adaptation so that someone who is a JAGS user can immediately understand how the package will save them time and headaches. Maybe even a "for JAGS users" and a "for people familiar with targets" section might help? (FWIW, I found the text in ?jagstargets-package to be a bit more transparent.)

Testing

  • As stated above, all tests ran on my machine post installation. They took about 8s.
  • Tests didn't use context() to provide extra information on what exactly was being tested each time. Filenames were semi-informative, but might be better to have more info there for contributors. This might be more useful if tests are expanded further.
  • Not much in the way of comments in the test files. Some are self-explanatory but the large tests are harder to follow.

MCMC pipelines vignette

  • Could this be renamed to "Getting started with..." or "An introduction to..." to make it clear this is where to start?
  • It might be nice to have a bit of table setting at the start of this vignette to talk about the kinds of outputs we want, even showing how an existing JAGS workflow works and how that maps into jagstargets (and then you can say how similar this is to the DAG generated by tar_visnetwork). In order to get people on board, I think it's necessary to show a clear mapping between their current workflow and "doing something fancy", and a bit of intro here might help that happen.
  • I found the list of targets a little confusing initially (around here). I wasn't sure where I was supposed to be looking for those objects and what was going on. Perhaps running tar_manifest() first, showing the effect of the tar_jags call, then explaining what these entries mean might make that easier to understand (see the sketch after this list)? Does the tar_jags call have to be wrapped in list() in that chunk (if so, maybe explain why)?
  • The last part of the vignette starts talking about a second model y.jags, though we don't define that. Is it worth making just a quick note here that tar_visnetwork doesn't check that the model files exist?
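
To sketch the tar_manifest() suggestion above (illustrative only: model.jags must already exist on disk, and tar_jags_example_data() simply supplies example data):

# _targets.R (sketch)
library(targets)
library(jagstargets)
list(
  tar_jags(
    example,
    jags_files = "model.jags", # must already exist on disk
    parameters.to.save = "beta",
    data = tar_jags_example_data()
  )
)

# Then, in the R session:
tar_manifest() # one row per target that tar_jags() generated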

Scaling MCMC pipelines vignette

  • Might be worth having something up top saying to start with the other vignette first?
  • There could be a little more chat at the start of this vignette explaining that you want to find out how well model.jags estimates alpha and beta[].
  • This is a bit of an odd model (in my opinion at least!) in that you have fixed the covariate values and have beta be a random effect (you generate n random slopes), rather than a fixed beta with random data each time. I wonder if a classic linear model situation (where data vary and there is only one beta) would be a clearer situation for people to think about? It also has the advantage of producing much less output, as well as having the same DGP as the other vignette.
  • With my stats hat on, I'd say write out the equations for the model/DGP, so it's super-clear what's happening here.
  • There is some setup in the chunks here that isn't explained (e.g., options(crayon.enabled = FALSE) and tar_option_set(memory = "transient", garbage_collection = TRUE)) that is worth explaining to novice users like myself. Perhaps this is only necessary for this vignette format, but it would be good to know if so.
  • I found the _targets.R definition hard to follow, particularly the arguments to tar_jags_rep_summary() were hard to relate to what is going on...
    • The terminology could be explained here at the same time: "branches", "replications", "batches" and "iterations" all appear almost at once. I think I've figured out what they mean by the end, but it would be nice to have it clear up front what those terms mean and why we use them here. This could even be woven into the first 1-2 intro paragraphs?
  • The names of the nodes in these graphs are hard-going (this was less obvious above when the model was simpler). model_model vs. model vs. model_lines_model: these make sense once you know what you're doing I'm sure but I found them difficult to keep track of (see comment in "Package code" below). Worth pointing people to the documentation in ?tar_jags to explain the naming convention at this point.
  • Is there any useful information that can be gleaned from the .rep column? It looks like a hash, can it tell us anything re: reproducibility or is it just for internal bookkeeping?

Trying this out for myself

I tried to construct a comparison between mgcv::gam and mgcv::jagam using jagstargets. You can find the resulting code here. Mostly I found this fairly intuitive once I got my head around the above. tar_visnetwork() was essential to debugging. Feel no obligation to look at this but it might be useful to see what weird things people do when working from scratch and seeing where some bottlenecks are.

Package code

I can't offer too much comment here as the majority of the code is calls to targets which I am not really qualified to comment on in any depth. In general I found the code clear and understandable throughout, even though I don't know the ins and outs of targets.

  • Per @wlandau's issue at Improve error messages when prep steps fail ropensci-review-tools/goodpractice#129 (which turned up when I googled the error), I was unable to get all checks running (though I don't have the same issue with other packages?). This happened with both the CRAN and GitHub versions of goodpractice, with the following message (obviously this is not necessarily a problem with jagstargets, but I thought I would report it here anyway):
Warning messages:
1: In MYPREPS[[prep]](state, quiet = quiet) :
  Prep step for cyclomatic complexity failed.
2: In MYPREPS[[prep]](state, quiet = quiet) :
  Prep step for description failed.
3: In MYPREPS[[prep]](state, quiet = quiet) :
  Prep step for namespace failed.
  • If jagstargets-package is a landing page, perhaps it could contain a link back to the pkgdown site, and perhaps a @seealso to the most useful starting manpage(s) (tar_jags?)
  • tar_jags_example_data.R should explain the data generating process, as well as what the components of the returned object are.
  • Naming scheme for targets: are there more descriptive names for the x_file_y and x_lines_y? I don't have stellar suggestions on replacements but "file" and "lines" feel very generic and like they might get confused in a large workflow? The other labels seem very clear though.
  • Following on from this, is it possible to combine the x_file_y and x_lines_y targets? I see in the stantargets vignette there doesn't seem to be a "compile" step, just the x_file_y then x_mcmc_y -- is there something different here that means we need both steps to be separate?

Issues raised/pull requests

@wlandau (Author) commented May 9, 2021

Thanks for your review, @dill. Your feedback helped clean up the documentation and make it more accessible to new users.

but (2021/04/29) targets had been removed from CRAN, so this failed with error:

targets is back up on CRAN, and I resubmitted tarchetypes on May 4. If all goes well, it may be accepted next week.

covr::package_coverage(type="all") reports: 99.57% coverage

That's odd, https://app.codecov.io/gh/wlandau/jagstargets and my local machines both report 100% coverage. Do you happen to know if any tests were skipped due to missing optional packages?

The first two paragraphs give a summary of the package but they feel like they are written for people who already know what targets does (e.g., "factories", "targets", "pipelines" are all jargon to begin with; though I appreciated the links to the blog post, I can see this turning people off right away). Perhaps there could be some adaptation so that someone who is a JAGS user can immediately understand how the package will save them time and headaches. Maybe even a "for JAGS users" and a "for people familiar with targets" section might help? (FWIW, I found the text in ?jagstargets-package to be a bit more transparent.)

In ropensci/jagstargets@b6d12e3, I rewrote the introductory paragraph and moved the technical details to a later section of the README.

Tests didn't use context() to provide extra information on what exactly was being tested each time. Filenames were semi-informative, but might be better to have more info there for contributors. This might be more useful if tests are expanded further.

testthat::context() is superseded, and I am trying to follow the updated recommendation to use file names as context.

Not much in the way of comments in the test files. Some are self-explanatory but the large tests are harder to follow.

ropensci/jagstargets@6528f14 inserts more inline prose to explain the intent of the tests.

Could this be renamed to "Getting started with..." or "An introduction to..." to make it clear this is where to start?

Sure, implemented in ropensci/jagstargets@885cce7.

It might be nice to have a bit of table setting at the start of this vignette to talk about the kinds of outputs we want, even showing how an existing JAGS workflow works and how that maps into jagstargets (and then you can say how similar this is to the DAG generated by tar_visnetwork). In order to get people on board, I think it's necessary to show a clear mapping between their current workflow and "doing something fancy", and a bit of intro here might help that happen.

I am not sure I fully understand, but ropensci/jagstargets@63ad892 does work a little harder to put the targets pipeline in the context of an ordinary JAGS workflow. Please let me know if this is different from what you have in mind.

I found the list of targets a little confusing initially (around here). I wasn't sure where I was supposed to be looking for those objects and what was going on. Perhaps running tar_manifest() first, showing the effect of the tar_jags call, then explaining what these entries mean might make that easier to understand?

ropensci/jagstargets@0ef2c23 rewords that section to try to be more helpful.

Does the tar_jags call have to be wrapped in list() in that chunk (if so, maybe explain why)?

list() is not required in this specific instance, but it underscores that _targets.R should end with a list of targets in the general case. ropensci/jagstargets@e371d0f adds a comment to clarify this. (Returning a single target is also acceptable but not very common.)

The last part of the vignette starts talking about a second model y.jags, though we don't define that. Is it worth making just a quick note here that tar_visnetwork doesn't check that the model files exist?

tar_jags() (and thus tar_visnetwork()) does error out if the JAGS file does not exist. (The original vignette wrote the model in a hidden code chunk.) ropensci/jagstargets@cbe845e exposes the new model to the reader.

Might be worth having something up top saying to start with the other vignette first?

Added a mention of the intro vignette.

There could be a little more chat at the start of this vignette explaining that you want to find out how well model.jags estimates alpha and beta[].

ropensci/jagstargets@c5af606 adds a comment near the beginning to clarify a bit about what I mean by "validation".

This is a bit of an odd model (in my opinion at least!) in that you have fixed the covariate values and have beta be a random effect (you generate n random slopes), rather than a fixed beta with random data each time. I wonder if a classic linear model situation (where data vary and there is only one beta) would be a clearer situation for people to think about? It also has the advantage of producing much less output, as well as having the same DGP as the other vignette.

ropensci/jagstargets@c5af606 changes the model to have one slope and one intercept. My intention with the different model was to demonstrate that jagstargets can handle non-scalar parameters when it comes to simulation studies that look at coverage in posterior intervals.

With my stats hat on, I'd say write out the equations for the model/DGP, so it's super-clear what's happening here.

Added in ropensci/jagstargets@057c47c.

There is some setup in the chunks here that isn't explained (e.g., options(crayon.enabled = FALSE) and tar_option_set(memory = "transient", garbage_collection = TRUE)) that is worth explaining to novice users like myself. Perhaps this is only necessary for this vignette format, but it would be good to know if so.

options(crayon.enabled = FALSE) is internal to the vignette. It makes sure printed output is rendered properly in the R Markdown document, and it does not appear in the rendered HTML document. ropensci/jagstargets@4ddaf44 adds a comment to indicate that the other options help targets behave more efficiently with computer memory. Those options are explained in more detail in ?targets::tar_option_set.
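
For reference, here is a minimal sketch of how those memory settings appear near the top of a _targets.R file:

# _targets.R (excerpt)
library(targets)
tar_option_set(
  memory = "transient",      # unload each target's return value from memory after use
  garbage_collection = TRUE  # run base::gc() before each target builds
)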

I found the _targets.R definition hard to follow, particularly the arguments to tar_jags_rep_summary() were hard to relate to what is going on...
The terminology could be explained here at the same time: "branches", "replications", "batches" and "iterations" all appear almost at once. I think I've figured out what they mean by the end, but it would be nice to have it clear up front what those terms mean and why we use them here. This could even be woven into the first 1-2 intro paragraphs?

Hopefully ropensci/jagstargets@586f75e clears that up.

The names of the nodes in these graphs are hard-going (this was less obvious above when the model was simpler). model_model vs. model vs. model_lines_model: these make sense once you know what you're doing I'm sure but I found them difficult to keep track of (see comment in "Package code" below). Worth pointing people to the documentation in ?tar_jags to explain the naming convention at this point.

From the documentation of tar_jags()

#'   The target names use the `name` argument as a prefix, and the individual
#'   elements of `jags_files` appear in the suffixes where applicable.
#'   As an example, the specific target objects returned by
#'   `tar_jags(name = x, jags_files = "y.jags", ...)` returns a list
#'   of `targets::tar_target()` objects:

Still takes getting used to though, I know.

Is there any useful information that can be gleaned from the .rep column? It looks like a hash, can it tell us anything re: reproducibility or is it just for internal bookkeeping?

Yes, it is just a hash to distinguish among individual simulation replications. Dynamic branching in targets automatically combines all the reps into the same data frame, and having a rep ID helps if the user wants to do custom post-processing within reps, as in the sketch below.
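
Spelled out as a quick sketch (derive_special_parameter() is a hypothetical user-defined helper that summarizes one rep):

library(dplyr)
library(targets)
tar_read(model) %>%
  group_by(.rep) %>%                                # one group per simulation rep
  group_modify(~ derive_special_parameter(.x)) %>%  # hypothetical per-rep summary
  ungroup()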

I tried to construct a comparison between mgcv::gam and mgcv::jagam using jagstargets. You can find the resulting code here. Mostly I found this fairly intuitive once I got my head around the above. tar_visnetwork() was essential to debugging. Feel no obligation to look at this but it might be useful to see what weird things people do when working from scratch and seeing where some bottlenecks are.

Awesome! Thanks for sharing.

If jagstargets-package is a landing page, perhaps it could contain a link back to the pkgdown site, and perhaps a @seealso to the most useful starting manpage(s) (tar_jags?)

Added in ropensci/jagstargets@240d5cd.

tar_jags_example_data.R should explain the data generating process, as well as what the components of the returned object are.

ropensci/jagstargets@3a8fb67 adds a @details section to explain the data-generating process (drawing from the prior predictive distribution). tar_jags_example_data() has a @format section to explain the elements of the returned JAGS data list.

Naming scheme for targets: are there more descriptive names for the x_file_y and x_lines_y? I don't have stellar suggestions on replacements but "file" and "lines" feel very generic and like they might get confused in a large workflow? The other labels seem very clear though.
Following on from this, is it possible to combine the x_file_y and x_lines_y targets? I see in the stantargets vignette there doesn't seem to be a "compile" step, just the x_file_y then x_mcmc_y -- is there something different here that means we need both steps to be separate?

All this has to do with how jagstargets and stantargets handle high-performance computing. Some users just run models on their local laptops, while others distribute multiple models on computing clusters. On clusters, some users configure jobs to run at the root directory of the project, while others use node-specific storage. And eventually (hopefully) the parallel backends of targets will be able to submit jobs to the cloud, and those environments will not have direct access to the original JAGS model files the users start with.

jagstargets is the simpler case. All we need to do is:

  1. Hash the upstream JAGS model file and watch it for changes.
  2. Make sure parallel workers have access to the JAGS model even if they cannot access the original model file.

I solve (2) by reading the actual lines of text of the JAGS model file in a separate target. A downstream MCMC target writes these lines to a new temporary JAGS model file and runs the temporary file instead of the user's original file. Because of the way file tracking works in targets, (1) and (2) need to be separate targets and cannot be combined into one.
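
In plain targets code, the pattern looks roughly like the following sketch (illustrative names, not the exact targets that tar_jags() generates):

# _targets.R (sketch of the "file"/"lines" pattern)
library(targets)
list(
  # (1) Watch the model file: format = "file" makes targets hash the file
  # at this path and invalidate downstream targets when it changes.
  tar_target(model_file, "model.jags", format = "file"),
  # (2) Store the model text as ordinary R data so parallel workers that
  # cannot see the original file still receive the model.
  tar_target(model_lines, readLines(model_file)),
  # (3) Each worker writes the lines to its own temporary file and runs
  # JAGS on that local copy (the MCMC call itself is elided here).
  tar_target(mcmc, {
    tmp <- tempfile(fileext = ".jags")
    writeLines(model_lines, tmp)
    tmp # a real pipeline would run R2jags::jags() on this temporary file
  })
)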

stantargets is more complicated because Stan models require compilation. On systems where parallel workers do not share storage, we need the "lines" trick so the workers have access to the model, and that means each worker needs to compile its own copy of the model. But compilation takes a lot of time, so on systems that let parallel workers share storage efficiently, stantargets should pre-compile each model once ahead of time and then let the workers run in parallel on the pre-compiled models. The compile argument allows the user to choose when to compile models, and this argument changes which targets are created and what they do.

I hope this clarifies the roles of the "file" and "lines" targets. They do need to be separate, and I do think these names accurately describe what the targets do, even if the specifics are esoteric for most users. Happy to consider specific suggestions for alternative names if you have any.

@dill commented May 11, 2021

Thanks for the changes and explanations @wlandau!

I pulled 3a8fb6775e1a1e635846f827664af152b007bef1 before addressing the points below.

  • coverage: I looked at the report and it says tar_jags_df.R lines 68:70 are not covered (relating to tar_jags_df_summary()), so that doesn't look like an optional package issue?
  • vignette additions: obviously I know more now than I did when I first looked at these but I think they've really helped clarify what's going on, thanks for adding them! Your additions to the intro vignette mapping the steps between JAGS and jagstargets are exactly what I was looking for.

Thanks also for the explanations of the JAGS compilation process vs. Stan. I've got a much better understanding of how and why that works now. I had a think about other names for "file" and "lines" and didn't come up with anything less ambiguous (e.g., "source", but that could be used for either option) that wasn't very ugly (e.g., "jagsfile"/"jagslines"). I guess I'm just thinking about a very specific use case where I might compare similar models fitted in JAGS vs Stan and how I can tell which bit of the model is which, but I guess the solution there is naming the previous targets better rather than changing the template. I'm happy to let that drop!

@wlandau (Author) commented May 12, 2021

coverage: I looked at the report and it says tar_jags_df.R lines 68:70 are not covered (relating to tar_jags_df_summary()), so that doesn't look like an optional package issue?

Thanks, that was a big hint. Hopefully ropensci/jagstargets@58621cb has better coverage.

I guess I'm just thinking about a very specific use case where I might compare similar models fitted in JAGS vs Stan and how I can tell which bit of the model is which

Yeah, jagstargets and stantargets remove file extensions when they compute target names. So users who call both in the same pipeline would need to disambiguate the model files in other ways. Not ideal for that specific use case, but targets lets you know pretty quickly when there are name conflicts. On my end, changing the name policy is always a risk because it invalidates everyone's workflows so they have to rerun those targets.

@dill commented May 12, 2021

Re-running covr::package_coverage(type="all") on ropensci/jagstargets@58621cb gives 100% coverage 🎉

That's helpful extra info on name conflicts, thanks for that!

@noamross (Contributor)

@ropensci-review-bot add @ernestguevarra to reviewers

@ropensci-review-bot (Collaborator)

@ernestguevarra added to the reviewers list. Review due date is 2021-06-15. Thanks @ernestguevarra for accepting to review! Please refer to our reviewer guide.

@noamross (Contributor)

@ernestguevarra Your review was due 2021-06-15. Please let us know when you are able to get it in.

@ernestguevarra commented Jul 14, 2021

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

  • Briefly describe any working relationship you have (had) with the package authors.
  • As the reviewer I confirm that there are no conflicts of interest for me to review this work (If you are unsure whether you are in conflict, please speak to your editor before starting your review).

Documentation

The package includes all the following forms of documentation:

  • A statement of need clearly stating problems the software is designed to solve and its target audience in README
  • Installation instructions: for the development version of package and any non-standard dependencies in README
  • Vignette(s) demonstrating major functionality that runs successfully locally

On running the workflow in the worked example in the vignette entitled "Bayesian simulation pipelines with jagstargets" that adds a target called coverage, I see the following errors:

Error: callr subprocess failed: could not find function "%>%"
Error: callr subprocess failed: could not find function "summarize"

In the _targets.R file, I added the following call:

library(dplyr)

to be able to get the updated workflow running.

When I reviewed simulation.Rmd which is the Rmd for this vignette, I noted that in the setup R code chunk, calls to load the following packages were made:

library(R2jags)
library(dplyr)
library(targets)
library(jagstargets)

I am unsure if this is correct thinking on my part, but I am guessing that because {dplyr} is loaded in the setup code chunk of the vignette, the specific example does not error when the vignette is built, whereas when I run the example outside of the Rmd, the workflow errors with the messages above. If so, would it be appropriate to add a call to load the {dplyr} package in the _targets.R file example as follows:

# _targets.R
library(targets)
library(jagstargets)
library(dplyr)

generate_data <- function (n = 10L) {
  beta <- stats::rnorm(n = 2, mean = 0, sd = 1)
  x <- seq(from = -1, to = 1, length.out = n)
  y <- stats::rnorm(n, beta[1] + x * beta[2], 1)
  # Elements of .join_data get joined on to the .join_data column
  # in the summary output next to the model parameters
  # with the same names.
  .join_data <- list(beta = beta)
  list(n = n, x = x, y = y, .join_data = .join_data)
}

list(
  tar_jags_rep_summary(
    model,
    "model.jags",
    data = generate_data(),
    parameters.to.save = "beta",
    batches = 5, # Number of branch targets.
    reps = 4, # Number of model reps per branch target.
    variables = "beta",
    summaries = list(
      ~posterior::quantile2(.x, probs = c(0.025, 0.975))
    )
  ),
  tar_target(
    coverage,
    model %>%
      group_by(variable) %>%
      summarize(
        coverage = mean(q2.5 < .join_data & .join_data < q97.5),
        .groups = "drop"
      )
  )
)

I am a newbie with targets, and these are common errors that I also encounter; my usual approach is to make a call within the targets workflow to load the package that is causing the error. Whether this is the correct approach, I am not sure, but I am flagging it here because others who follow the example may face the same errors I did.

  • Function Documentation: for all exported functions
  • Examples (that run successfully locally) for all exported functions
  • Community guidelines including contribution guidelines in the README or CONTRIBUTING, and DESCRIPTION with URL, BugReports and Maintainer (which may be autogenerated via Authors@R).

Functionality

  • Installation: Installation succeeds as documented.

On install of {jagstargets} using devtools::install(dependencies = TRUE) from within the project, I see the following error:

Error: object ‘tar_assert_chr’ is not exported by 'namespace:targets'

Warning message:
In i.p(...) :
  installation of package ‘/tmp/RtmpsW4y3O/file615c3eabc513/tarchetypes_0.2.1.9000.tar.gz’ had non-zero exit status

But the {tarchetypes} package was installed. I think this may be more of an issue with {tarchetypes}?

  • Functionality: Any functional claims of the software been confirmed.
  • Performance: Any performance claims of the software been confirmed.
  • Automated tests: Unit tests cover essential functions of the package
    and a reasonable range of inputs and conditions. All tests pass on the local machine.

On covr::package_coverage(), I see the following result:

jagstargets Coverage: 14.46%
R/tar_jags_rep.R: 0.00%
R/utils_language.R: 0.00%
R/tar_jags.R: 1.72%
R/tar_jags_rep_summary.R: 2.44%
R/tar_jags_rep_dic.R: 2.63%
R/tar_jags_rep_draws.R: 2.63%
R/tar_jags_df.R: 100.00%
R/tar_jags_example_data.R: 100.00%
R/tar_jags_example_file.R: 100.00%
R/utils_assert.R: 100.00%
R/utils_data.R: 100.00%
R/utils_jags.R: 100.00%
R/utils_logic.R: 100.00%

However, codecov results show 100% coverage across the same functions. I am unsure why this is the case; maybe it is from new edits to the functions since the last code coverage check? I may have contributed to this due to my tardiness in the package review, so my sincerest apologies for that.

  • Packaging guidelines: The package conforms to the rOpenSci packaging guidelines

The package is consistent with the rOpenSci packaging guidelines and, equally importantly, with the {targets} ecosystem to which it belongs. The fact that the {targets} package is also rOpenSci-reviewed has helped ensure the thematic and packaging coherence of {jagstargets}. The package name, branding, tone, and construction of the README and vignettes are all coherent as a standalone package and consistent within the package ecosystem it inhabits. As a newbie and recent user of {targets}, I found the {targets} story continues well into {jagstargets}, particularly in the function naming and documentation, the README/website, and the vignettes.

Estimated hours spent reviewing:

  • Should the author(s) deem it appropriate, I agree to be acknowledged as a package reviewer ("rev" role) in the package DESCRIPTION file.

Review Comments

Firstly, I would like to apologise for the extreme delay in reviewing this package. I have been severely disorganised with my work and this has impacted on my ability to get to this review promptly.

I thank @wlandau for his work on {jagstargets}, which, from my review, builds on the strength of the {targets} package and facilitates its easy application to Bayesian data analysis using JAGS. I approach this review from the perspective of someone who hasn't formally used JAGS and who has only recently learned and is currently using {targets}. The author's note in the README about requiring basic knowledge of {targets} and familiarity with Bayesian statistics and JAGS fits me as a new user, and because of two really well-worked-out vignettes on getting started and on simulation, I was able to go through a quick and easy introduction and orientation to the package, which gives me enough foundation to apply the package to appropriate tasks and related work in the future. It also definitely helps that {targets} is well documented with lots of really good examples and has an accompanying manual/handbook reference that I was able to use before to learn how to use it. With the {jagstargets} package being coherently written and documented in relation to the {targets} package, anyone who has already invested time in learning {targets} will find {jagstargets} extremely easy to learn and immediately handy for implementing a {targets} workflow for JAGS quickly and easily.

I have made a few comments above under the appropriate review topics on some minor issues I encountered when installing the development version of the package, when following one example in the simulation vignette, and on code coverage. In addition, on the Good Practice check, I see the following result:

It is good practice to write unit tests for all functions, and all package code in general. 14% of code lines are covered by test cases.

    R/tar_jags_rep_dic.R:100:NA
    R/tar_jags_rep_dic.R:101:NA
    R/tar_jags_rep_dic.R:102:NA
    R/tar_jags_rep_dic.R:103:NA
    R/tar_jags_rep_dic.R:104:NA
    ... and 557 more lines

It is good practice to fix this R CMD check WARNING: LaTeX errors when creating PDF version. This typically indicates Rd problems. LaTeX errors found:

The first point above reflects the result I am seeing from the code coverage check. For the second point, I am unsure where it is arising from; I do not get any warnings from devtools::check().

I see no spelling issues after running spelling::spell_check_package(), and I only see false-positive spelling issues when running spelling::spell_check_files("README.Rmd"). These words are already in inst/WORDLIST, so they won't get flagged on CRAN checks.

I would also like to mention that there are a few very minor edits that could be made in the vignettes (missing words, punctuation) that don't impact the readability and comprehension of the material but that the author might want to correct. If so, I would be happy to submit those edits as a pull request.

Thank you again to @wlandau for your great work on this package and for the whole {targets} package ecosystem that you have and are continuing to create.

@wlandau (Author) commented Jul 14, 2021

Thank you for your review, @ernestguevarra.

On running the workflow in the worked example in the vignette entitled "Bayesian simulation pipelines with jagstargets" that adds a target called coverage, I see the following errors...

ropensci/jagstargets@30e4ef6 repairs the _targets.R files for that example. The problem has to do with some code duplication that I unfortunately found necessary at the time to embed the example in an R Markdown vignette: there is one code chunk to show _targets.R to the reader and another code chunk to actually write _targets.R.

(Knowing what I know now after implementing Target Markdown, I think I could hack a custom knitr engine that writes a script file instead of evaluating the code, and this should help avoid duplicated code in new documentation I create from scratch for projects in the future.)
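
Something along these lines, a hypothetical engine (not part of jagstargets or Target Markdown):

# Register a chunk engine that writes the chunk's code to _targets.R
# instead of evaluating it, so one chunk both displays and creates the script.
knitr::knit_engines$set(write_script = function(options) {
  writeLines(options$code, "_targets.R")
  # Echo the chunk's code in the rendered document, with no output.
  knitr::engine_output(options, options$code, out = character(0))
})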

On install of {jagstargets} using devtools::install(dependencies = TRUE) from within the project, I see the following error

This is because jagstargets currently relies on some functions exposed in development targets that are not yet on CRAN (see Remotes: ropensci/targets in the jagstargets DESCRIPTION file). (I plan to update targets on CRAN on July 21.) If you run remotes::install_github("ropensci/targets") first, you should be able to install jagstargets.

On covr::package_coverage(), I see the following result

Some tests use testthat::skip_on_cran() in order to reduce check time and other CRAN-related issues. If you set the NOT_CRAN environment variable to "true", these tests should run and coverage should show as 100%. (See usethis::edit_r_environ().) When I commented out NOT_CRAN=true in my own ~/.Renviron file, I saw the exact coverage pattern you posted.
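
For example, to run the CRAN-skipped tests and recompute coverage in the current session:

Sys.setenv(NOT_CRAN = "true")        # opt in to tests skipped on CRAN
covr::package_coverage(type = "all") # should now report 100%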

fix this R CMD check WARNING: LaTeX errors when creating PDF version. This typically indicates Rd
problems. LaTeX errors found:

Often this particular error has something to do with the local LaTeX installation, e.g. https://stackoverflow.com/questions/10819959/diagnosing-r-package-build-warning-latex-errors-when-creating-pdf-version. I was unable to reproduce it locally.

I would also like to mention that there are a few very minor edits that could be made in the vignettes (missing words, punctuation) that don't impact the readability and comprehension of the material but that the author might want to correct. If so, I would be happy to submit those edits as a pull request.

Happy to accept pull requests about spelling and grammar, or to correct any specific spelling/grammar issues you identify.

Please let me know if there is something I missed in my response.

@wlandau (Author) commented Aug 3, 2021

Hi everyone, it has been a while since I responded to @ernestguevarra's review. Is there something else I forgot to do?

wlandau-lilly pushed a commit to ropensci/jagstargets that referenced this issue Aug 3, 2021
@noamross (Contributor) commented Aug 4, 2021

Hi @wlandau, I tend to expect a second response with changes incorporating both reviewers' feedback, but I realize your workflow incorporated both responses and all edits. That said, @ernestguevarra and @dill, do @wlandau's changes address your comments?

@dill commented Aug 5, 2021

Absolutely!

@wlandau (Author) commented Sep 21, 2021

Thanks, @dill! @ernestguevarra, is there anything else you would like me to work on?

@ernestguevarra

Hi @wlandau. Apologies. Yes, your comments above have responded to my feedback. Thanks for explaining those issues to me, and it is good to know that you were aware of them and had actually addressed them even before my comments.

I have nothing else to ask to be worked on. Thank you again for your work on this package!

@noamross (Contributor)

@ropensci-review-bot approve jagstargets

@ropensci-review-bot (Collaborator)

Approved! Thanks @wlandau for submitting and @dill, @ernestguevarra for your reviews! 😁

To-dos:

  • Transfer the repo to rOpenSci's "ropensci" GitHub organization under "Settings" in your repo. I have invited you to a team that should allow you to do so. You'll be made admin once you do.
  • Fix all links to the GitHub repo to point to the repo under the ropensci organization.
  • Delete your current code of conduct file if you had one since rOpenSci's default one will apply, see https://devguide.ropensci.org/collaboration.html#coc-file
  • If you already had a pkgdown website and are ok relying only on rOpenSci central docs building and branding,
    • deactivate the automatic deployment you might have set up
    • remove styling tweaks from your pkgdown config but keep that config file
    • replace the whole current pkgdown website with a redirecting page
    • replace your package docs URL with https://docs.ropensci.org/package_name
    • In addition, in your DESCRIPTION file, include the docs link in the URL field alongside the link to the GitHub repository, e.g.: URL: https://docs.ropensci.org/foobar (website) https://github.com/ropensci/foobar
  • Fix any links in badges for CI and coverage to point to the ropensci URL. We no longer transfer Appveyor projects to ropensci Appveyor account so after transfer of your repo to rOpenSci's "ropensci" GitHub organization the badge should be [![AppVeyor Build Status](https://ci.appveyor.com/api/projects/status/github/ropensci/pkgname?branch=master&svg=true)](https://ci.appveyor.com/project/individualaccount/pkgname). If Appveyor does not pick up new commits after transfer, you might need to delete and re-create the Appveyor project. (Repo transfers are smoother with GitHub Actions)
  • Please check you have updated the package version to a post-review version and that you documented all changes in NEWS.md
  • We're starting to roll out software metadata files to all rOpenSci packages via the Codemeta initiative; see https://docs.ropensci.org/codemetar/ for how to include it in your package. After installing the package, it should be as easy as running codemetar::write_codemeta() in the root of your package.

Should you want to acknowledge your reviewers in your package DESCRIPTION, you can do so by making them "rev"-type contributors in the Authors@R field (with their consent).

Welcome aboard! We'd love to host a post about your package - either a short introduction to it with an example for a technical audience or a longer post with some narrative about its development or something you learned, and an example of its use for a broader readership. If you are interested, consult the blog guide, and tag @stefaniebutland in your reply. She will get in touch about timing and can answer any questions.

We maintain an online book with our best practices and tips; this chapter starts the 3rd section, which covers guidance for after onboarding. Please tell us what could be improved.

Last but not least, you can volunteer as a reviewer by filling in a short form.

@wlandau (Author) commented Sep 27, 2021

Thanks so much @noamross, @dill, and @ernestguevarra!

I transferred the repo. @noamross, would you grant me repo admin permissions? I seem to have lost them with the transfer.

@ernestguevarra, would you like to be included as a reviewer like @dill? https://github.com/ropensci/jagstargets/blob/e27d1ef3f260a41d005ed81d8b99e2364099400f/DESCRIPTION#L28-L33. Wondering because "Should the author(s) deem it appropriate, I agree to be acknowledged as a package reviewer ("rev" role) in the package DESCRIPTION file" was unchecked in your review.

@noamross (Contributor)

You should have repo permissions now, @wlandau .

@wlandau (Author) commented Sep 27, 2021

Thanks Noam!

@wlandau (Author) commented Sep 28, 2021

Looks like the jagstargets site is up, and I think I completed the other to-dos in this thread. Remaining to-dos on my end are JOSS and then CRAN. In the meantime, @ernestguevarra, feel free to snag that "rev" role whenever you want.
