
Process feedback rOpenSci review #30

Closed
richelbilderbeek opened this issue Mar 19, 2020 · 4 comments

@richelbilderbeek

Request is here.

@richelbilderbeek

richelbilderbeek commented Mar 21, 2020

The first reviewer's comments:

Installation

I had trouble with the package installation due to issues with rJava, and never got it working on my Mac (macOS 10.14.6). The CRAN version of rJava is looking for a specific JDK which isn't the one on my machine. Compiling it from source appeared to work, but running any function crashes the R console. Installing on a Linux cluster worked.

It looks to me as if the dependency on rJava is only used in the get_default_java_path function, which seems disproportionate with the amount of trouble it is. I would recommend exploring whether the dependency can be removed, and if not providing a work-around for people who cannot get rJava working - one option would be to ask users to provide the path to Java manually.

I completely agree that getting rJava to work is the toughest nut to crack! I've seen this is true on Linux, Mac and Windows. Yet beastier (a dependency of mcbette) needs it to, indeed, get the default Java path, as well as to get the Java version, which helps in processing bug reports (created by beastier::beastier_report).

However, I feel that hand-coding the same subset of functionality is not the way to go. rJava is the package for working with Java, and I hope (and predict) that the rJava maintainers will simplify its installation. I feel the responsibility for getting Java info lies with rJava, not with mcbette: mcbette is there for model comparison.

Note that the rJava maintainer strongly recommended upgrading to rJava 0.9-12 in the 2020-03-23 rJava release update: maybe that update fixed exactly the problem Joelle had.

Other comments

instructions on how to configure to use an existing BEAST2 installation would be helpful, if it is possible.

I have documented using BEAST2 installed at a custom location with this commit.

On Mac mcbette is not looking for it in the default place it is installed to, so that could lead to many duplicate installations.

This would indeed be weird! I looked into this by writing a test that installs BEAST2 at a custom location, after which mcbette is called to use that BEAST2 install (see this commit). The test, however, passes. If this weird behavior could be reproduced, that would definitely be helpful, but until then, I cannot fix a problem I cannot reproduce.

Documentation

the documentation (paper, README, vignette) was generally unclear as to which models can be selected using NS - the only mention I found was in the DESCRIPTION, saying that some substitution and clock models are available. It would be useful to include this in the README along with a list of available models. One example showing clock model comparison, in the vignette for instance, would also be useful.

I agree info on this was thin. I've created a vignette in beautier that showcases the inference model options.
In README, vignette and paper, I added the available site, clock and tree models.

in general, the documentation assumes that people using mcbette will already be familiar with the other packages of the suite, particularly beautier. I think this expectation needs to be stated much more clearly in the README and vignette, and that a link to an appropriate introduction vignette should be provided.

I've described the relation to the beautier package in more places and link to a vignette in beautier that showcases the inference model options.

the example in the README should describe the tested model briefly (JC69, strict clock, etc).

I added this.

there is a create_ns_inference_model function in the package, but the examples (README, vignette, package doc) use the beautier::create_test_ns_inference_model function instead - it's unclear what the difference is between the two, or which one should be used for a real analysis. If the test function is only used to set up a shorter analysis (as the README seems to imply?), then it would be better to set up the shorter analysis with the real function and modified parameters, and explain what is being done.

I moved create_ns_inference_model to the beautier package and made the documentation of the create_[something]_inference_model functions link to each other.

Additionally, I followed your suggestion and replaced:

x <- create_test_ns_inference_model()

by

x <- create_ns_inference_model()
x$mcmc <- create_test_ns_mcmc()

in the default parameter documentation, beast2_options should link to the create_mcbette_beast2_options function used in all the examples, instead of the beastier::create_beast2_options function. Similarly inference_model should link to either create_ns_inference_model or beautier::create_test_ns_inference_model rather than beautier::create_inference_model.

I made these link to one another. Note that the inference_model functions have been moved to beautier, and the beast2_options functions to beastier.

when autocompleting in RStudio, the description shown for the function is the beginning of its Description section. This means that, for instance, check_mcbette_state shows only "Will stop otherwise.". Same issue with check_beast2_ns_pkg and interpret_bayes_factor.

This is a neat observation, that I was unaware of. I have fixed this.

contrary to what the doc says, est_marg_lik does not return either estimates or trees.

Well spotted! I've updated the doc in this commit.

the error message when est_marg_lik fails prints the inference model nicely, but not the BEAST2 options (it shows beast2_1c6e311fc10d3.xml.stateNANAFALSETRUE).

Hmmm, I wish I could reproduce that! Because if I look into the code, I do not treat the inference model differently from the BEAST2 options:

  tryCatch({
      bbt_run_out <- babette::bbt_run_from_model(
        fasta_filename = fasta_filename,
        inference_model = inference_model,
        beast2_options = beast2_options
      )
      ns <- bbt_run_out$ns
    },
    error = function(e) {
      stop(
        "Could not estimate the marginal likelihood. \n",
        "Error message: ", e$message, "\n",
        "fasta_filename: ", fasta_filename, "\n",
        "beast2_options: ", beast2_options, "\n",
        "inference_model: ", inference_model, "\n"
      )
    }
  )
I would enjoy fixing this, but if I cannot check my results, I cannot see whether I have fixed it. Could you reproduce the bug? If yes, I will happily fix it!
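For what it's worth, the garbled string looks like what happens in base R when a named list is passed straight into stop(): each element is coerced to character and glued onto the message, with no names or separators. A minimal sketch below illustrates this; the option names and values are hypothetical, chosen only to mimic the reported output, and format_options is a made-up helper, not part of mcbette or beastier:

```r
# Hypothetical options list; names and values are illustrative only.
beast2_options <- list(
  state_filename = "beast2_1c6e311fc10d3.xml.state",
  rng_seed = NA,
  n_threads = NA,
  use_beagle = FALSE,
  overwrite = TRUE
)

# What stop("beast2_options: ", beast2_options, ...) effectively produces:
# each list element is coerced to character and concatenated.
garbled <- paste0(
  vapply(beast2_options, as.character, character(1)),
  collapse = ""
)
print(garbled)  # "beast2_1c6e311fc10d3.xml.stateNANAFALSETRUE"

# A more readable rendering: one "name: value" pair per line.
format_options <- function(opts) {
  paste0(
    names(opts), ": ", vapply(opts, as.character, character(1)),
    collapse = "\n"
  )
}
cat(format_options(beast2_options))
```

If this is indeed the cause, formatting the list before interpolating it into the error message would make the BEAST2 options print as nicely as the inference model does.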

it would be useful to have an example in the interpret_bayes_factor doc of how to use that function with the output from est_marg_liks.

I agreed, up until I added interpret_marg_lik_estimates, which interprets the results of est_marg_liks (#34) and which, IMHO, makes interpret_bayes_factor redundant. I hope you agree!

Minor comments

the doc for est_marg_lik links to itself, same with est_marg_liks

Well spotted! It now links correctly to est_marg_liks.

* `est_marg_lik.R`, l16 "Actual" -> "Current"

Renamed.

Functionality

the main feature I think is missing from the package at the moment is the ability to keep the log and tree output of the babette runs, in addition to the marginal likelihood. The documentation says that the trees are returned, but that does not appear to be the case. I would expect that users doing model selection will also be interested in the parameter estimates for the best model.

To the mcbette main documentation (?mcbette) and mcbette::est_marg_liks I've added how to obtain the parameter estimates, posterior trees and other things stored in temporary files.

my other suggestion is to have a function similar to interpret_bayes_factor which could directly process the output from est_marg_liks, and print the table with the best model highlighted and how all the others compare to that one.

Great idea, I did so by adding interpret_bayes_factor_log_lik (#33). It even comes with a text plot!

JOSS paper comments

* two references are missing DOIs (required according to JOSS guidelines)

I added DOIs to all journal articles.

* there is a `> Example text below` left in l18

Blimey, well spotted! I've put the Summary there 👍

* the single quotes in the title (`'babette'`) give me a yaml parser error when building the file

I removed these quotes altogether. I've added a Travis CI test to verify that knitr::knit does work on the file.

* l59-61 I think you need to clarify that you are only talking about living species here rather than all species. It's also unclear to me why you say we "should be able to reconstruct" rather than "can reconstruct".

Agreed, I've used the wording you suggested.

* l61-62 I would introduce phylogenies earlier, when showing the primates tree.

I rewrote most of the paper, after which phylogenies are introduced sooner.

* l70-71 I think this sentence is too vague. I would suggest replacing it with "However, constructing a phylogeny is non-trivial, as it requires the researcher to choose how to model the evolutionary process, including..."

Agreed, I've used your more precise wording. Thanks!

* l80-85 This section needs to mention that the 'best' model is always defined in regards to a particular dataset. It's explained later but needs to be made clear earlier.

I have reworded it in such a way as to avoid the word 'best', and it now makes explicit that the outcome depends on the alignment/dataset:

mcbette is an R package that determines the model with the most
evidence (aka marginal likelihood) for having generated the alignment,
from a set of models.

* l85 I don't think you can reuse this definition of 'best' for phylogenies - what would it mean for a phylogeny to be "simple" ?

Agreed, I removed it. Thanks!

Minor comments

* l49 then -> than

Fixed. Thanks!

* l63 a the -> the

Fixed. Thanks!

  • Add bjoelle to DESCRIPTION as reviewer

Did so.

@richelbilderbeek

richelbilderbeek commented May 24, 2020

Second reviewer's reply:

Documentation

The package includes all the following forms of documentation:

[ ] A statement of need clearly stating problems the software is designed to solve and its target audience in README
A statement is present, but I do not think it states the problem clearly enough. What is a "Model Comparison"? The phrase is too generic to give a user a sense of what mcbette intends to solve. Although someone familiar with BEAST can likely guess this for him/herself, it should not be up to the reader to interpret. I don't think the author needs to go into a whole lot of detail, but I'd suggest adding 2-3 more sentences to elaborate on what the point of fitting the models is and why the models are being compared at the top of the readme where the package is introduced. There are a few sentences in the paper.md that achieve this nicely, so I suggest just incorporating them here as well. I'd also rephrase "solve a common problem" in the Example section to give an explanation of what the problem in the example is and why it is being solved. Again, I anticipate that a user of this package will likely know how all this works. Rather, the point here is to make it more immediately clear what mcbette does when a user first encounters the package.

Indeed, it was a bit concise. I have rephrased it as follows:

mcbette allows for doing a Model Comparison using babette (hence the name),
that is, given a (say, DNA) alignment, it compares multiple phylogenetic inference
models to find the model that is likeliest to generate that alignment.
With this, one can find the phylogenetic inference model that is
simple enough, but not too simple.

To do so, mcbette uses babette [Bilderbeek and Etienne, 2018] with the addition
of using the BEAST2 [Bouckaert et al., 2019] nested sampling package
as described in [Russell et al., 2019].

[x] Installation instructions: for the development version of package and any non-standard dependencies in README
In contrast to the other reviewer, I did not encounter any issues with installation or running code (macOS 10.15.4).

Great!

[x] Vignette(s) demonstrating major functionality that runs successfully locally
Great work! Nice explanation of a clear example. I'll suggest linking this on the readme too as it is very helpful for understanding what the package does and how it works.

Great!

[x] Function Documentation: for all exported functions in R help

Happy to read I did a good job 👍

[x] Examples for all exported functions in R Help that run successfully locally

Happy to read I did a good job 👍

[x] Community guidelines including contribution guidelines in the README or CONTRIBUTING, and DESCRIPTION with URL, BugReports and Maintainer (which may be autogenerated via Authors@R).

Happy to read I did a good job 👍

For packages co-submitting to JOSS

The package contains a paper.md matching JOSS's requirements with:

[ ] A short summary describing the high-level functionality of the software

Indeed, somehow this short summary is missing! I've added it 👍

Some of this may be a matter of style, but I have a bit of feedback about how this paper is written. Firstly, the introductory material (all paragraphs and figures before "Constructing a phylogeny, however, is non-trivial...") are too basic for my taste -- I don't think evolution itself needs to be described in fundamental detail at all. This explanation is not appropriate for the package's audience -- we know all this already. Rather, I suggest starting with the logic of paragraph 4 ("Constructing a phylogeny, however, is non-trivial...") which states the problem at hand nicely. Although I like the core idea of this paragraph, the author may also consider fleshing out an explanation of why statistical models are used at all.

Thanks to your feedback, I have removed the whole (indeed slow) introduction and started at the point you suggested. With some rewriting, I feel I have sufficiently introduced the need for statistical models (by introducing the term 'evidence', aka marginal likelihood), and I hope you agree.

What I will also more strongly recommend is that the author give the reader a sense of where to start when using the package. Broadly speaking, what is the analytical pipeline one should use with mcbette? I think inclusion of a worked example in the paper would help users know where to start, how to structure their data, which functions to look into, and which vignettes they may want to look into. This would be great to have as either the penultimate or the final paragraph. I don't think it needs to go into specific detail, but just give the reader a general sense of how the package works.

Agreed! I have added a section 'Getting Started' that should introduce the reader to the field and direct him/her to the literature.

  • Authors: A list of authors with their affiliations
  • A statement of need clearly stating problems the software is designed to solve and its target audience.
  • References: with DOIs for all those that have one (e.g. papers, datasets, software).
    As noted by the other reviewer, some DOIs are missing and it would be great to have them included

Agreed, thanks! I have added these!

Functionality

[x] Installation: Installation succeeds as documented.
Nice touch with the can_run_mcbette() function! Very handy!

Thanks! If you like that one, you'll love mcbette_report 😎 👍

  • Functionality: Any functional claims of the software been confirmed.

Cool!

  • Performance: Any performance claims of the software been confirmed.

Awesome.

  • Automated tests: Unit tests cover essential functions of the package
    and a reasonable range of inputs and conditions. All tests pass on the local machine.

I am unsure why, but testing failed on my machine. Here are the relevant lines
from the devtools check:

── Test failures ───────────────────────────────────────────────── testthat ────

library(testthat)
library(mcbette)
before <- get_mcbette_state()
test_check("mcbette")
── 1. Error: Package style (@test-style.R#2) ───────────────────────────
invalid 'path' argument
Backtrace:

  1. lintr::expect_lint_free()
  2. lintr::lint_package(...)
  3. base::normalizePath(path, mustWork = FALSE)
  4. base::path.expand(path)

══ testthat results ════════════════════════════════════════════════════
[ OK: 88 | SKIPPED: 5 | WARNINGS: 0 | FAILED: 1 ]

  1. Error: Package style (@test-style.R#2)

Error: testthat unit tests failed
Execution halted
1 error x | 0 warnings ✓ | 1 note x
Error: R CMD check found ERRORs
Execution halted
Exited with status 1.

I have the same error. This was not the case upon the submission of mcbette.
A bug report had already been filed here in October 2019.
I've removed the tests, as they are run by Travis CI anyway.

Final approval (post-review)

  • The author has responded to my review and made changes to my satisfaction. I recommend approving this package.

Let's see what this reviewer thinks about this 👍

Estimated hours spent reviewing: ~3.5

* [x]  Should the author(s) deem it appropriate, I agree to be acknowledged as a package reviewer ("rev" role) in the package DESCRIPTION file.

Done, including the ORCID ID 👍

Review Comments

Firstly, I am quite happy to see that this package exists and was submitted to rOpenSci. mcbette is poised to be very helpful towards biologists who work on tree inference and phylogenetic comparative analyses. By using BEAST and accompanying software, mcbette builds off the author's previous babette and helps expand a user's ability to rely on reproducible pipelines for analyzing genetic data. As someone who will likely use babette and other packages in this ecosystem in the near future, I appreciate what mcbette brings to the table: giving a reproducible approach to model comparisons of sequence evolution.

In running the worked examples as well as feeding in sequences from my own data sets, I did not encounter any issues with how the code ran or the results it produced. To the best of my ability to assess this, mcbette seems to produce correct results.

Happy to hear this!

That being said, testing failed on my machine. Please see above for the relevant lines from the devtools check output. Also, in investigating testthat.R, the result of line 6, test_check("mcbette"), was:

Error: No tests found for mcbette

As noted above, the devtools issues are caused by something upstream (in the lintr package) and need to be resolved there.

Regarding the second error: I can reproduce it perfectly. I predict this is a feature, not a bug. With CTRL+SHIFT+T in RStudio, or with devtools::test(), the tests do start as expected. Also, mcbette is tested by Travis CI (for Linux and Mac) and AppVeyor (for Windows), and its code coverage is measured from these tests. Therefore, I assume the tests work and that the "No tests found for mcbette" message is some R behavior that both of us misinterpret.

Unlike the other reviewer, I did not encounter issues during install (using macOS 10.15.4 with R 3.6.2 and then again on R 4.0.0) so I am sorry to not be able to offer any insight on troubleshooting. I should note that I already had rJava installed, which may help explain the lack of problems.

I am happy to have this confirmed 👍

My inline comments above are geared towards enhancing the clarity of the documentation. I will agree with the other reviewer's assessment -- the documentation and the accompanying paper are a little too sparse as presently written. I'd appreciate seeing expansions of both the high-level functionality (in a broad sense) as well as more detailed descriptions of how the author envisions users feeding data into analytical pipelines. I don't mind having similar information repeated in the readme, description, vignettes, and paper; doing so not only reinforces the message of the package but also helps users' understanding regardless of their point of entry into the package.

I added quite a lot of documentation, prompted by feedback from the other reviewer indeed. I hope this will also satisfy your needs.

I agree with all the specific bullet points the other reviewer raised regarding the Documentation and Functionality sections and for the sake of brevity won't repeat them here. I'll add that providing a full(er) list of potential models would be great to have, perhaps as its own vignette.

I do not necessarily agree with the other reviewer's comments regarding the JOSS paper but this is primarily because I feel the paper should be restructured (please see inline comments above).

I have restructured the paper from a mix of both reviews. I hope you both like it. I, for sure, do!

Thanks again for the opportunity to review this package. Again, my suggestions are motivated by wanting to enhance the clarity of presentation and make it easier for potential users to see the value of relying on mcbette. It's nice work!


Thanks so much for reviewing!

richelbilderbeek pushed a commit that referenced this issue Jul 16, 2020
richelbilderbeek pushed a commit that referenced this issue Sep 5, 2020
@richelbilderbeek

Done!

@richelbilderbeek

Posted these replies back on ropensci/software-review#360.

Closing this one 👍
