Shiny DADA2 #277

Open
wants to merge 5 commits into
base: master

joey711 (Collaborator) commented Jul 1, 2017

@benjjneb
Here is a working (on my system) prototype of a Shiny-DADA2 app.

After installing dada2 from this branch, you should be able to just run the following to test it out.

shinyDADA2()

This is not really ready to merge into master. I'm still ironing out a few things, such as some of the final output data structures and summary plots, and the in-app documentation. It isn't obvious anywhere in the docs, but your folder of fastq files must contain an info.txt file: a tab-delimited table with the columns Sample, Direction, and File.
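
As a rough illustration of the info.txt requirement, here is a minimal base-R sketch that builds such a table for a folder of paired-end files. The helper names (make_info_table, write_info_file), the Illumina-style _R1_/_R2_ file naming, and the "F"/"R" Direction labels are all assumptions for illustration; the thread only confirms the three column names and the tab-delimited format.

```r
# Hypothetical helpers: build the info.txt manifest for a folder of FASTQ files.
# Assumes Illumina-style names such as F3D0_S188_L001_R1_001.fastq, where
# _R1_ marks forward reads and _R2_ marks reverse reads. The Direction labels
# ("F"/"R") are a guess, not something the app is known to require.
make_info_table <- function(fastq_dir) {
  files <- list.files(fastq_dir, pattern = "_R[12]_.*\\.fastq(\\.gz)?$")
  data.frame(
    Sample    = sub("_.*$", "", files),  # text before the first underscore
    Direction = ifelse(grepl("_R1_", files, fixed = TRUE), "F", "R"),
    File      = files,
    stringsAsFactors = FALSE
  )
}

# Write the table as tab-delimited text next to the FASTQ files.
write_info_file <- function(fastq_dir) {
  info <- make_info_table(fastq_dir)
  write.table(info, file.path(fastq_dir, "info.txt"),
              sep = "\t", quote = FALSE, row.names = FALSE)
  invisible(info)
}
```

Calling write_info_file("path/to/fastq") would then leave an info.txt in that folder. If your file names follow a different convention, the two regular expressions are the parts to adapt.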

Here is a link to a version of the MiSeqSOP data for testing; the only difference is that the info.txt file has already been created. It also includes a tutorial showing how that file was made: stage.Rmd and stage.html.

Looking forward to feedback!

joey711 (Collaborator, Author) commented Jul 1, 2017

I had been tracking this locally with manual snapshots for the last few days, but since it works and I can version-control it in a branch on the dada2 repo, I thought that was the better option. Plus I can get your feedback/help earlier rather than later.

benjjneb (Owner) commented Jul 1, 2017

Very cool! Will try this out this week.

joey711 (Collaborator, Author) commented Jul 30, 2017

This is now supported as a docker image:

https://hub.docker.com/r/joey711/dada2-shiny-devel/

with instructions on dada2docker:

https://github.com/joey711/dada2docker

benjjneb (Owner) commented

Some quick notes for myself as I finally get around to testing this:

install_github("benjjneb/dada2", ref=github_pull(277))

On first install ran into the following error:

Error in packageVersion("shinyFiles") : package ‘shinyFiles’ not found

Solved by install.packages("shinyFiles") and then reloading.
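
A small pre-flight check along these lines could surface a missing dependency before the app is launched. This is only a sketch: the package list is copied from the version report later in this thread and is an assumption about what the app needs, and missing_packages is a made-up helper name.

```r
# Packages named in the version report in this thread. Whether this is the
# complete dependency list for the app is an assumption.
shiny_dada2_deps <- c("shiny", "shinyFiles", "DT", "data.table",
                      "magrittr", "ggplot2")

# Return the subset of `pkgs` that cannot be loaded on this machine.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

missing <- missing_packages(shiny_dada2_deps)
if (length(missing) > 0) {
  message("Install before launching: ", paste(missing, collapse = ", "))
}
```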

benjjneb (Owner) commented Aug 13, 2017

First impressions, I think the basic interface layout looks good. Filter/Learn/Denoise is a good way to factorize the workflow. It would probably be good to add chimera filtering (or just roll it into the Run DADA command with another option). Also a new tab for assigning taxonomy is probably desirable?

Testing-wise though, I'm stuck on step 1 -- the Shiny interface won't let me pick a folder in any of the folder select interfaces: I just get blank options there:

[Screenshot, 2017-08-13: folder-selection dialog with blank options]

PS:

R version looks okay: 3.4.0
Bioconductor version 3.5 (BiocInstaller 1.26.0), ?biocLite for help
data.table package version: 1.10.4
dada2 package version: 1.5.3
DT package version: 0.2
magrittr package version: 1.5
ggplot2 package version: 2.2.1
shiny package version: 1.0.3
shinyFiles package version: 0.6.3

joey711 (Collaborator, Author) commented Aug 16, 2017

@benjjneb are you using the Docker version or this branch?
I think I probably hard-coded the directories that are exposed to the shinyFiles host-directory explorer. If you look at where the shinyFiles widget is defined, you should see some paths. You can change these in your local version, or commit a change that includes the root path as "/". For testing, I found it easier to pick a path very close to my test directories so that iterative testing was faster.

For the Docker version, the host environment is always the same (so hard-coding works really well there), and the user just maps their desired sequence files to that location. The details are shown in the docker command for dada2docker/shiny-dada2 (or whatever I called it).
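
One way to avoid an empty folder picker when hard-coded paths do not exist on the user's machine might look like the sketch below. The helper name existing_roots, the candidate paths, and the input id "fastqDir" are all hypothetical, and the sketch assumes a Unix-like file system where "/" is a sensible fallback. It is not taken from the app's code, and missing paths are only one possible cause of the blank options.

```r
# Hypothetical helper: keep only the candidate roots that exist on this
# machine, falling back to "/" so the folder picker always has an entry.
existing_roots <- function(candidates, fallback = c(root = "/")) {
  keep <- candidates[dir.exists(candidates)]
  if (length(keep) == 0) fallback else keep
}

# Illustrative candidates, not the paths used in the app.
roots <- existing_roots(c(data = "/data", home = path.expand("~")))

# In the Shiny server function, the named vector would then be passed to the
# shinyFiles directory chooser, along the lines of:
#   shinyFiles::shinyDirChoose(input, "fastqDir", roots = roots)
```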

Update on CLI

In short, I'm using a very effective CLI generator called docopt (maybe you've heard of it). Rather than just dump a bunch of scripts with if-statements to handle different cases, I've written an R6-based object-oriented class to wrap the different approaches, with a parent generic that defines the general features, importers, etc.

To handle the file manifest and read pairing formally and generally, I define a simple tab-delimited manifest file (we could easily support a few other formats if they are natural tables with the columns we expect), as well as a YAML-based file ("parameters.yml") for passing in parameters for the different methods in each workflow step. The expected names in that file are the same as in dada2 itself, so there is no need to redefine parameters or their documentation. This keeps it a natural extension of dada2 rather than an end run around the package. My plan is to also update the shiny-dada2 prototype to use these classes, so that the CLI and Shiny workflow behaviors are consistent and more easily maintained.

I will try to post a new branch with this CLI code soon just to start tracking and sharing it. Next step would be to merge that with the shiny branch and migrate the shiny code to use those classes before testing and adding these additional features.
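
To make the parameters.yml idea concrete, here is a minimal sketch of how parameters named after dada2 arguments could be forwarded to a workflow step without redefining them. Everything in it is an assumption for illustration: the step name "filter", the values, the run_step helper, and the stand-in filter_step function, which takes the place of dada2::filterAndTrim so the sketch runs without dada2 installed. The list mimics what reading a YAML file (for example with yaml::read_yaml) might return.

```r
# Roughly what a parsed parameters.yml might look like: one entry per workflow
# step, with names matching the arguments of the function behind that step.
params <- list(
  filter = list(truncLen = c(240, 160), maxEE = 2)
)

# Stand-in for dada2::filterAndTrim; it just echoes its arguments.
filter_step <- function(fwd, filt, truncLen = 0, maxEE = Inf) {
  list(fwd = fwd, filt = filt, truncLen = truncLen, maxEE = maxEE)
}

# Reject names the function does not know, then call it with the fixed
# arguments plus the user-supplied ones.
run_step <- function(fun, fixed_args, step_params) {
  unknown <- setdiff(names(step_params), names(formals(fun)))
  if (length(unknown) > 0) {
    stop("Unknown parameter(s): ", paste(unknown, collapse = ", "))
  }
  do.call(fun, c(fixed_args, step_params))
}

out <- run_step(filter_step,
                list(fwd = "in.fastq", filt = "out.fastq"),
                params$filter)
```

Because the names are checked against the function's own formals, a typo in the parameter file fails early instead of being silently ignored.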

Comments on your first impressions

  • Glad you like the interface. Feel free to point out improvements. Glad also that you like the three major divisions. They match how we want to separate things for a "cloud" implementation. I'd like to add more diagnostics. The quality profile and error rate plots are helpful, but we can imagine additional routine diagnostics that would help flag anomalies we already know to check for.
  • "It would probably be good to add chimera filtering (or just roll it into the Run DADA command with another option)." -- Completely agree. The current CLI class I mentioned above includes chimera filtering in this step, so when I migrate the shiny app to this, it will include this.
  • "Also a new tab for assigning taxonomy is probably desirable?" -- I agree. And taxonomy is often a very good diagnostic for the data overall, relative to what the user knows/expects.
  • What about including tree-building? DECIPHER/phangorn seems to work well enough that we could relatively easily include it.
  • OUTPUT: I think it probably makes sense to support biom-format output (and also alignment and Newick output if we build an MSA and tree), specifically for interop with other tools that aren't dada2 or phyloseq.
