Several new suggestions for the "Explore Zika virus evolution" tutorial #54

Open
davidcroll opened this issue Mar 24, 2021 · 0 comments
Assignees
Labels
documentation Improvements or additions to documentation

Explore Zika virus evolution

Suggestions

  • Mention that all the code can be copied to the command line as is, including the trailing backslashes. Even as a Linux user since 2012, it was new to me that a trailing backslash actually works as a line continuation on the command line; the backslashes are not just "line breaks" for our documentation page.
  • Why does augur not have a man page?
    • Not a big issue, but augur --help should be mentioned.
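
To illustrate the backslash point, here is a minimal sketch that needs no Augur installation. It only uses echo, and the variable name message is made up for this example. The shell joins the three lines into one command before running it:

```shell
# A trailing backslash tells the shell that the command continues on the next line.
# The three lines below are therefore run as the single command: echo one two three
message=$(echo one \
  two \
  three)
echo "$message"
```

The multi-line augur commands in the tutorial rely on the same mechanism, so they can be pasted as shown.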

Setup

  • Augur and Auspice are already installed
  • I assume that nextstrain-cli already works

Build steps

no issues

Prepare the Sequences

Suggestions

  • A FASTA sequence is shown - PAN/CDC_259359_V1_V3/2015 - and also the contents of a tsv file which contains the metadata. Why not display the part of the tsv file that actually contains PAN/CDC_259359_V1_V3/2015?
  • The shown tsv file is tab-delimited. Would it be nicer to show a screenshot of an Excel sheet? Or a Markdown table? That way, "virus" would actually appear above "zika". Much more intuitive.
  • Explain "Accession". Is it the unique ID of that sequence? But in the first column of the tsv file, there is already something that should be unique...
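
As a sketch of the first suggestion, the tutorial could show how to pull the matching row out of the metadata. The file built below is a tiny stand-in, not the real data/metadata.tsv: only the strain name comes from the tutorial, while the column subset, the second strain, and the accession values are placeholders I made up.

```shell
# Build a tiny stand-in metadata file; accession values are placeholders, not real data.
metadata=$(mktemp)
printf 'strain\tvirus\taccession\n' > "$metadata"
printf 'PAN/CDC_259359_V1_V3/2015\tzika\tPLACEHOLDER_1\n' >> "$metadata"
printf 'HYPOTHETICAL/STRAIN/2016\tzika\tPLACEHOLDER_2\n' >> "$metadata"

# Print the header plus the row whose first column equals the strain name.
row=$(awk -F '\t' -v strain='PAN/CDC_259359_V1_V3/2015' 'NR == 1 || $1 == strain' "$metadata")
echo "$row"
rm -f "$metadata"
```

Showing the header together with the matching row keeps each value under its column name, which is the point of the table suggestion.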

Index the Sequences

Filter the Sequences

Suggestion

  • (Needs a bigger change...) The code below is mostly self-explanatory, except for --exclude. With --exclude, can I specify which sequences to ignore? Or does the file specified by --exclude store the sequences which have been excluded?
augur filter \
  --sequences data/sequences.fasta \
  --sequence-index results/sequence_index.tsv \
  --metadata data/metadata.tsv \
  --exclude config/dropped_strains.txt \
  --output results/filtered.fasta \
  --group-by country year month \
  --sequences-per-group 20 \
  --min-date 2012
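
For what it's worth, augur filter --help describes --exclude as taking file(s) with a list of strains to exclude, so the file is an input, not an output. A plain-shell imitation of that idea could make this clearer in the tutorial. This is only a sketch with grep and is not how Augur implements it; the strain names are hypothetical:

```shell
# Hypothetical strain names, used only for illustration.
strains=$(mktemp)
excluded=$(mktemp)
printf 'STRAIN_A\nSTRAIN_B\nSTRAIN_C\n' > "$strains"
printf 'STRAIN_B\n' > "$excluded"   # one strain name per line, like an exclude list

# Keep every name that does not appear, as a whole line, in the exclude list.
kept=$(grep -v -x -F -f "$excluded" "$strains")
echo "$kept"
rm -f "$strains" "$excluded"
```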

Align the Sequences

Suggestion

There is the following input...

augur align \
  --sequences results/filtered.fasta \
  --reference-sequence config/zika_outgroup.gb \
  --output results/aligned.fasta \
  --fill-gaps
  

Would it be sensible to tell users what kind of format zika_outgroup.gb has? Can one use a FASTA file, too? Could it also be a nucleotide sequence instead of an amino acid one?
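
The .gb extension conventionally denotes a GenBank flat file, whose first line starts with LOCUS, whereas a FASTA file starts with ">". A short sketch for checking which of the two a file is follows. The function name guess_format and the demo file contents are made up for this example, and the sketch says nothing about which formats augur align accepts:

```shell
# Guess a sequence file's format from its first line.
guess_format() {
  first_line=$(head -n 1 "$1")
  case "$first_line" in
    LOCUS*) echo "genbank" ;;   # GenBank flat files begin with a LOCUS line
    ">"*)   echo "fasta" ;;     # FASTA records begin with a ">" header
    *)      echo "unknown" ;;
  esac
}

# Hypothetical demo files; the contents are placeholders, not real Zika data.
demo_gb=$(mktemp)
demo_fa=$(mktemp)
printf 'LOCUS       EXAMPLE\n' > "$demo_gb"
printf '>example\nACGT\n' > "$demo_fa"
gb_format=$(guess_format "$demo_gb")
fa_format=$(guess_format "$demo_fa")
echo "$gb_format $fa_format"
rm -f "$demo_gb" "$demo_fa"
```

Running guess_format config/zika_outgroup.gb from the tutorial directory would apply the same check to the actual reference file.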

Construct the Phylogeny

without any problems

Get a Time-Resolved Tree

Suggestion

  • maybe explain some of the options/flags, for example --coalescent.

Annotate the Phylogeny

Reconstruct Ancestral Traits

no issues

Infer Ancestral Sequences

no issues

Identify Amino-Acid Mutations

no issues

Export the Results

no issues

Visualize the Results

it works

little suggestion - for later

  • auspice could launch the browser

Automate the Build with Snakemake

Suggestions

  • Tell the user to install Snakemake first, although they will be reminded to install it anyway.

And:

If you've installed Augur & Auspice, simply run

  • I would change this to reflect that Augur & Auspice are also contained in conda's nextstrain environment, so the user should simply activate it!
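
A possible pre-flight check for this section, as a sketch only: it reports which of the three tools are missing from the PATH and suggests activating the environment. The helper name have_tool is made up, and the environment name nextstrain is assumed from the setup described above.

```shell
# Return success if the given command is available on the PATH.
have_tool() {
  command -v "$1" >/dev/null 2>&1
}

# Collect the tools this section needs that are not currently available.
missing=""
for tool in augur auspice snakemake; do
  if ! have_tool "$tool"; then
    missing="$missing $tool"
  fi
done

if [ -n "$missing" ]; then
  # Assumption: the tools live in a conda environment named "nextstrain".
  echo "Not found:$missing (try: conda activate nextstrain)"
else
  echo "augur, auspice and snakemake are all available"
fi
```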
@davidcroll davidcroll added the documentation Improvements or additions to documentation label Mar 24, 2021
@huddlej huddlej self-assigned this Mar 24, 2021