Pavel V. Dimens edited this page Dec 7, 2021 · 2 revisions


An easy-breezy SNP-based whole-genome phylogenetic pipeline

Gust is really easy to use, so there isn't a ton to write in this wiki. It is built on existing software, so I want to acknowledge the hard work of all the teams and individuals who built the applications it relies on.

The workflow language in which gust is written

  • Köster, Johannes, and Sven Rahmann. "Snakemake—a scalable bioinformatics workflow engine." Bioinformatics 28.19 (2012): 2520-2522.

Converts FASTA files to FASTQ format
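The conversion step can be pictured with a short Python sketch. This is not gust's code and does not reflect whichever converter gust actually calls. The function name `fasta_to_fastq` and the constant placeholder quality character `I` are assumptions. FASTA carries no base qualities, so the sketch assigns every base the same made-up score.

```python
def fasta_to_fastq(fasta_text: str, qual_char: str = "I") -> str:
    """Convert FASTA text to FASTQ, giving every base one placeholder quality (assumed 'I')."""
    records = []
    name = None
    seq_parts: list[str] = []
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(seq_parts)))  # finish previous record
            name = line[1:]
            seq_parts = []
        else:
            if name is None:
                raise ValueError("sequence data found before the first '>' header")
            seq_parts.append(line)  # multi-line sequences are concatenated
    if name is not None:
        records.append((name, "".join(seq_parts)))
    # FASTQ record: @name, sequence, '+', one quality character per base
    out = [f"@{n}\n{s}\n+\n{qual_char * len(s)}" for n, s in records]
    return "\n".join(out) + ("\n" if out else "")


print(fasta_to_fastq(">r1\nACGT\nAC\n>r2\nGG\n"), end="")
```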

Creates the sliding window fragments
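A sliding window moves along a sequence in fixed steps and emits overlapping fragments. The sketch below uses hypothetical `window` and `step` parameters, because this page does not state gust's actual fragment length or overlap. It also adds one extra window anchored at the end of the sequence so that the tail is not dropped. That choice is an assumption made for illustration.

```python
def sliding_windows(seq: str, window: int, step: int) -> list[tuple[int, str]]:
    """Return (0-based start, fragment) pairs from a sliding window over seq."""
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    if len(seq) <= window:
        # sequence shorter than one window: keep it whole (if non-empty)
        return [(0, seq)] if seq else []
    starts = list(range(0, len(seq) - window + 1, step))
    if starts[-1] + window < len(seq):
        # the regular steps missed the tail; add a final window ending at the last base
        starts.append(len(seq) - window)
    return [(s, seq[s:s + window]) for s in starts]


print(sliding_windows("ACGTACGTA", window=4, step=3))
```

Each fragment could then be written out as its own read-like record before mapping.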

Maps the genome fragments onto the reference genome

  • Vasimuddin Md, Sanchit Misra, Heng Li, Srinivas Aluru. Efficient Architecture-Aware Acceleration of BWA-MEM for Multicore Systems. IEEE Parallel and Distributed Processing Symposium (IPDPS), 2019.

Filters, compresses, sorts, merges, and indexes the alignments

  • Danecek P, Bonfield JK, Liddle J, Marshall J, Ohan V, Pollard MO, Whitwham A, Keane T, McCarthy SA, Davies RM, Li H. Twelve years of SAMtools and BCFtools. GigaScience (2021) 10(2) giab008. PMID: 33590861

Splits the reference genome into equal segments and calls the SNPs

  • Garrison E, Marth G. Haplotype-based variant detection from short-read sequencing. arXiv preprint arXiv:1207.3907 [q-bio.GN] 2012
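Splitting the reference into equal segments allows each segment to be genotyped independently. This is what lets the variant calling run in parallel. The hypothetical `equal_regions` helper below assumes that contig lengths are already known. It writes each region as a `contig:start-end` string and assumes 0-based, end-exclusive coordinates. The chunk size is illustrative only.

```python
def equal_regions(contig_lengths: dict[str, int], chunk_size: int) -> list[str]:
    """Split each contig into consecutive chunks of at most chunk_size bases."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    regions = []
    for contig, length in contig_lengths.items():
        for start in range(0, length, chunk_size):
            end = min(start + chunk_size, length)  # last chunk may be shorter
            regions.append(f"{contig}:{start}-{end}")  # assumed 0-based, end-exclusive
    return regions


print(equal_regions({"chr1": 25, "chr2": 10}, chunk_size=10))
```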

Controls the parallelization of freebayes. Often used, rarely cited!

  • O. Tange (2011): GNU Parallel - The Command-Line Power Tool, ;login: The USENIX Magazine, February 2011:42-47.

Variant Call Format file compression and filtering

  • Danecek P, Bonfield JK, Liddle J, Marshall J, Ohan V, Pollard MO, Whitwham A, Keane T, McCarthy SA, Davies RM, Li H. Twelve years of SAMtools and BCFtools. GigaScience (2021) 10(2) giab008. PMID: 33590861
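The sketch below illustrates the kind of filtering this step can perform. It keeps the VCF header lines, plus any record whose REF and ALT alleles are all single bases and whose QUAL meets a threshold. The `filter_snps` name, the `min_qual` cutoff, and the filtering criteria are all assumptions. This page does not list the BCFtools filters gust actually applies.

```python
def filter_snps(vcf_lines: list[str], min_qual: float) -> list[str]:
    """Keep header lines plus SNP records with QUAL >= min_qual (illustrative criteria)."""
    kept = []
    for line in vcf_lines:
        if line.startswith("#"):
            kept.append(line)  # meta-information and column header lines
            continue
        fields = line.rstrip("\n").split("\t")
        ref, alt, qual = fields[3], fields[4], fields[5]
        alts = alt.split(",")
        # a SNP here means one-base REF and every ALT a single A/C/G/T base
        is_snp = len(ref) == 1 and all(len(a) == 1 and a in "ACGT" for a in alts)
        if qual == ".":
            continue  # missing quality: drop the record
        if is_snp and float(qual) >= min_qual:
            kept.append(line)
    return kept


example = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t5\t.\tA\tG\t50\tPASS\t.",     # SNP, passes
    "chr1\t9\t.\tAT\tA\t80\tPASS\t.",    # indel, dropped
    "chr1\t12\t.\tC\tT\t10\tPASS\t.",    # SNP, low quality, dropped
    "chr1\t15\t.\tG\tA,C\t40\tPASS\t.",  # multiallelic SNP, passes
]
print(len(filter_snps(example, min_qual=30)))
```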

Performs the multiple-sequence alignment of the SNPs

  • Kazutaka Katoh, Daron M. Standley, MAFFT Multiple Sequence Alignment Software Version 7: Improvements in Performance and Usability, Molecular Biology and Evolution, Volume 30, Issue 4, April 2013, Pages 772–780, https://doi.org/10.1093/molbev/mst010

Builds and bootstraps the phylogenetic trees

  • Alexey M Kozlov, Diego Darriba, Tomáš Flouri, Benoit Morel, Alexandros Stamatakis, RAxML-NG: a fast, scalable and user-friendly tool for maximum likelihood phylogenetic inference, Bioinformatics, Volume 35, Issue 21, 1 November 2019, Pages 4453–4455, https://doi.org/10.1093/bioinformatics/btz305

Final tree plotting
