`Mycovista` - a pipeline to assemble (highly repetitive) bacterial genomes

Installation

In order to run Mycovista, you'll need to install:

Usage

To use Mycovista, edit the config.yaml file

specify the pipeline you want to use: long or hybrid mode
insert additional information:
- path to input short reads
- path to input long reads
- path to output folder
- names of the bacterial strains

Note: The raw read files must contain the name of the strain in the file name.

A python script helps you to generate all folders required for the output. Just run:

python scripts/create.py

Afterwards, all required data should be linked in the raw_data/ folder. Please check this!

Now you can get started and assemble your bacterias. You can start Mycovista by using:

snakemake -s <mode> -c <#threads> --use-conda

mode - assembly mode file (long or hybrid)

Tip: You can use -n for a dry run to check if snakemake will start all required rules.

Tools

When using Mycovista, please cite all incorporated tools as without them this pipeline wouldn't exist.

FastQC and NanoPlot are used for quality check of the reads (raw and preprocessed). Short reads are filtered by fastp for adapter clippling and Trimmomatic for quality trimming. Long reads are filtered by length using Filtlong. Flye assembles the long reads first. The assembly is then polished with long reads by Racon using minimap2 as mapper inbetween. Afterwards, medaka is incorporated as additional polishing step with long reads. In hybrid mode, the assembly postprocessed further with Racon and minimap2 using short reads. The final assembly is annotated by Prokka and general assembly statistics are calculated by QUAST.

Downstream analysis of our M. bovis assembly panel included pangenome analsis followed by a genome-wide association analysis (GWAS). We provided the R script for the GWAS in scripts/gwas.R.

Click here for all citations

fastp
- Chen S, Zhou Y, Chen Y, Gu J (2018) fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics 34(17):i884–i890
Trimmomatic
- Bolger AM, Lohse M, Usadel B (2014) Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30(15):2114–2120
Filtlong
- Wick R (2018) Filtlong. Available: https://github.com/rrwick/Filtlong
Flye
- Kolmogorov M, Yuan J, Lin Y, Pevzner PA (2019) Assembly of long, error-prone reads using repeat graphs. Nature Biotechnology 37(5):540–546
Racon
- Vaser R, Sovi ́c I, N N, Šiki ́c M (2017) Fast and accurate de novo genome assembly from long uncorrected reads. Genome Research 27:737–746
minimap2
- Li H (2018) Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics 34(18):3094–3100
medaka
- Ltd. ONT (2018) medaka: Sequence correction provided by ONT Research. Available: https://github.com/nanoporetech/medaka
FastQC
- Andrews S, et al. (2012) FastQC (Babraham Institute. Available: https://www.bioinformatics.babraham.ac.uk/projects/fastqc/
NanoPlot
- De Coster W, D’Hert S, Schultz DT, Cruts M, Van Broeckhoven C (2018) NanoPack: visualizing and processing long-read sequencing data. Bioinformatics 34(15):2666–2669
QUAST
- Gurevich A, Saveliev V, Vyahhi N, Tesler G (2013) QUAST: quality assessment tool for genome assemblies. Bioinformatics 29(8):1072–1075
Prokka
- Seemann T (2014) Prokka: rapid prokaryotic genome annotation. Bioinformatics 30(14):2068–2069
R
- R Core Team (2022). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. https://www.R-project.org/.

Name		Name	Last commit message	Last commit date
Latest commit History 122 Commits
envs		envs
pic		pic
scripts		scripts
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
config.yaml		config.yaml
hybrid		hybrid
long		long

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

envs

envs

pic

pic

scripts

scripts

.gitignore

.gitignore

LICENSE

LICENSE

README.md

README.md

config.yaml

config.yaml

hybrid

hybrid

long

long

Repository files navigation

`Mycovista` - a pipeline to assemble (highly repetitive) bacterial genomes

Installation

Usage

Tools

Pipeline Overview

About

Releases 2

Packages

Languages

License

rnajena/mycovista

Folders and files

Latest commit

History

Repository files navigation

Mycovista - a pipeline to assemble (highly repetitive) bacterial genomes

Installation

Usage

Tools

Pipeline Overview

About

Resources

License

Stars

Watchers

Forks

Languages

`Mycovista` - a pipeline to assemble (highly repetitive) bacterial genomes