nextflow-kraken2

A relatively simple metagenomics analysis pipeline written in nextflow [1]. The pipeline is based on kraken2/bracken and kaiju, and is supplemented with Krona visualizations and interactive html tables. It is designed to obtain taxonomic and abundance information for many samples, rather than to compare different taxonomy assignment tools (although it can be used for that as well).

Description

The pipeline runs in a docker container by default. Both Illumina and Nanopore data can be processed (separately). For a set of fastq files it executes:

fastp - filter and trim reads with default parameters
kraken2 [2] - taxonomic assignment of the reads
bracken [3] - abundance estimation at a single level in the taxonomic tree, e.g. species, using the kraken2 output
kaiju [4] - taxonomic classification of the reads based on maximum exact matches on protein level
krona [5] - plots are generated from the output of kraken2
DataTables - generates an interactive HTML table with the results from bracken for each sample, as well as a summary table for all the samples
MultiQC [6] - aggregates the results into a single html report

The pipeline runs kraken2/bracken or kaiju depending on the parameters supplied: use --kraken_db to run kraken2/bracken or --kaiju_db to run kaiju (or both parameters to run both).

The --kraken_db parameter is a path to a previously downloaded kraken2 database. A collection of ready-to-use kraken2/bracken RefSeq indexes can be downloaded from here.

The --kaiju_db can be one of refseq, progenomes, viruses, plasmids, fungi, nr, nr_euk, mar or rvdb. See the links above for available databases for each tool.

If none of these parameters is used, the pipeline will just run fastp.
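The database-dependent behaviour described above can be sketched as a small shell helper that assembles the command line. This is an illustration only: `build_cmd` is a hypothetical helper and not part of the pipeline, `/data/k2_standard` is a placeholder path, and the helper prints the command instead of running it. The option that points the pipeline at the reads is left out because it is not named in this section; check `--help` for it.

```shell
# Hypothetical helper: print the nextflow command for the chosen databases.
# $1 = path to a kraken2 database (may be empty)
# $2 = kaiju database name (may be empty)
build_cmd() {
    cmd="nextflow run angelovangel/nextflow-kraken2"
    if [ -n "$1" ]; then cmd="$cmd --kraken_db $1"; fi
    if [ -n "$2" ]; then cmd="$cmd --kaiju_db $2"; fi
    echo "$cmd"
}

build_cmd /data/k2_standard ""        # kraken2/bracken only
build_cmd "" viruses                  # kaiju only
build_cmd /data/k2_standard viruses   # both tools
build_cmd "" ""                       # neither: only fastp runs
```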

Installation and running the pipeline

Nothing to install, as long as you have docker and nextflow. Choose a kraken2 and/or a kaiju database (see below), and run the pipeline:

# run with a test dataset (included)
nextflow run angelovangel/nextflow-kraken2 -profile test

# see options and how to run
nextflow run angelovangel/nextflow-kraken2 --help

Output

All output files are written to the folder results-kraken2, which is created inside the folder containing the reads used for running the pipeline. An example of the outputs, generated with a small Illumina dataset, can be downloaded here.

The outputs are:

timmed_fastq/ - directory with fastq files after trimming, these are also used for taxonomic profiling
bracken_summary_heatmap/table.html - standalone html files with summary information from bracken. Note that these files are generated only if there are fewer than 34 samples
bracken_summary_long/wide.csv - summary bracken information (all taxa found in all samples), in long and wide format
kraken2taxonomy_krona.html - an interactive Krona plot of the kraken2 output for all samples
samples/ - directory with individual (per sample) kraken2 and bracken-corrected report files and with the abundance table from bracken (as html and tsv). Tip: the report files can be directly imported in Pavian for nice interactive visualizations.
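After a run, a quick check like the following can show which of the listed outputs are present. This is a minimal sketch: `check_outputs` is a hypothetical helper, `/path/to/reads` is a placeholder, and the file names `bracken_summary_long.csv` and `bracken_summary_wide.csv` are an assumed reading of the shorthand `bracken_summary_long/wide.csv` used above.

```shell
# Report which of the outputs listed above exist for a given reads folder.
# $1 = folder with the reads used for the run
check_outputs() {
    results="$1/results-kraken2"
    # File names below are assumptions derived from the list above.
    for item in samples kraken2taxonomy_krona.html \
                bracken_summary_long.csv bracken_summary_wide.csv; do
        if [ -e "$results/$item" ]; then
            echo "found   $item"
        else
            echo "missing $item"
        fi
    done
}

check_outputs /path/to/reads   # placeholder path
```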

Choosing a `kraken2` and/or `kaiju` database

`--kraken_db`

An absolute path to a folder containing a kraken2 database. See the kraken2 homepage or Ben Langmead's collection for a list of available pre-built databases. These databases include the required Bracken files (for read lengths 50, 100, 150, 200 and 250). Take care to set the --readlen parameter to match your reads data.
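To see which read lengths a given database folder supports before choosing `--readlen`, something like the sketch below may help. It assumes the usual Bracken file naming, `database<READLEN>mers.kmer_distrib`; `bracken_readlens` is a hypothetical helper and `/path/to/kraken2_db` is a placeholder.

```shell
# List the read lengths for which a kraken2 database folder holds Bracken files.
# Assumes files are named database<READLEN>mers.kmer_distrib.
# $1 = kraken2 database folder
bracken_readlens() {
    for f in "$1"/database*mers.kmer_distrib; do
        [ -e "$f" ] || continue            # no match: the glob stays literal
        name=${f##*/}                      # strip the directory part
        name=${name#database}              # drop the "database" prefix
        echo "${name%mers.kmer_distrib}"   # drop the suffix, leaving the number
    done | sort -n
}

# Prints one read length per line, or nothing if no Bracken files are found.
bracken_readlens /path/to/kraken2_db
```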

Note: although still controversial, recent work has shown that kraken2 may be performing better than QIIME in the analysis of 16S amplicons.

`--kaiju_db`

This argument can be one of refseq, progenomes, viruses, plasmids, fungi, nr, nr_euk, mar or rvdb. When this parameter is used, a source database and the taxonomy files are downloaded from the NCBI FTP server, converted into a protein database and indexed (kaiju-makedb). Check the memory and space requirements here before using.
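Because building a kaiju database involves a large download and indexing step, it may be worth checking the name before launching. A minimal sketch of such a pre-flight check, using only the names listed above (`is_valid_kaiju_db` is a hypothetical helper, not part of the pipeline):

```shell
# Hypothetical pre-flight check: succeed only for the database names listed above.
# $1 = value intended for --kaiju_db
is_valid_kaiju_db() {
    case "$1" in
        refseq|progenomes|viruses|plasmids|fungi|nr|nr_euk|mar|rvdb) return 0 ;;
        *) return 1 ;;
    esac
}

if is_valid_kaiju_db "viruses"; then
    echo "ok: viruses"
fi
if ! is_valid_kaiju_db "bacteria"; then
    echo "not a listed kaiju database: bacteria"
fi
```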

References

This pipeline just uses some really nice work from others:

[1] P. Di Tommaso, et al. Nextflow enables reproducible computational workflows. Nature Biotechnology 35, 316–319 (2017) https://doi.org/10.1038/nbt.3820

[2] Wood, D.E., Lu, J. & Langmead, B. Improved metagenomic analysis with Kraken 2. Genome Biol 20, 257 (2019) https://doi.org/10.1186/s13059-019-1891-0

[3] Lu J, Breitwieser FP, Thielen P, Salzberg SL. 2017. Bracken: estimating species abundance in metagenomics data. PeerJ Computer Science 3:e104 https://doi.org/10.7717/peerj-cs.104

[4] Menzel, P., Ng, K. & Krogh, A. Fast and sensitive taxonomic classification for metagenomics with Kaiju. Nat Commun 7, 11257 (2016). https://doi.org/10.1038/ncomms11257

[5] Ondov BD, Bergman NH, Phillippy AM. Interactive metagenomic visualization in a Web browser. BMC Bioinformatics. 2011;12:385. Published 2011 Sep 30. https://doi.org/10.1186/1471-2105-12-385

[6] Philip Ewels, Måns Magnusson, Sverker Lundin and Max Käller. MultiQC: Summarize analysis results for multiple tools and samples in a single report. Bioinformatics (2016). https://doi.org/10.1093/bioinformatics/btw354
