contigtax

contigtax is a tool that assigns taxonomy to metagenomic contigs. It queries the contig nucleotide sequences against a protein database using diamond blastx, then parses the hits using rank-specific thresholds. The use of rank-specific thresholds was first introduced by Luo et al. (2014) and is used with some modifications, as explained in Alneberg et al. (2018).

Install

The simplest way to install contigtax is via conda:

conda install -c bioconda contigtax

Alternatively, pull the Docker image:

docker pull nbisweden/contigtax

Usage

  1. Download fasta file
contigtax download uniref100
  2. Download NCBI taxonomy
contigtax download taxonomy
  3. Reformat fasta file and create taxonmap
contigtax format uniref100/uniref100.fasta.gz uniref100/uniref100.reformat.fasta.gz
  4. Build diamond database
contigtax build uniref100/uniref100.reformat.fasta.gz uniref100/prot.accession2taxid.gz taxonomy/nodes.dmp
  5. Search (here the assembled contigs are in the file assembly.fa)
contigtax search -p 4 assembly.fa uniref100/diamond.dmnd assembly.tsv.gz
  6. Assign (here the output from the contigtax search step is in the file assembly.tsv.gz)
contigtax assign -p 4 assembly.tsv.gz assembly.taxonomy.tsv
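
The six steps above can be chained in a small script. The following is a minimal sketch, not part of contigtax itself: the names `run`, `contigtax_pipeline` and `DRY_RUN` are hypothetical helpers introduced here, and the output file names are simply derived from the contig file name in the same pattern as the example above. By default the script only prints the commands; set `DRY_RUN=0` to execute them.

```shell
#!/bin/sh
set -eu

# Hypothetical helper: print the command when DRY_RUN=1 (the default in this
# sketch), otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Hypothetical wrapper around the six steps listed above.
# $1: contig fasta file (e.g. assembly.fa), $2: number of threads passed to -p
contigtax_pipeline() {
    contigs="$1"
    threads="${2:-4}"
    prefix="${contigs%.*}"   # assembly.fa -> assembly

    run contigtax download uniref100
    run contigtax download taxonomy
    run contigtax format uniref100/uniref100.fasta.gz uniref100/uniref100.reformat.fasta.gz
    run contigtax build uniref100/uniref100.reformat.fasta.gz uniref100/prot.accession2taxid.gz taxonomy/nodes.dmp
    run contigtax search -p "$threads" "$contigs" uniref100/diamond.dmnd "$prefix.tsv.gz"
    run contigtax assign -p "$threads" "$prefix.tsv.gz" "$prefix.taxonomy.tsv"
}

contigtax_pipeline "${1:-assembly.fa}" "${2:-4}"
```

Because of `set -e`, the script stops at the first failing step when run with `DRY_RUN=0`, so a failed download does not lead to a build on incomplete files.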

Running contigtax with Docker

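To run contigtax with Docker, simply substitute contigtax in the commands above with docker run --rm -v $(pwd):/work nbisweden/contigtax. The -v $(pwd):/work option mounts the current directory at /work inside the container, and --rm removes the container when the command finishes. For example: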

  1. Download fasta file
docker run --rm -v $(pwd):/work nbisweden/contigtax download uniref100
  2. Download NCBI taxonomy
docker run --rm -v $(pwd):/work nbisweden/contigtax download taxonomy
  3. Reformat fasta file and create taxonmap
docker run --rm -v $(pwd):/work nbisweden/contigtax format uniref100/uniref100.fasta.gz uniref100/uniref100.reformat.fasta.gz
  4. Build diamond database
docker run --rm -v $(pwd):/work nbisweden/contigtax build uniref100/uniref100.reformat.fasta.gz uniref100/prot.accession2taxid.gz taxonomy/nodes.dmp
  5. Search (here the assembled contigs are in the file assembly.fa)
docker run --rm -v $(pwd):/work nbisweden/contigtax search -p 4 assembly.fa uniref100/diamond.dmnd assembly.tsv.gz
  6. Assign (here the output from the contigtax search step is in the file assembly.tsv.gz)
docker run --rm -v $(pwd):/work nbisweden/contigtax assign -p 4 assembly.tsv.gz assembly.taxonomy.tsv
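
Since the Docker commands differ from the plain ones only by their prefix, one way to avoid retyping it is a shell function named contigtax that forwards its arguments to the container. This is a sketch under assumptions: the `CONTIGTAX_DOCKER` variable is hypothetical and exists only so the docker binary can be swapped out (for instance with `echo` to preview the command).

```shell
# Hypothetical convenience wrapper: forwards all arguments to the
# nbisweden/contigtax image, mounting the current directory at /work.
contigtax() {
    ${CONTIGTAX_DOCKER:-docker} run --rm -v "$(pwd):/work" nbisweden/contigtax "$@"
}

# Usage, once the function is defined in the current shell:
#   contigtax download taxonomy
#   contigtax search -p 4 assembly.fa uniref100/diamond.dmnd assembly.tsv.gz
```

With the function defined, the commands from the Usage section can be typed unchanged. Note that it shadows any locally installed contigtax executable for the lifetime of the shell session.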