masikol edited this page May 26, 2023 · 14 revisions

Welcome to the Barapost wiki!

The Barapost command line toolkit is designed for binning FASTA, FASTQ and FAST5 files (i.e. separating their sequences into different files) according to the taxonomic classification of the nucleotide sequences stored in them. Classification is implemented as finding the most similar reference sequence in a nucleotide database: either remotely, using the NCBI BLAST web service, or on a local machine, with the BLAST+ toolkit.
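
The "most similar reference sequence" idea can be illustrated with a minimal sketch. It is not Barapost's actual code: the `Hit` type, the `best_hits` function, the accession strings and the choice of bit score as the similarity measure are all assumptions made for illustration.

```python
from typing import Dict, Iterable, NamedTuple


class Hit(NamedTuple):
    """One alignment between a query sequence and a reference (hypothetical type)."""
    query_id: str
    subject_acc: str   # accession of the reference sequence
    bitscore: float    # assumed similarity measure; higher means more similar


def best_hits(hits: Iterable[Hit]) -> Dict[str, Hit]:
    """Keep, for every query, the single hit with the highest bit score."""
    best: Dict[str, Hit] = {}
    for hit in hits:
        current = best.get(hit.query_id)
        if current is None or hit.bitscore > current.bitscore:
            best[hit.query_id] = hit
    return best


# Toy data: the accessions below are placeholders, not real GenBank records.
toy_hits = [
    Hit("read_1", "ACC_A", 850.0),
    Hit("read_1", "ACC_B", 420.0),
    Hit("read_2", "ACC_B", 910.0),
]

# Each query is classified as the reference it resembles most.
classification = {q: h.subject_acc for q, h in best_hits(toy_hits).items()}
```

Here `read_1` is assigned to `ACC_A` because that hit scores higher than the one against `ACC_B`. This is the "naive" best-hit approach: nothing beyond the top-scoring reference is taken into account.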

Applications (possible use cases)

  • Demultiplexing whole genome sequencing reads (basically, “long” reads) without barcoding. This demultiplexing is based on taxonomic annotation, so the organisms of interest should be taxonomically distant (bear in mind that the classification algorithm is naive and that lateral gene transfer occurs). Usually, it is enough for the organisms to belong to different genera.
  • Genome assembly: Barapost can detect and remove contigs assembled from cross-talk. You might have seen such contigs: they are short, have low coverage, and do not belong to the genome of interest, so they should be removed.
  • Comparing different sets of contigs of the same genome (see Example 7 on the barapost-local page).
  • I dare to surmise this list isn't complete :-)

The workflow and what Barapost does

  1. barapost-prober.py -- this script submits several sequences (i.e. only a part of your data set) to the NCBI BLAST server in order to determine which taxa are present in the data set. barapost-prober.py saves the accession numbers of the best hit(s) for each submitted input sequence. Processing all sequences this way would take too much time, which leads us to barapost-local.py.

  2. barapost-local.py -- this script first downloads the best hits “discovered” by barapost-prober.py from GenBank, then builds a database from the downloaded reference sequences on the local machine, and finally classifies the major part of the data using that database. Both the database creation and the BLAST search of input sequences are done with the BLAST+ toolkit.

  3. barapost-binning.py -- this script bins (divides into separate files) nucleotide sequences according to the results of barapost-prober.py and/or barapost-local.py.
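
The final binning step can be pictured with the following sketch. It only illustrates the idea under stated assumptions: `bin_records`, `to_fasta`, the genus-level labels and the `"unclassified"` bin name are hypothetical and do not reflect how barapost-binning.py is actually written or how it names its output.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Record = Tuple[str, str]  # (sequence id, nucleotide sequence)


def bin_records(
    records: Iterable[Record],
    labels: Dict[str, str],
    fallback: str = "unclassified",  # hypothetical name for unlabeled sequences
) -> Dict[str, List[Record]]:
    """Group records by their taxonomic label; unlabeled ones go to `fallback`."""
    bins: Dict[str, List[Record]] = defaultdict(list)
    for seq_id, seq in records:
        bins[labels.get(seq_id, fallback)].append((seq_id, seq))
    return dict(bins)


def to_fasta(records: Iterable[Record]) -> str:
    """Render records as FASTA text, one bin per would-be output file."""
    return "".join(f">{seq_id}\n{seq}\n" for seq_id, seq in records)


# Toy input: three reads, two of which received a (made-up) genus label.
toy_records = [("read_1", "ACGT"), ("read_2", "GGCC"), ("read_3", "TTAA")]
toy_labels = {"read_1": "Escherichia", "read_2": "Bacillus"}

bins = bin_records(toy_records, toy_labels)
```

In this sketch each key of `bins` stands for one output file, and `to_fasta(bins[name])` gives the text that would be written to it.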

Wiki table of contents