Elaina edited this page May 25, 2021 · 15 revisions

BinSanity v0.5.4

BinSanity contains a suite of scripts designed to cluster contigs generated from metagenomic assembly into putative genomes. What makes BinSanity unique is its use of Affinity Propagation (AP) and its biphasic approach. In the biphasic approach, contigs are first clustered by contig coverage and then refined using GC% and k-mer frequencies, which yields more complete bins than similar methods (see the BinSanity publication).

Additionally, AP is a deterministic algorithm that does not require the number of clusters to be set in advance. Affinity Propagation has been shown to be more effective than methods such as k-means at clustering images of faces and at identifying regulated transcripts (see the Affinity Propagation paper).
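To illustrate how AP settles on a number of clusters by itself, here is a minimal pure-Python sketch of the message-passing updates. It is not BinSanity's implementation: the function name, the one-dimensional toy coverage values, the negative squared distance similarity, and the preference of -50 are all assumptions chosen for illustration.

```python
def affinity_propagation(values, preference, damping=0.5, n_iter=200):
    """Cluster 1-D values; return the exemplar index chosen for each value."""
    n = len(values)
    # Similarity = negative squared distance; the diagonal holds the preference.
    S = [[-(values[i] - values[k]) ** 2 for k in range(n)] for i in range(n)]
    for i in range(n):
        S[i][i] = preference
    R = [[0.0] * n for _ in range(n)]  # responsibilities r(i, k)
    A = [[0.0] * n for _ in range(n)]  # availabilities a(i, k)

    for _ in range(n_iter):
        # Responsibility: how well k suits i compared with i's best alternative.
        for i in range(n):
            scores = [A[i][k] + S[i][k] for k in range(n)]
            best = max(range(n), key=scores.__getitem__)
            runner_up = max(scores[k] for k in range(n) if k != best)
            for k in range(n):
                rival = runner_up if k == best else scores[best]
                R[i][k] = damping * R[i][k] + (1 - damping) * (S[i][k] - rival)
        # Availability: support that candidate exemplar k has gathered from other points.
        for k in range(n):
            support = [max(0.0, R[i][k]) for i in range(n)]
            support[k] = R[k][k]  # self-responsibility enters unclipped
            total = sum(support)
            for i in range(n):
                if i == k:
                    update = total - R[k][k]
                else:
                    update = min(0.0, total - support[i])
                A[i][k] = damping * A[i][k] + (1 - damping) * update

    # A point is an exemplar when its self-responsibility plus self-availability is positive.
    exemplars = [k for k in range(n) if R[k][k] + A[k][k] > 0]
    if not exemplars:
        raise RuntimeError("no exemplars found; try a higher preference or more iterations")
    return [i if i in exemplars else max(exemplars, key=lambda k: S[i][k]) for i in range(n)]


# Hypothetical coverage values for six contigs (two obvious groups).
coverage = [2, 5, 8, 40, 43, 46]
labels = affinity_propagation(coverage, preference=-50.0)
print(labels)  # [1, 1, 1, 4, 4, 4]
```

No cluster count is passed in. The preference (the self-similarity on the diagonal) controls how readily a point becomes an exemplar, so lower values give fewer clusters. The sketch contains no random step, so repeated runs on the same input give the same labels.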

Summary of changes to BinSanity v0.5.4

  1. Resolves a version inconsistency
  2. Changes the way the coverage file is processed in an attempt to fix the read error users have experienced
  3. Adds Binsanity2-beta to the usage options

Core BinSanity Scripts:

More detailed descriptions of each script are given under the Usage section.

  • Binsanity
    • BinSanity implements Affinity Propagation to cluster contigs into putative genomes using contig coverage as an input
  • Binsanity-refine
    • Binsanity-refine refines the clusters using tetranucleotide frequencies, GC%, and optionally the coverage profile
  • Binsanity-wf
    • Binsanity-wf runs Binsanity and Binsanity-refine sequentially to optimize cluster results
  • Binsanity-profile
    • Binsanity-profile uses featureCounts to produce the coverage profiles required by Binsanity, Binsanity-refine, and Binsanity-wf
  • Binsanity-lc
    • Binsanity-lc is written for large metagenomic assemblies (e.g., >100,000 contigs) where Binsanity and Binsanity-refine become too memory intensive. It uses k-means to subset contigs based on coverage before implementing Binsanity
  • Binsanity2-beta
    • Binsanity2-beta is written to ultimately replace Binsanity-wf and Binsanity-lc by merging the two workflows. Binsanity2-beta has an option for initial k-means-based subsetting of contigs for large assemblies, and by default it automatically applies k-means to assemblies larger than 75,000 contigs. This approach is currently being adapted into a new Snakemake workflow, and new data demonstrating the improved functionality will be released along with the workflow.
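
The composition features named above for the refinement step, GC% and tetranucleotide frequencies, can be computed from a contig sequence roughly as follows. This is a generic sketch, not BinSanity's own code: the function names are hypothetical, and details such as how ambiguous bases or reverse complements are handled are assumptions.

```python
from collections import Counter
from itertools import product


def gc_percent(seq):
    """Percentage of G and C among the unambiguous (A, C, G, T) bases."""
    seq = seq.upper()
    gc = sum(1 for base in seq if base in "GC")
    acgt = sum(1 for base in seq if base in "ACGT")
    return 100.0 * gc / acgt if acgt else 0.0


def tetranucleotide_frequencies(seq):
    """Relative frequency of each of the 256 possible 4-mers in the sequence."""
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=4)]
    # Count every overlapping window of length 4.
    counts = Counter(seq[i:i + 4] for i in range(len(seq) - 3))
    # Windows containing ambiguous bases (e.g. N) are left out of the total.
    total = sum(counts[k] for k in kmers)
    return {k: (counts[k] / total if total else 0.0) for k in kmers}


contig = "ACGTACGT"  # hypothetical toy sequence
print(gc_percent(contig))
print(tetranucleotide_frequencies(contig)["ACGT"])
```

Each contig then becomes a fixed-length vector (256 frequencies plus GC%), which can be compared between contigs regardless of contig length.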

Citation

Graham ED, Heidelberg JF, Tully BJ. (2017) BinSanity: unsupervised clustering of environmental microbial assemblies using coverage and affinity propagation. PeerJ 5:e3035 https://doi.org/10.7717/peerj.3035
