Lyndon Coghill edited this page Jul 13, 2015 · 15 revisions

Description

Phyloboost is a Python pipeline for reconstructing, augmenting, and visualizing the similarity-cluster-based tree sets produced by the Phylota pipeline. These are unrooted trees built from GenBank datasets spanning all of Eukaryota, covering more than 60,000 genera. Phyloboost lets you augment those trees by adding new sequences to the clusters and rebuilding the alignments and trees. It then attempts to build rooted trees from those tree sets via graph-based methods by aligning them with the NCBI taxonomy.

Purpose

  • De novo clustering of eukaryotic GenBank DNA sequences.
  • Filter the clusters, removing any 'model organisms' identified by algorithms.
  • Filter the clusters, removing taxonomically misidentified sequences.
  • Filter the clusters, removing any 'known' transposable elements.
  • Build new alignments for all of the cluster sets.
  • Build unrooted trees for all of those alignments.
  • Attempt to root those trees via convex subtree graph methods.
  • Compare and visualize those trees aligned to the NCBI taxonomy.
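
The three filtering steps above share one shape: take a set of clusters, drop the flagged sequences, and keep what remains. The sketch below illustrates that shape only and is not Phyloboost's actual code. The names `ClusterMember` and `filter_clusters`, the in-memory dictionary layout, and the `min_size` cutoff are all assumptions made for illustration. The flagged sets are assumed to come from earlier detection steps that are not shown here. The default cutoff of four reflects the fact that an unrooted tree with fewer than four tips has only one possible topology.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterMember:
    """Hypothetical minimal record for one sequence in a cluster."""
    seq_id: str    # sequence identifier, e.g. a GenBank accession
    taxon_id: int  # NCBI taxonomy ID of the source organism


def filter_clusters(clusters, flagged_taxa, flagged_seqs, min_size=4):
    """Remove flagged sequences, then drop clusters that became too small.

    clusters     -- dict mapping a cluster ID to a list of ClusterMember
    flagged_taxa -- taxon IDs to exclude (e.g. detected model organisms)
    flagged_seqs -- sequence IDs to exclude (e.g. misidentified sequences
                    or known transposable elements)
    min_size     -- assumed cutoff; clusters with fewer survivors are dropped
    """
    kept = {}
    for cluster_id, members in clusters.items():
        survivors = [
            m for m in members
            if m.taxon_id not in flagged_taxa and m.seq_id not in flagged_seqs
        ]
        if len(survivors) >= min_size:
            kept[cluster_id] = survivors
    return kept


if __name__ == "__main__":
    # Toy data: sequence IDs are made up; 9606 is the NCBI taxon ID for
    # Homo sapiens, used here only as an example of a flagged taxon.
    clusters = {
        "c1": [ClusterMember("s1", 9606), ClusterMember("s2", 9606),
               ClusterMember("s3", 101), ClusterMember("s4", 102),
               ClusterMember("s5", 103)],
        "c2": [ClusterMember("s6", 201), ClusterMember("s7", 202),
               ClusterMember("s8", 203), ClusterMember("s9", 204),
               ClusterMember("s10", 205)],
    }
    result = filter_clusters(clusters, flagged_taxa={9606}, flagged_seqs={"s10"})
    for cluster_id, members in result.items():
        print(cluster_id, [m.seq_id for m in members])
```

Keeping each filter's output in the same dictionary layout as its input means the filters can be chained in any order before the alignment and tree-building steps.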

Installation

  • Install the software requirements
  • Install the needed databases
  • Prime the databases
  • Clone the Phyloboost repository
  • Run the pipeline