EvolBioInf/andi

About

This is the andi program for estimating the evolutionary distance between closely related genomes. These distances can be used to rapidly infer phylogenies for large sets of genomes. Because andi does not compute full alignments, it is efficient enough to scale to thousands of bacterial genomes.

This readme covers all necessary instructions for the impatient to get andi up and running. For extensive instructions please consult the manual.

Installation and Usage

Stable versions of andi are available via package managers. For manual installation see below.

For Debian and Ubuntu:

sudo apt-get install andi

For macOS with Homebrew:

brew tap brewsci/bio
brew install andi

For ArchLinux with aura:

sudo aura -A andi

After a successful installation, you can get the usage instructions via --help or the man page.

$ andi --help
$ man andi

You can simply use andi with your genomes in FASTA format.

$ andi S1.fasta S2.fasta
2
S1     0.0  0.1
S2     0.1  0.0

From this distance matrix the phylogeny can be inferred via neighbor-joining. Check the manual for a more thorough description.
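As a rough illustration of working with this output, the following sketch parses a PHYLIP-style distance matrix like the one above and reports the closest pair of sequences. It is not part of andi: the function names `parse_phylip_matrix` and `closest_pair` are hypothetical helpers, and the sketch assumes a full square matrix with whitespace-separated fields. Picking the closest pair is only a simple inspection of the matrix, not neighbor-joining itself.

```python
def parse_phylip_matrix(text):
    """Parse a PHYLIP-style square distance matrix into (names, rows).

    Assumes the first non-empty line holds the number of sequences and each
    following line holds a name followed by that many distances.
    """
    lines = [line for line in text.splitlines() if line.strip()]
    n = int(lines[0])
    names, rows = [], []
    for line in lines[1:n + 1]:
        fields = line.split()
        names.append(fields[0])
        rows.append([float(x) for x in fields[1:n + 1]])
    return names, rows


def closest_pair(names, rows):
    """Return (name_a, name_b, distance) for the smallest off-diagonal entry."""
    best = None
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if best is None or rows[i][j] < best[2]:
                best = (names[i], names[j], rows[i][j])
    return best


# Example input copied from the andi run shown above.
example = """2
S1     0.0  0.1
S2     0.1  0.0
"""

names, rows = parse_phylip_matrix(example)
print(closest_pair(names, rows))
```

In practice the matrix would be read from a file holding the redirected output of andi, and a dedicated phylogenetics tool would be the better choice for the actual tree inference.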

Manual installation

If your system does not support any of the above package managers, you have to build the latest stable release manually from a tarball. See the manual for extensive building instructions.

This program has the following external dependencies: libdivsufsort and the GSL. Please make sure you have installed both before attempting a build. If you obtained the source straight from the git repository rather than as a tarball, you will also need the autotools.

Assuming you have installed all prerequisites, building is as easy as follows.

$ autoreconf -fi -Im4  # optional when building from tarball
$ ./configure
$ make
$ make install

Detailed build instructions are located in INSTALL.

Links and Additional Resources

The release of this software is accompanied by a paper from Haubold et al. It explains the anchor distance strategy used by andi in great detail. The maf2phy.awk script used in the validation process is located under scripts. Simulations were done using our own simK tool. For a demo visualising the internals of andi, visit our GitHub pages.

Data Sets

  1. 29 E. coli and Shigella strains: data
  2. 109 E. coli ST131 strains (paper):
  3. 3085 Streptococcus pneumoniae strains (paper): ftp://ftp.sanger.ac.uk/pub/pathogens/Streptococcus/pneumoniae/Maela_assemblies.tgz

License

Copyright © 2014 - 2021 Fabian Klötzl
License GPLv3+: GNU GPL version 3 or later.

This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law. The full license text is available at http://gnu.org/licenses/gpl.html.

Some files may be licensed differently.

Contact

In case of bugs or unexpected errors, don't hesitate to send me an email: kloetzl@evolbio.mpg.de