A tool to identify and annotate homoplasies on a phylogeny and sequence alignment

JosephCrispell/homoplasyFinder

HomoplasyFinder

Author: Joseph Crispell

Licence: GPL-3

Requires: R (>= v3.3.3) & rJava (>= v10.0.1)



Description

HomoplasyFinder is an open-source tool designed to identify homoplasies on a phylogeny and its nucleotide alignment. HomoplasyFinder uses the consistency index to identify sites in the nucleotide alignment that are inconsistent with the phylogeny provided. The current R package was written to allow easy use of the Java code (which HomoplasyFinder uses) in R. Full documentation is provided on the HomoplasyFinder wiki.
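
To make the consistency index concrete, here is a minimal sketch in base R. It assumes the standard definition (the minimum number of changes a site could require, divided by the number of changes it requires on the given tree) and counts changes with Fitch parsimony on a tiny hand-written tree. The functions `fitch` and `consistencyIndex` and the nested-list tree are hypothetical illustrations, not part of the package, and the Java code may calculate the index differently.

```r
# Fitch parsimony on a rooted binary tree stored as nested lists (hypothetical
# representation). Returns the possible states at a node and the number of
# changes needed below it.
fitch <- function(node, states) {
  if (is.character(node)) {
    # Tip: its state set is the observed nucleotide
    return(list(set = states[[node]], changes = 0))
  }
  left <- fitch(node[[1]], states)
  right <- fitch(node[[2]], states)
  shared <- intersect(left$set, right$set)
  changes <- left$changes + right$changes
  if (length(shared) > 0) {
    return(list(set = shared, changes = changes))
  }
  # No shared state: one extra change is needed at this node
  list(set = union(left$set, right$set), changes = changes + 1)
}

# Consistency index for one site: minimum possible changes / changes on tree
consistencyIndex <- function(tree, states) {
  minChanges <- length(unique(unlist(states))) - 1
  observed <- fitch(tree, states)$changes
  if (observed == 0) return(1) # Constant site
  minChanges / observed
}

# Hypothetical tree ((A,B),(C,D)) and two example sites
tree <- list(list("A", "B"), list("C", "D"))
consistent <- c(A = "A", B = "A", C = "G", D = "G")
homoplasy <- c(A = "A", B = "G", C = "A", D = "G")

print(consistencyIndex(tree, consistent)) # 1: one change explains the site
print(consistencyIndex(tree, homoplasy))  # 0.5: two changes needed, one possible
```

A site with an index below 1 needs more changes on the tree than its nucleotides alone require, which is what marks it as a potential homoplasy.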

Installation

install.packages("devtools")
devtools::install_github("JosephCrispell/homoplasyFinder")
devtools::install_github("JosephCrispell/basicPlotteR") # Makes annotated plotted phylogeny prettier :-)
library(homoplasyFinder)

Executing

# Find the FASTA and tree files attached to package
fastaFile <- system.file("extdata", "example.fasta", package = "homoplasyFinder")
treeFile <- system.file("extdata", "example.tree", package = "homoplasyFinder")

# Get the current working directory
workingDirectory <- paste0(getwd(), "/")

# Run the HomoplasyFinder jar tool
inconsistentPositions <- runHomoplasyFinderInJava(treeFile=treeFile, 
                                                  fastaFile=fastaFile, 
                                                  path=workingDirectory)
 
# Get the current date
date <- format(Sys.Date(), "%d-%m-%y")
 
# Read in the output table
resultsFile <- paste0(workingDirectory, "consistencyIndexReport_", date, ".txt")
results <- read.table(resultsFile, header=TRUE, sep="\t", stringsAsFactors=FALSE)
 
# Read in the annotated tree
tree <- readAnnotatedTree(workingDirectory)
 
# Plot the annotated tree
plotAnnotatedTree(tree, inconsistentPositions, fastaFile)

This should produce a plot of the annotated tree.
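
The report name above is rebuilt from today's date, so reading it can fail if the tool was run on a different day. As a hedged alternative, the sketch below looks for the most recently modified file matching the `consistencyIndexReport_` prefix. The helper `findLatestReport` is hypothetical and not part of the package, and the demonstration uses a placeholder file in a temporary directory rather than real output.

```r
# Hypothetical helper: return the most recently modified report in a directory
findLatestReport <- function(directory, prefix = "consistencyIndexReport_") {
  candidates <- list.files(directory,
                           pattern = paste0("^", prefix, ".*\\.txt$"),
                           full.names = TRUE)
  if (length(candidates) == 0) {
    stop("No report files found in: ", directory)
  }
  candidates[which.max(file.info(candidates)$mtime)]
}

# Demonstration on a temporary directory holding a placeholder report
demoDirectory <- tempfile("homoplasyFinderDemo")
dir.create(demoDirectory)
writeLines("placeholder", file.path(demoDirectory, "consistencyIndexReport_01-01-20.txt"))
latestReport <- findLatestReport(demoDirectory)
print(basename(latestReport))
```

With real output, `findLatestReport(workingDirectory)` could replace the `paste0()` call that builds `resultsFile`.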

Now extended to deal with the presence/absence of INDELs

HomoplasyFinder can now calculate the consistency of INDELs (or any regions) on a phylogeny. To do this, simply replace the FASTA file with a CSV-formatted table reporting the presence/absence of each region in each isolate. Here is an example of the format:

start,end,isolateA,isolateB,isolateC
34802,35208,0,1,0
39068,39069,0,0,1
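
A table in this format can be written from R. The following is a minimal sketch that rebuilds the example above as a data frame and saves it with `write.csv()`. The file name `myPresenceAbsence.csv` is a placeholder, and it is an assumption (not confirmed here) that the isolate column names need to match the tip labels of the tree.

```r
# Build the presence/absence table (1 = region present, 0 = absent)
presenceAbsence <- data.frame(
  start = c(34802L, 39068L),
  end = c(35208L, 39069L),
  isolateA = c(0L, 0L),
  isolateB = c(1L, 0L),
  isolateC = c(0L, 1L)
)

# Write it without row names or quotes so it matches the format shown above
# (the file name is a placeholder)
presenceAbsenceFile <- file.path(tempdir(), "myPresenceAbsence.csv")
write.csv(presenceAbsence, presenceAbsenceFile, row.names = FALSE, quote = FALSE)

# Check what was written
print(readLines(presenceAbsenceFile))
```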

Test it out using the following:

# Find the presence/absence CSV and tree files attached to package
presenceAbsenceFile <- system.file("extdata", "presenceAbsence_INDELs.csv", package = "homoplasyFinder")
treeFile <- system.file("extdata", "example.tree", package = "homoplasyFinder")

# Get the current working directory
workingDirectory <- paste0(getwd(), "/")

# Run the HomoplasyFinder jar tool
inconsistentPositions <- runHomoplasyFinderInJava(treeFile=treeFile, 
                                                  presenceAbsenceFile=presenceAbsenceFile, 
                                                  path=workingDirectory)
 
# Get the current date
date <- format(Sys.Date(), "%d-%m-%y")
 
# Read in the output table
resultsFile <- paste0(workingDirectory, "consistencyIndexReport_", date, ".txt")
results <- read.table(resultsFile, header=TRUE, sep="\t", stringsAsFactors=FALSE)

Source code

Java source code is available here and R package (wrapper) code here.

Citation

If you use HomoplasyFinder in your research, it would be great if you could cite the following article: Crispell, J., Balaz, D., & Gordon, S. V. (2019). HomoplasyFinder: a simple tool to identify homoplasies on a phylogeny. Microbial Genomics. https://doi.org/10.1099/mgen.0.000245
