Skip to content

JuliaHealth/CAOS.jl

Repository files navigation

CAOS

Characteristic Attribute Organization System (CAOS) implementation in Julia.

MacOS / Linux Windows Test Coverage Documentation Lifecycle
Build Status AppVeyor Codecov Docs Docs Lifecycle

Installation

Requirements

  • BLAST 2.7.1+ installed and accessible in your PATH (eg. you should be able to execute $ blastn -h from the command line).

Install BLAST with Anaconda:

conda install blast -c bioconda

Instal CAOS.jl

using Pkg
Pkg.clone("https://github.com/bcbi/CAOS.jl")

Contributing

Contributions consistent with the style and quality of existing code are welcome. Be sure to follow the guidelines below.

Check the issues page of this repository for available work.

Committing

This project uses commitizen to ensure that commit messages remain well-formatted and consistent across different contributors.

Before committing for the first time, install commitizen and read Conventional Commits.

pip install commitizen

To start work on a new change, pull the latest develop and create a new topic branch (e.g. feature-resume-model, chore-test-update, bugfix-bad-bug`).

git add .

To commit, run the following command (instead of git commit) and follow the directions:

cz commit

Project Status

The package is tested against the current Julia 1.0 and Julia 1.1 release on OS X and Linux.

Contributing and Questions

Contributions are very welcome, as are feature requests and suggestions. Please open an issue if you encounter any problems or would just like to ask a question.

Documentation

The two functions provided by this package are located in the user_functions.jl file. This file also leverages helper functions from the files: caos_functions.jl, tree_functions.jl , utils.jl, classification.jl, gap_imputation.jl.

To use this package to classify a sequence, first run the generate_caos_rules function on your tree in the required NEXUS format. This will create the necessary CAOS rules and files to use for classification. Once you have generated CAOS rules, run the classify_new_sequence function on the sequence you wish to classify. The resulting classification will be written to file in your defined output directory.

Generate CAOS rules from a phylogenetic tree, writes CAOS rule files to the output directory.

generate_caos_rules(tree_file_path::String, output_directory::String)
  • tree_file_path : The path leading to the NEXUS file containing the phylogentic tree to be used to create CAOS rules. The exact format of the NEXUS file is described below.

  • output_directory : The directory which will contain all files pertaining to CAOS rules and classification

  • The function will write 7 files to the output directory: caos_rules.json contains all the CAOS rules for the tree, character_labels.json and taxa_labels.json contain information connecting sequences to names and locations in the tree (internal use), and the 4 .fasta files will be utilized later for sequence alignment using BLAST during classification.

Classify a sequence using CAOS rules already generated, writes the classification label to file in the output directory.

classify_new_sequence(sequence_file_path::String, output_directory::String ; all_CA_weights::Dict{Int64,Dict{String,Int64}}=Dict(1=>Dict("sPu"=>1,"sPr"=>1,"cPu"=>1,"cPr"=>1)), occurrence_weighting::Bool=false, tiebreaker::Vector{Dict{String,Int64}}=[Dict{String,Int64}()])
  • sequence_file_path : The path leading to the text file containing the sequence you wish to classify. The file should only contain the characters of the sequence

  • output_directory : The directory which contains all files pertaining to CAOS rules and classification

  • all_CA_weights : An optional argument for the weights to be given to different types of CA's (default is all 1)

  • occurrence_weighting : An optional argument for whether to use occurrence weighting for private rules (default is false)

  • tiebreaker : An optional argument for whether to use a tiebreaker (next set of CA weights), or return the entire subtree

NEXUS File Format

In order for the parser to correctly extract all relevant information from your phylogeneitc tree, your NEXUS file must be in the exact format described below (most NEXUS files will already be in this format, but if you are having issues with your file being read properly, here is how to format it):

  • The tree must in Newick format (only parentheses, commas, and numbers)
  • The tree must be on a line with the words "TREE" and "=", and only contain parentheses as part of the Newick representation
  • The character labels (names associated with each sequence of characters) should be exactly 3 lines beneath a line with the word "MATRIX" (this should be the only time the word "MATRIX" appears in the file)
  • Each character label should be its own line, with the name followed by a number of space, and then the character sequence
  • After your last character label, the following line should contain only a ";"
  • Taxa labels (taxa numbers for the position in the newick formatted tree associated with each character sequence name) should appear directly after a line containing the word "TRANSLATE" (this should be the only occurrence of that word in the file)
  • Each taxa label should be its own line, with the taxa number followed by the character sequence name (at least one space in between the two)
  • The line with the last taxa label should end with a ";"

An example NEXUS file is provided in the repository : S10593.nex

An example sequence file is provided in the repository : Example_Sequence.txt

About

Julia implementation of the characteristic attribute organization system (CAOS) algorithm

Resources

License

Stars

Watchers

Forks

Packages

No packages published

Languages