Evaluating the consistency of molecular diagnostic characters (signature characters) detected by DeSignate.

maxganser/consistency-script

Consistency script

Tool description

First, the tool uses DeSignate to detect signature characters for a selected query group in a reference alignment and in alternative alignments of the same sequences. Second, it identifies the consensus signature characters, i.e., those detected congruently in all alignments.
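
This README does not say how signature characters are matched across alignments whose column numbering differs. One plausible approach is sketched here purely as an assumption, not as the script's actual method. It gives each column a "fingerprint" that records which residue of each sequence the column contains. A reference column counts as a consensus signature character if a column with the same fingerprint is also flagged in every alternative alignment. The per-alignment signature columns are taken as given (e.g., from DeSignate), positions are 0-based, and only positions are compared, not character states or signature types. The names `column_fingerprints` and `consensus_positions` are hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical helper type: a column "fingerprint" records, for each sequence
# (in a fixed label order), the ungapped index of the residue in that column,
# or None if the sequence has a gap there. Columns from different alignments
# with equal fingerprints align exactly the same residues.
Fingerprint = Tuple[Optional[int], ...]


def column_fingerprints(alignment: Dict[str, str], labels: List[str]) -> List[Fingerprint]:
    """Return one fingerprint per alignment column, in column order."""
    counters = {label: 0 for label in labels}
    fingerprints: List[Fingerprint] = []
    for col in range(len(alignment[labels[0]])):
        fp: List[Optional[int]] = []
        for label in labels:
            if alignment[label][col] == "-":
                fp.append(None)  # gap: no residue consumed
            else:
                fp.append(counters[label])
                counters[label] += 1
        fingerprints.append(tuple(fp))
    return fingerprints


def consensus_positions(alignments: List[Dict[str, str]],
                        signature_columns: List[Set[int]]) -> List[int]:
    """Return reference columns (0-based) whose fingerprint is flagged in every alignment.

    alignments[0] is the reference alignment; all alignments must share the same labels.
    signature_columns[i] holds the signature columns detected in alignments[i].
    Caveat: columns that are gaps in every sequence share one fingerprint.
    """
    labels = sorted(alignments[0])
    all_fps = [column_fingerprints(aln, labels) for aln in alignments]
    # Fingerprints of the flagged columns in each alternative alignment.
    flagged = [{fps[c] for c in cols}
               for fps, cols in zip(all_fps[1:], signature_columns[1:])]
    ref_fps = all_fps[0]
    return sorted(c for c in signature_columns[0]
                  if all(ref_fps[c] in f for f in flagged))
```

For example, take `ref = {"a": "AC-G", "b": "ACTG"}` and `alt = {"a": "A-CG", "b": "ACTG"}` with columns {0, 1, 3} flagged in both. The sketch returns `[0, 3]`, because reference column 1 pairs residues that no single column of the alternative alignment pairs.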

For more details and an example application, please read our manuscript:

Ganser M.H., Santoferrara L.F., and Agatha S. (2022). Molecular signature characters complement taxonomic diagnoses: a bioinformatic approach exemplified by ciliated protists (Ciliophora, Oligotrichea). Molecular Phylogenetics and Evolution (in press).

Usage

Requirements

This script requires DeSignate (Hütter et al. 2020), a tool that detects molecular signature characters for taxon diagnoses. To use DeSignate, clone its repository to the root directory of this repository:

git clone https://github.com/DatabaseGroup/DeSignate

Input files

  1. Alignment files in FASTA format
    • Example:
    >Sequence-1-label
    -TTGGCTGTCACAGTGTC-
    >Sequence-2-label
    --TGGTACTGACAGTGT--
    ...
    
  2. Two separate files, one for the query group and one for the reference group, each containing comma-separated sequence labels (e.g., in .txt or .csv format)
    • Example:
    Sequence-1-label, Sequence-2-label, ...
    
    PLEASE NOTE: Sequence labels must be identical in all alignments and must exactly match those in the query and reference group files. Otherwise, the program terminates with an error message listing the missing or mismatched sequence labels.
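
The input requirements above can be checked before an analysis is started. The following is a minimal sketch, not the script's own parser. It reads an aligned FASTA file (sequences may wrap over several lines) and a comma-separated group file. It then rejects alignments whose label sets differ, as well as group labels that are absent from the alignments. The function names `read_fasta`, `read_group`, and `check_labels` are hypothetical.

```python
from pathlib import Path
from typing import Dict, List, Optional


def read_fasta(path: str) -> Dict[str, str]:
    """Parse an aligned FASTA file into {label: sequence}."""
    sequences: Dict[str, str] = {}
    label: Optional[str] = None
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            label = line[1:].strip()
            sequences[label] = ""
        elif label is not None:
            sequences[label] += line  # a sequence may span several lines
    return sequences


def read_group(path: str) -> List[str]:
    """Read comma-separated sequence labels, ignoring surrounding whitespace."""
    text = Path(path).read_text()
    return [label.strip() for label in text.split(",") if label.strip()]


def check_labels(alignments: List[Dict[str, str]], groups: List[List[str]]) -> None:
    """Raise ValueError if alignments disagree on labels or a group label is missing."""
    reference = set(alignments[0])
    for number, alignment in enumerate(alignments[1:], start=2):
        if set(alignment) != reference:
            differing = sorted(reference ^ set(alignment))
            raise ValueError(f"Alignment {number} differs from the reference alignment "
                             f"in these labels: {', '.join(differing)}")
    missing = sorted({label for group in groups for label in group} - reference)
    if missing:
        raise ValueError(f"Sequence labels not found in the alignments: {', '.join(missing)}")
```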

Output files

  • consensus-sigchars.csv : Alignment positions of consensus signature characters, together with their DeSignate results (character states, signature type, entropy values)
  • non-consensus-positions.csv : Reference alignment positions of non-consensus signature characters
  • designate-results.csv : Complete DeSignate results of reference alignment for the selected query and reference groups

Commands

To run the script, use the following command:

python consistency.py --alignments path/alignment_01.fasta path/alignment_02.fasta path/alignment_03.fasta --query_group path/query_group.txt --reference_group path/reference_group.txt

List of options:
--alignments : Paths to the alignment files. The first file is the reference alignment; subsequent files are alternative alignments.
--query_group : Path to the query group file.
--reference_group : Path to the reference group file.
--k_window : Set to 2 for a two-position analysis; default = 1 (one-position analysis).
--consider_gaps : Include gaps as a character state; default = True.
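
The options are listed here, but the way the script parses them is not documented. As a hypothetical sketch only, not the script's actual code, the same interface could be declared with Python's standard `argparse` module. The `str_to_bool` helper is an assumption about how a value such as `--consider_gaps False` might be accepted. A helper like this is needed because argparse's `type=bool` would turn any non-empty string, including "False", into True.

```python
import argparse


def str_to_bool(value: str) -> bool:
    """Interpret common textual booleans such as 'True' or 'False' (hypothetical helper)."""
    if value.lower() in {"true", "1", "yes"}:
        return True
    if value.lower() in {"false", "0", "no"}:
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")


def build_parser() -> argparse.ArgumentParser:
    """Declare the options listed above (a sketch, not consistency.py itself)."""
    parser = argparse.ArgumentParser(
        description="Consensus signature characters across alternative alignments")
    parser.add_argument("--alignments", nargs="+", required=True,
                        help="reference alignment first, then alternative alignments")
    parser.add_argument("--query_group", required=True, help="query group file")
    parser.add_argument("--reference_group", required=True, help="reference group file")
    parser.add_argument("--k_window", type=int, default=1,
                        help="1 = one-position analysis, 2 = two-position analysis")
    parser.add_argument("--consider_gaps", type=str_to_bool, default=True,
                        help="include gaps as a character state")
    return parser
```

With this sketch, `build_parser().parse_args(["--alignments", "a.fasta", "b.fasta", "--query_group", "q.txt", "--reference_group", "r.txt"])` yields `k_window == 1` and `consider_gaps is True`.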
