
timweh/MEMSA


MEMSA - A MEM Extracting Multiple Sequence Aligner

MEMSA is a multiple sequence alignment (MSA) tool that identifies maximal exact matches (MEMs) before applying a traditional MSA algorithm, in order to speed up the alignment process. It was developed to investigate how this heuristic affects computation time and alignment accuracy, and it demonstrates that the preprocessing step can indeed positively impact the alignment of genomic sequences.
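A MEM between two sequences is an exact match that cannot be extended by one more character to the left or to the right without a mismatch. The following is a minimal, naive C sketch of that definition for a single pair of sequences, written only for illustration: it is not MEMSA's code, the names `Mem` and `find_mems` are hypothetical, and `min_len` merely plays a role similar to the `-s` option. MEMSA downloads slaMEM for MEM finding; dedicated tools like that use index structures rather than this quadratic scan.

```c
#include <assert.h>
#include <string.h>

/* One maximal exact match: s[s_pos .. s_pos+len) == t[t_pos .. t_pos+len). */
typedef struct {
    size_t s_pos;
    size_t t_pos;
    size_t len;
} Mem;

/* Naive O(|s| * |t|) MEM search between two sequences (illustration only).
   Stores at most `cap` matches in `out` and returns the total number found. */
size_t find_mems(const char *s, const char *t, size_t min_len,
                 Mem *out, size_t cap) {
    assert(s != NULL && t != NULL);
    size_t n = strlen(s), m = strlen(t), count = 0;
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < m; j++) {
            if (s[i] != t[j]) continue;
            /* Left-maximal: skip if the match could be extended to the left. */
            if (i > 0 && j > 0 && s[i - 1] == t[j - 1]) continue;
            /* Extend right until a mismatch or the end of either sequence,
               which makes the match right-maximal. */
            size_t len = 0;
            while (i + len < n && j + len < m && s[i + len] == t[j + len]) len++;
            if (len >= min_len) {
                if (count < cap) out[count] = (Mem){i, j, len};
                count++;
            }
        }
    }
    return count;
}
```

For example, `find_mems("ACGTTACG", "ACGAACG", 3, buf, 8)` reports four MEMs of length 3, because each copy of `ACG` in one sequence matches each copy in the other; raising `min_len` to 4 reports none.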

Requirements

This tool was developed for macOS and Linux. Building it requires the gcc compiler.

Manual

To run the tool, put a single sequence in the reference FASTA file and all other sequences to be aligned in the input FASTA file. The reference sequence can be picked arbitrarily from the dataset, as the choice of reference does not affect the alignment. The generated alignment is written to the output file.
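If the whole dataset starts out in one multi-FASTA file, it can be split into the two expected files by taking the first record as the reference. The C sketch below assumes standard FASTA text in which every record begins with a `>` header at the start of a line; `second_record_offset` and `split_fasta` are hypothetical helpers, not part of MEMSA.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Returns the offset at which the second FASTA record starts (the first '>'
   that begins a line after the first header), or strlen(fasta) if the text
   contains only one record. */
size_t second_record_offset(const char *fasta) {
    assert(fasta != NULL && fasta[0] == '>');
    const char *p = strstr(fasta, "\n>");
    return p ? (size_t)(p - fasta) + 1 : strlen(fasta);
}

/* Writes the first record to ref_path and all remaining records to in_path.
   Returns 0 on success and -1 if a file cannot be opened or closed. */
int split_fasta(const char *fasta, const char *ref_path, const char *in_path) {
    size_t cut = second_record_offset(fasta);
    FILE *ref = fopen(ref_path, "w");
    if (!ref) return -1;
    FILE *in = fopen(in_path, "w");
    if (!in) {
        fclose(ref);
        return -1;
    }
    fwrite(fasta, 1, cut, ref);  /* first record -> reference file */
    fputs(fasta + cut, in);      /* every other record -> input file */
    int r1 = fclose(ref);
    int r2 = fclose(in);
    return (r1 == 0 && r2 == 0) ? 0 : -1;
}
```

Calling `split_fasta(text, "reference.fa", "input.fa")` produces files with the default names used by `./memsa`.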

Install

./install.sh

The installation script downloads the required dependencies slaMEM and MAFFT and builds an executable from the source code.

Usage

./memsa [<options>]

To run MEMSA on the provided example files with the default parameters, just run ./memsa.

Options:

  • -s : minimum seed length (default=20)
  • -g : maximum merge gap (default=1)
  • -r : reference file name (default="reference.fa")
  • -i : input file name (default="input.fa")
  • -o : output file name (default="alignment.fa")
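This README does not define exactly what the maximum merge gap `-g` merges. One plausible reading, sketched below in C purely as an assumption, is that two seeds lying on the same diagonal (the same offset between reference and sequence) are joined into one seed when at most `max_gap` characters separate them. The `Seed` type and `merge_seeds` function are hypothetical and do not describe MEMSA's actual implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical seed: an exact match of length len starting at ref_pos in the
   reference and at seq_pos in another sequence. */
typedef struct {
    long ref_pos;
    long seq_pos;
    long len;
} Seed;

/* Merges consecutive seeds (assumed sorted by ref_pos) that share a diagonal
   and are separated by at most max_gap characters. Works in place and
   returns the new number of seeds. */
size_t merge_seeds(Seed *seeds, size_t n, long max_gap) {
    assert(max_gap >= 0);
    if (n == 0) return 0;
    size_t w = 0; /* index of the last seed kept so far */
    for (size_t r = 1; r < n; r++) {
        Seed *last = &seeds[w];
        const Seed *cur = &seeds[r];
        long gap = cur->ref_pos - (last->ref_pos + last->len);
        int same_diagonal =
            (cur->ref_pos - cur->seq_pos) == (last->ref_pos - last->seq_pos);
        if (same_diagonal && gap >= 0 && gap <= max_gap) {
            /* Absorb cur, including the gap before it, into last. */
            last->len = cur->ref_pos + cur->len - last->ref_pos;
        } else {
            seeds[++w] = *cur;
        }
    }
    return w + 1;
}
```

Under this reading, the default `-g 1` would bridge a single mismatching base between two seeds on the same diagonal, while `-g 0` would keep them separate.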

Examples:

./memsa
./memsa -s 50 -g 0
./memsa -r ref.fa -i sequences.fa -o result.fa

For the (extremely simple) example files provided, one can observe that with a minimum seed length -s of 5 to 8, MEMSA finds exactly one seed common to all sequences. For smaller seed lengths, the candidate seeds are not consistent (duplicated and/or out of order), whereas for larger seed lengths, no seed present in all sequences is found.
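The consistency condition described above can be sketched as a check that every seed occurs exactly once in every sequence and that the seeds appear in the same left-to-right, non-overlapping order in all of them. The C function below illustrates that idea under this assumption; `seeds_consistent` and `count_occurrences` are hypothetical names, not MEMSA functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Counts occurrences of pat in seq (overlaps included) and stores the
   position of the first occurrence in *first. */
static size_t count_occurrences(const char *seq, const char *pat, size_t *first) {
    assert(pat[0] != '\0'); /* an empty pattern would match everywhere */
    size_t count = 0;
    for (const char *p = strstr(seq, pat); p != NULL; p = strstr(p + 1, pat)) {
        if (count == 0) *first = (size_t)(p - seq);
        count++;
    }
    return count;
}

/* Returns true if every seed occurs exactly once in every sequence and the
   seeds appear in the same non-overlapping order in all sequences. */
bool seeds_consistent(const char *const *seqs, size_t n_seqs,
                      const char *const *seeds, size_t n_seeds) {
    for (size_t s = 0; s < n_seqs; s++) {
        size_t prev_end = 0;
        for (size_t k = 0; k < n_seeds; k++) {
            size_t pos = 0;
            /* Missing or duplicated seed. */
            if (count_occurrences(seqs[s], seeds[k], &pos) != 1) return false;
            /* Out of order, or overlapping the previous seed. */
            if (pos < prev_end) return false;
            prev_end = pos + strlen(seeds[k]);
        }
    }
    return true;
}
```

A short seed such as `GC` tends to fail the uniqueness test because it occurs more than once, which mirrors the README's observation that small seed lengths yield inconsistent seeds.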
