Skip to content

jacobjinkelly/sequencing

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

82 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

sequencing Build Status

A repository of algorithms for genomic sequencing.

Algorithms

Boyer-Moore

The repository currently consists of an implementation of the Boyer-Moore algorithm for exact alignment of a sequencing read to a reference genome. The implementation uses a linear-time algorithm (the "Z algorithm") as described by Gusfield (lecture notes 4 & 5) to construct indexes of the query string used for the good suffix and bad character rules.

Tests

The full test suite consists of tests for correctness and performance of the implementation of the Boyer-Moore string matching algorithm, using the Naïve string matching algorithm as a baseline comparison.

Test cases for correctness are available in tests. The following command will convert these test cases into the FASTA format required to run the full test suite:

./format_test.sh

Test cases for performance must be downloaded. The reference genomes are from the Dec. 2013 (GRCh38/hg38) assembly of the human genome; chromosomes 1 and 20 were used and can be downloaded here and here respectively; place the (decompressed) downloaded files in the root directory of the repository. The query string is included in FASTA format in the file p.fa and is taken from Ben Langmead's slides.

The full test suite can then be run with the following command:

./run_test.sh

Acknowledgements

Ben Langmead's course was immensely helpful, and was a guide throughout developing this repository and learning about genomic sequencing.

About

Fast alignment of genomic sequences using Boyer-Moore with linear time construction of indexes using Z algorithm.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published