C++ Suffix Arrays

This is a set of Suffix Array implementations using the SA-IS algorithm and the Skew Algorithm.
LCP Array construction from a Suffix Array is implemented by the Kasai Algorithm and used to find the Longest Common Substring in k-Strings.
Resources to learn about the algorithim, Suffix Arrays, and their applications can be found at the end of this readme.

Compiling & Building

In order to build the project, you will need to install:

CMake
Make

Clone the Repository.

git clone "https://github.com/Tascate/Suffix-Array-Implementation.git"

Compile using CMake.

cmake .

Build using the Makefile.

make

Run to view test code.

./SuffixArray

Example Usage

Given the following code:

#include "Skew.h"

Skew sa;
sa.addString("cabbage");
sa.makeSuffixArray();
sa.printSuffixArray();

This will output the computed Suffix Array using the Skew Algorithm:

Suffix Array : 7 1 4 3 2 0 6 5

Things to note

The abstract class SuffixArray represents a generic base class that any class can derive from to implement different Suffix Array Construction Algorithms. The classes: Naive, Skew, SAIS, all derive from the SuffixArray class to implement their respective algorithms.

Skew Algorithm

O(n) time, O(n) work
The algorithm is fairly simple to implement, requiring less code overall than the SA-IS implementation. Because of this, it is quite useful for quick testing and usage of Suffix Arrays. This implementation has support for multiple strings which is useful for finding the LCS of k-Strings.

SA-IS Algorithm

O(n) time, O(n) work
This is an implementation of the SA-IS algorithm for my own understanding as well as improving my C++. The code is largely based on Screwtape's A walk through the SA-IS Suffix Array Construction Algorithm and may not necessarily be optimized. Screwtape's walkthrough is a great resource for learning the SA-IS algorithm!

Kasai Algorithm

O(n) time, O(1) work
A simple algorithm to construct the LCP Array from a Suffix Array in linear time. In this code, the LCP Array can be used to find the Longest Common Substring.

Longest Common String for k-Strings

O(n) time, O(n) work)
Finding the LCS is based upon the premise that for a sliding window from i to j on the LCP array. The LCP value for that sliding window is the minimum value from i+1 to j provided that the first character for the suffix i and suffix j are the same. Finding the minimum value in our sliding window takes amortized O(1) time using a deque. An algorithim for finding the minimum value of a sliding window can be found here.
We can then use bookkeeping to find the longest common substring.

Finding out which string a suffix belongs to:

For coding simplicity, finding out the parent string for a given suffix takes O(logm) time (where m = # of strings) and O(1) space using binary search. However, it is possible to optimize for O(1) time by tracking the parent string for a suffix in memory. (For example, a map or array)

Good Resources I used to Learn about Suffix Arrays

Algorithm Documentation:

SA-IS: "Linear Suffix Array Construction by Almost Pure Induced-Sorting" by G. Nong, S. Zhang and W. H. Chan"
Skew: "Simple Linear Work Suffix Array Construction" by Juha Kärkkäinen and Peter Sanders)
Kasai: "Linear-Time Longest Common-Prefix Computation in Suffix Arrays and Its Applications" by Toru Kasai, Gunho Lee, Hiroki Arimura, Setsuo Arikawa and Kunsoo Park

Name		Name	Last commit message	Last commit date
Latest commit History 23 Commits
.gitignore		.gitignore
CMakeLists.txt		CMakeLists.txt
Naive.cpp		Naive.cpp
Naive.h		Naive.h
README.md		README.md
SAIS.cpp		SAIS.cpp
SAIS.h		SAIS.h
Skew.cpp		Skew.cpp
Skew.h		Skew.h
SuffixArray.cpp		SuffixArray.cpp
SuffixArray.h		SuffixArray.h
main.cpp		main.cpp

Tascate/Suffix-Arrays-in-CPP

Folders and files

Latest commit

History

Repository files navigation

C++ Suffix Arrays

Compiling & Building

Example Usage

Things to note

Skew Algorithm

SA-IS Algorithm

Kasai Algorithm

Longest Common String for k-Strings

Finding out which string a suffix belongs to:

Good Resources I used to Learn about Suffix Arrays

Algorithm Documentation:

Learning:

Videos:

About

Topics

Resources

Stars

Watchers

Forks

Languages