Parallel wavelet tree construction

This repository provides the implementation of two new parallel algorithms for the construction of wavelet trees. Here $n$ is the length of the input sequence and $\sigma$ is the alphabet size. The first algorithm, pwt, has $O(n)$ time complexity and uses $O(n\lg\sigma+\sigma\lg n)$ bits of space, including the space of the final wavelet tree and excluding the input. The second algorithm, dd, has $O(\lg n)$ time complexity and uses $O(n\lg\sigma+p\sigma\lg n\lg\sigma)$ bits of space, using $p$ threads.
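For readers new to the data structure, the following is a minimal sequential sketch of a levelwise (pointerless) wavelet tree with $\lg\sigma$ bitarrays of $n$ bits each. It is not the pwt or dd code from this repository: the names `wt_sketch`, `wt_build`, `wt_access` and `wt_free` are hypothetical, one byte is stored per bit for readability, and access uses a linear scan instead of rank/select structures. It assumes symbols lie in the range 0 to $2^{levels}-1$.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical sketch type: `levels` bitarrays of n entries each.
   One byte per bit keeps the code short; real code would pack bits. */
typedef struct {
    size_t n;
    unsigned levels;   /* lg(sigma) */
    uint8_t **bits;    /* bits[l][i] is 0 or 1 */
} wt_sketch;

void wt_free(wt_sketch *wt) {
    if (wt == NULL) return;
    if (wt->bits != NULL)
        for (unsigned l = 0; l < wt->levels; l++) free(wt->bits[l]);
    free(wt->bits);
    free(wt);
}

/* Builds every level directly from the input: level l lists the symbols
   stably sorted by their top l bits and stores bit (levels-1-l) of each. */
wt_sketch *wt_build(const uint32_t *seq, size_t n, unsigned levels) {
    assert(levels >= 1 && levels <= 31);
    wt_sketch *wt = malloc(sizeof *wt);
    if (wt == NULL) return NULL;
    wt->n = n;
    wt->levels = levels;
    wt->bits = calloc(levels, sizeof *wt->bits);
    if (wt->bits == NULL) { free(wt); return NULL; }
    for (unsigned l = 0; l < levels; l++) {
        size_t buckets = (size_t)1 << l;   /* number of nodes on level l */
        size_t *offs = calloc(buckets + 1, sizeof *offs);
        wt->bits[l] = calloc(n ? n : 1, sizeof(uint8_t));
        if (offs == NULL || wt->bits[l] == NULL) {
            free(offs);
            wt_free(wt);
            return NULL;
        }
        /* Histogram of the top l bits, shifted by one slot. */
        for (size_t i = 0; i < n; i++) {
            assert((seq[i] >> levels) == 0);   /* symbol must fit in `levels` bits */
            offs[(seq[i] >> (levels - l)) + 1]++;
        }
        /* Prefix sum: offs[v] becomes the start position of node v. */
        for (size_t v = 1; v <= buckets; v++) offs[v] += offs[v - 1];
        /* Stable placement of the current bit of every symbol. */
        for (size_t i = 0; i < n; i++) {
            size_t v = seq[i] >> (levels - l);
            wt->bits[l][offs[v]++] = (uint8_t)((seq[i] >> (levels - 1 - l)) & 1u);
        }
        free(offs);
    }
    return wt;
}

/* Recovers the symbol at position i by walking from the root to a leaf. */
uint32_t wt_access(const wt_sketch *wt, size_t i) {
    assert(wt != NULL && i < wt->n);
    size_t lo = 0, hi = wt->n, p = i;   /* current node is [lo, hi) */
    uint32_t sym = 0;
    for (unsigned l = 0; l < wt->levels; l++) {
        const uint8_t *b = wt->bits[l];
        uint8_t bit = b[p];
        size_t zeros = 0, before = 0;
        for (size_t j = lo; j < hi; j++) zeros += (b[j] == 0);    /* zeros in node */
        for (size_t j = lo; j < p; j++) before += (b[j] == bit);  /* rank of bit before p */
        sym = (sym << 1) | bit;
        if (bit == 0) { hi = lo + zeros; p = lo + before; }
        else          { lo = lo + zeros; p = lo + before; }
    }
    return sym;
}
```

In this sketch each level depends only on the input sequence, never on another level. Calling `wt_access` for every position and comparing against the input is the same kind of check the validation file described under Usage supports.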

Input

Our implementation is intended only for research. The input of our algorithms is limited to sequences over integer alphabets, stored in binary format, where the alphabet is contiguous.
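As an illustration of what such an input file might look like, the sketch below writes a sequence as raw 4-byte integers, the default symbol width mentioned under Compilation. The function name `write_symbols` is hypothetical, and the absence of a header, the use of unsigned values and the host byte order are assumptions that should be checked against the reading code in this repository.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Writes n symbols to `path` as raw 4-byte unsigned integers in host byte
   order, with no header (assumed layout). Returns 0 on success, -1 on error. */
int write_symbols(const char *path, const uint32_t *seq, size_t n) {
    assert(path != NULL && (seq != NULL || n == 0));
    FILE *f = fopen(path, "wb");
    if (f == NULL) return -1;
    size_t written = (n > 0) ? fwrite(seq, sizeof(uint32_t), n, f) : 0;
    int close_err = fclose(f);
    return (written == n && close_err == 0) ? 0 : -1;
}
```

A file produced this way for a sequence of length $n$ is exactly $4n$ bytes long.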

Compilation

We provide two bash scripts for compiling our algorithms. The first script, build.sh, is intended for experiments that measure running time.

bash build.sh

The second script, build_memory.sh, is intended for experiments that measure memory consumption.

bash build_memory.sh

By default, our algorithms use Cilk Plus and assume integer alphabets using 4 bytes to encode each symbol of the input. Below, we provide a list of additional options to compile our algorithms:

Options

-DNO_RANK_SELECT: Skips the construction of the rank/select structures over the $\lg\sigma$ bitarrays of the wavelet tree.
-DNO_CILK: Compiles a sequential version of our algorithms, without Cilk Plus.
-DCHARS: Reads inputs with 1 byte per symbol.
-DSHORT: Reads inputs with 2 bytes per symbol.
-DMALLOC_COUNT: Measures the memory consumption of our algorithms.
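Compile-time flags such as -DCHARS and -DSHORT are typically consumed through conditional compilation. The sketch below shows one common pattern for selecting the symbol width this way. It is purely illustrative and not taken from this repository's sources: the names `symbol_t`, `SYMBOL_BYTES` and `symbol_count_from_bytes` are hypothetical, and giving CHARS precedence over SHORT is an assumption.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical symbol type selected by the same macro names as the options. */
#if defined(CHARS)
typedef uint8_t symbol_t;    /* 1 byte per symbol */
#define SYMBOL_BYTES 1
#elif defined(SHORT)
typedef uint16_t symbol_t;   /* 2 bytes per symbol */
#define SYMBOL_BYTES 2
#else
typedef uint32_t symbol_t;   /* default: 4 bytes per symbol */
#define SYMBOL_BYTES 4
#endif

static_assert(sizeof(symbol_t) == SYMBOL_BYTES, "unexpected symbol width");

/* Number of symbols stored in a headerless input file of the given size. */
size_t symbol_count_from_bytes(size_t file_bytes) {
    return file_bytes / sizeof(symbol_t);
}
```

With neither macro defined, a 40-byte input corresponds to 10 symbols. Compiling the same file with -DSHORT would make it 20.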

Usage

./pwt <input sequence> <alphabet size> [validation file]
./dd <input sequence> <alphabet size> [validation file]

A limitation of the current implementation is that it does not compute the alphabet size. Instead, the alphabet size has to be given as an argument. Note that the alphabet size has to be a power of two. For an input sequence whose alphabet size is not a power of two, pass the smallest power of two greater than the original alphabet size.
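The rounding rule above can be sketched as a small helper that computes the value to pass as the alphabet size argument. The function name `padded_alphabet_size` is hypothetical and not part of this repository.

```c
#include <assert.h>
#include <stdint.h>

/* Returns the smallest power of two that is >= sigma, that is, sigma itself
   when it is already a power of two and the next power of two otherwise. */
uint64_t padded_alphabet_size(uint64_t sigma) {
    assert(sigma > 0 && sigma <= (UINT64_C(1) << 63));
    uint64_t p = 1;
    while (p < sigma) p <<= 1;
    return p;
}
```

For example, an input with 5 distinct symbols would be run with an alphabet size of 8, while an input with 256 distinct symbols keeps 256.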

For validation purposes, both programs accept an optional third argument naming an output file. This file is filled by applying the access operation of the wavelet tree to positions 0 through n-1. To validate the construction, compare the input sequence with the output file, for example with the diff command. If both are identical, the wavelet tree construction is correct.
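Where diff is not available, the same check can be done with a byte-by-byte comparison. The sketch below is one way to write it; the function name `files_identical` is hypothetical, and it assumes, as the diff-based check does, that the validation file uses the same binary encoding as the input.

```c
#include <assert.h>
#include <stdio.h>

/* Compares two files byte by byte.
   Returns 1 if identical, 0 if they differ, -1 if a file cannot be read. */
int files_identical(const char *path_a, const char *path_b) {
    assert(path_a != NULL && path_b != NULL);
    FILE *a = fopen(path_a, "rb");
    FILE *b = fopen(path_b, "rb");
    int result = -1;
    if (a != NULL && b != NULL) {
        int ca, cb;
        do {
            ca = fgetc(a);
            cb = fgetc(b);
        } while (ca == cb && ca != EOF);
        if (!ferror(a) && !ferror(b))
            result = (ca == cb) ? 1 : 0;   /* both at EOF means identical */
    }
    if (a != NULL) fclose(a);
    if (b != NULL) fclose(b);
    return result;
}
```

A return value of 1 for the input sequence and the validation file corresponds to diff reporting no differences.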

To Do

Compute the alphabet size of the input on the fly.
Extend the input sequences to non-contiguous alphabets.
Include a Makefile to make the compilation simpler.

Notes

Our implementation uses the bit array implementation by Isaac Turner and the rank/select structures implementation by Rodrigo Gonzalez. To measure memory consumption, we use the malloc_count wrapper by Timo Bingmann.

jfuentess/waveletree

Folders and files

Latest commit

History

Repository files navigation

Parallel wavele tree construction

Input

Compilation

Options

Usage

To Do

Notes

About

Resources

License

Stars

Watchers

Forks

Languages