A new linear-time algorithm for centroid decomposition

The centroid of a tree is a node that, when removed, breaks the tree in connected components of size at most half of that of the original tree. By recursing this procedure on the components, one obtains the centroid decomposition of the tree, also known as centroid tree. The centroid tree has logarithmic height and its construction is a powerful pre-processing step in several tree-processing algorithms. The folklore recursive algorithm for computing the centroid tree runs in O(n*log(n)) time. To the best of our knowledge, the only result claiming O(n) time is unpublished and relies on (dynamic) heavy path decomposition of the original tree. In this repository, we include our implementation for the new linear-time algorithm that we proposed, based on the idea of applying the folklore algorithm to a suitable decomposition of the original tree.

Installing

The main scripts are cdstd.cpp and cdlin.cpp, which are respectively the standard O(n*log(n)) and our new O(n) algorithms for computing the centroid decomposition of a tree. We also provide the benchmark.cpp script that we used to measure the performance of our algorithm over the standard one, and some random tree generators, included in the ./tree_gen/ folder. To compile everything, just run make all.

Implementation details

Both T and T'' are internally represented with a vector<uint32_t>: each node of the two trees occupies a certain portion of this vector.

In T, the i-th node occupies the region T[i], ..., T[j]:

T[i] is the number of i's children, packed with a bitflag (MSB) saying if it is a cover element;
T[i+1] is i's parent;
T[i+2], ..., T[j] are i's children. Each one is represented by two integers T[h], T[h+1]:
- T[h] is the child's ID;
- T[h+1] is the weight of the subtree rooted at T[h].

In T'', the i-th node occupies the region T[i], ..., T[k]:

T''[i] is the number of i's children;
T''[i+1] is i's parent;
T''[i+2] is the size of the corresponding treelet on T;
T''[i+3] is the ID of the root of the corresponding treelet on T (i.e. the alpha function);
T''[i+4], ..., T''[k] are i's children. Each one is represented by three integers T''[h], ..., T''[h+2]:
- T''[h] is the child's ID;
- T''[h+1] and T''[h+2] are the two deltas on the edge connecting i to T[h].

Performances

Datasets

All datasets have been generated using the code in folder "tree_gen".

Funding

Nicola Prezza has been supported by the project Italian MIUR-SIR CMACBioSeq ("Combinatorial methods for analysis and compression of biological sequences") grant n.~RBSI146R5L, PI: Giovanna Rosone. Link: http://pages.di.unipi.it/rosone/CMACBioSeq.html

Name		Name	Last commit message	Last commit date
Latest commit History 138 Commits
examples		examples
imgs		imgs
src		src
tree_gen		tree_gen
.gitignore		.gitignore
README.md		README.md
benchmark.cpp		benchmark.cpp
cdlin.cpp		cdlin.cpp
cdstd.cpp		cdstd.cpp
makefile		makefile

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

examples

examples

imgs

imgs

src

src

tree_gen

tree_gen

.gitignore

.gitignore

README.md

README.md

benchmark.cpp

benchmark.cpp

cdlin.cpp

cdlin.cpp

cdstd.cpp

cdstd.cpp

makefile

makefile

Repository files navigation

A new linear-time algorithm for centroid decomposition

Installing

Implementation details

Performances

Datasets

Funding

About

Contributors 3

Languages

davidedellagiustina/linear-centroid-decomposition

Folders and files

Latest commit

History

Repository files navigation

A new linear-time algorithm for centroid decomposition

Installing

Implementation details

Performances

Datasets

Funding

About

Topics

Resources

Stars

Watchers

Forks

Languages