AdaptDB

AdaptDB is an adaptive storage manager for analytical database workloads in a distributed setting.

It partitions datasets across a cluster and incrementally refines that partitioning as queries run.

AdaptDB introduces a novel hyper join that avoids expensive data shuffling by identifying storage blocks of the joining tables that overlap on the join attribute, and only joining those blocks. Hyper join performs well when each block in one table overlaps with few blocks in the other table, since that will minimize the number of blocks that have to be accessed.
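The block-overlap idea behind hyper join can be illustrated with a minimal sketch. This is not AdaptDB's implementation: it assumes each storage block is summarized by the minimum and maximum join-attribute value it holds, and the names `Block`, `overlaps`, and `overlapping_pairs`, as well as the block ranges, are hypothetical and chosen only for illustration.

```python
from typing import List, NamedTuple, Tuple


class Block(NamedTuple):
    """Hypothetical block summary: the range of join-key values it stores."""
    block_id: str
    lo: int  # smallest join-key value in the block
    hi: int  # largest join-key value in the block (inclusive)


def overlaps(a: Block, b: Block) -> bool:
    """Two blocks can contain matching rows only if their key ranges intersect."""
    return a.lo <= b.hi and b.lo <= a.hi


def overlapping_pairs(left: List[Block], right: List[Block]) -> List[Tuple[str, str]]:
    """Return the (left, right) block pairs that actually need to be joined."""
    return [(l.block_id, r.block_id) for l in left for r in right if overlaps(l, r)]


# Made-up block ranges for two tables partitioned on the same join attribute.
orders = [Block("o0", 0, 99), Block("o1", 100, 199), Block("o2", 200, 299)]
lineitem = [Block("l0", 0, 149), Block("l1", 150, 299)]

pairs = overlapping_pairs(orders, lineitem)
print(pairs)
```

In this toy layout only 4 of the 6 possible block pairs overlap, so the other 2 pairs are never read. The fewer blocks each block overlaps with, the larger that saving becomes.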

To minimize the number of overlapping blocks for common join queries, AdaptDB uses smooth repartitioning to repartition small portions of the tables on join attributes as queries run.
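The incremental nature of smooth repartitioning can be sketched as simple bookkeeping. This is a simplification under stated assumptions, not AdaptDB's algorithm: it only tracks which attribute each block is partitioned on, it moves a fixed fraction of the table per query, and the function `smooth_repartition_step`, the `fraction` parameter, and the attribute names are all hypothetical.

```python
import random
from typing import Dict, List


def smooth_repartition_step(
    blocks: Dict[str, str], join_attr: str, fraction: float, rng: random.Random
) -> List[str]:
    """Mark a small portion of the table as repartitioned on join_attr.

    `blocks` maps a block id to the attribute it is currently partitioned on.
    At most `fraction` of the table (and at least one block) moves per call.
    """
    pending = [b for b, attr in blocks.items() if attr != join_attr]
    if not pending:
        return []  # the table is already fully partitioned on join_attr
    budget = max(1, int(len(blocks) * fraction))
    chosen = rng.sample(pending, min(budget, len(pending)))
    for b in chosen:
        blocks[b] = join_attr  # a real system would rewrite the block's data here
    return chosen


# Ten blocks, initially partitioned on a hypothetical attribute "ship_date".
table = {f"b{i}": "ship_date" for i in range(10)}
rng = random.Random(0)

moved_per_query = []
for _ in range(6):  # six queries that join on "order_key"
    moved = smooth_repartition_step(table, "order_key", 0.2, rng)
    moved_per_query.append(len(moved))

print(moved_per_query)
```

With these made-up numbers, each of the first five queries moves two blocks, and the sixth finds nothing left to move. No single query pays the cost of repartitioning the whole table.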

A prototype of AdaptDB running on top of Spark improves query performance by 2-3x on TPC-H, as well as on a real-world dataset, compared with a system that employs scans and shuffle joins.
