BlinkDB: Sub-Second Approximate Queries on Very Large Data (jkatzsam/blinkdb)

Queries with Bounded Errors and Bounded Response Times on Very Large Data

BlinkDB is a large-scale data warehouse system built on Shark and Spark and designed to be compatible with Apache Hive. It can answer HiveQL queries up to 200-300 times faster than Hive by executing them on user-specified samples of the data and returning approximate answers augmented with meaningful error bars. BlinkDB 0.1.0 is an alpha developer release. It supports creating and deleting samples on any input table or materialized view, and executing approximate HiveQL queries that use aggregates with statistical closed forms (namely AVG, SUM, COUNT, VAR and STDEV).
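To illustrate what "approximate answers with error bars" can mean for an aggregate with a closed form, here is a minimal sketch in Python. It is not BlinkDB code and does not use any BlinkDB API: the function names `approx_avg` and `approx_sum`, the synthetic data, and the 1% sampling rate are all assumptions made for illustration. It uses a textbook normal-approximation confidence interval on a uniform random sample and ignores the finite-population correction.

```python
import math
import random


def approx_avg(sample, z=1.96):
    """Estimate AVG from a uniform random sample.

    Returns (estimate, half_width), where half_width is the half-width of
    a normal-approximation confidence interval (z=1.96 is roughly 95%).
    Hypothetical helper, not part of BlinkDB.
    """
    n = len(sample)
    mean = sum(sample) / n
    # Unbiased sample variance (divides by n - 1).
    var = sum((x - mean) ** 2 for x in sample) / (n - 1)
    # Closed-form standard error of the mean: sqrt(var / n).
    half_width = z * math.sqrt(var / n)
    return mean, half_width


def approx_sum(sample, population_size, z=1.96):
    """Estimate SUM by scaling the sample mean up to the full table size."""
    mean, half_width = approx_avg(sample, z)
    return population_size * mean, population_size * half_width


# Synthetic "table" standing in for a large column (assumed data).
random.seed(0)
population = [random.gauss(100.0, 15.0) for _ in range(200_000)]

# Query a 1% uniform sample instead of the full data.
sample = random.sample(population, 2_000)

est, err = approx_avg(sample)
exact = sum(population) / len(population)
print(f"approximate AVG = {est:.3f} +/- {err:.3f}")
print(f"exact AVG       = {exact:.3f}")
```

The error bar shrinks in proportion to the square root of the sample size, which is why a small sample can often give a usable answer at a fraction of the cost of a full scan.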

BlinkDB requires:

Scala 2.10.x
Spark 0.9.x

For current documentation, see the BlinkDB Wiki.

For more information about the BlinkDB Project, see the BlinkDB Website.
