Culvert: A Robust Framework for Secondary Indexing

Secondary indexing is a common design pattern in BigTable-like databases that allows users to index one or more columns in a table. It enables fast lookup of records by the value of a particular column instead of by row id, which brings relational-style semantics to a NoSQL environment. The index is stored either in a reserved namespace of the same table or in a separate index table. Although the pattern is common in BigTable-based applications, most implementations to date have been tightly coupled to a particular application. As a result, few general-purpose frameworks for secondary indexing on BigTable-like databases exist, and those that do are tied to a particular implementation of the BigTable model.
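To make the pattern concrete, here is a minimal in-memory sketch of a primary table paired with a separate index table. It does not use Culvert's API: the class `SecondaryIndexSketch`, its methods, and the `value + separator + rowId` index key layout are all hypothetical, chosen only to illustrate how a sorted index table turns a lookup by column value into a short range scan. It assumes indexed values never contain the NUL character used as the separator.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

/** Toy model of a primary table plus a separate index table (hypothetical, not Culvert's API). */
class SecondaryIndexSketch {
    // Separator between the indexed value and the row id inside an index key (assumed absent from values).
    private static final char SEP = '\u0000';

    // Primary table: row id -> (column -> value), kept sorted by row id.
    private final SortedMap<String, Map<String, String>> primary =
        new TreeMap<String, Map<String, String>>();
    // Index table: (value + SEP + row id) -> row id, kept sorted so equal values are adjacent.
    private final SortedMap<String, String> index = new TreeMap<String, String>();
    private final String indexedColumn;

    SecondaryIndexSketch(String indexedColumn) {
        this.indexedColumn = indexedColumn;
    }

    /** Writes a cell to the primary table and keeps the index entry for the indexed column in sync. */
    void put(String rowId, String column, String value) {
        Map<String, String> row = primary.get(rowId);
        if (row == null) {
            row = new TreeMap<String, String>();
            primary.put(rowId, row);
        }
        String previous = row.put(column, value);
        if (column.equals(indexedColumn)) {
            if (previous != null) {
                index.remove(previous + SEP + rowId); // drop the stale entry for the old value
            }
            index.put(value + SEP + rowId, rowId);
        }
    }

    /** Returns the ids of rows whose indexed column equals the value, via a range scan of the index. */
    List<String> rowIdsWhere(String value) {
        String start = value + SEP;                // first possible key for this value
        String end = value + (char) (SEP + 1);     // exclusive upper bound
        return new ArrayList<String>(index.subMap(start, end).values());
    }

    public static void main(String[] args) {
        SecondaryIndexSketch table = new SecondaryIndexSketch("city");
        table.put("row1", "city", "Austin");
        table.put("row2", "city", "Boston");
        table.put("row3", "city", "Austin");
        System.out.println(table.rowIdsWhere("Austin")); // prints [row1, row3]
    }
}
```

Because the index is sorted by value first, finding every row with a given value touches only the adjacent index entries rather than scanning the whole primary table. The cost is an extra write (and cleanup of the old entry) on every update to the indexed column.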

We developed a solution to this problem called Culvert, which supports online index updates as well as a variation of the Hive query language. In designing Culvert, we sought to make the solution pluggable so that it can be used on any of the many BigTable-like databases (HBase, Cassandra, etc.). Our goal is for Culvert to be an easy-to-use, extensible tool for the entire NoSQL community.

Building

Requirements:

Java 1.5
Maven 3 (though Maven 2 may work).
HBase 0.92

To install:

1. Pull down the source and run "mvn clean package". This outputs a compiled jar.
2. Install the jar on the classpath of all the servers hosting your table.
3. Install the jar on the local machine (the 'client') from which you will issue requests.
4. Create an index table and update your configurations.
5. Create an instance of com.bah.culvert.Client.
6. Write your data into your primary table through the Client.

Resources

All support resources for Culvert are present under resources/. Currently, the folder consists of:

CulvertFormat.xml - Eclipse code formatter settings. Apply it to all the Culvert projects from Preferences > Java > Code Style > Formatter.

Roadmap

Switch joins to first attempt to use a server-side in-memory table before dumping results into a 'scratch' table
Enable higher-consistency puts through use of coprocessors in HBase
1. Switch to doing the table put before the index put
2. Use coprocessors (CPs) to ensure that a put has been made before updating the index (two-phase commit)
Add support for removes (consistent or otherwise)
Add support for batch indexing existing tables
Add more index types
1. Document Partitioned Index
2. N-grams index
3. Numeric indexes (integer, float, etc.)
4. Web URL index

Community

Culvert is a brand new project and we are continually looking to grow the community. We welcome any input, thoughts, patches, etc.

Help

You can find help or talk development on IRC at #culvert on irc.freenode.net

Information on how to use Culvert is also available in this blog post.

The original slides from the presentation at Hadoop Summit 2011 are available on SlideShare.

Disclaimer

Culvert is provided AS-IS, under the Apache License. See LICENSE.txt for the full text.
