Skip to content

Key Value Stores

Archie L. Cobbs edited this page Jan 9, 2024 · 10 revisions

Overview

Permazen includes a well-defined key/value database API along with several implementations based on existing technologies. This page describes the API generally and discusses the provided implementations.

Click on the hyperlinks to view API Javadocs.

Overview of Key/Value API

The key/value API defines these concepts:

KVStore

A KVStore is just a map with byte[] keys and values. The map is sorted, meaning it's efficient to find the next or previous key, and keys iterate in either direction in unsigned lexicographical order. In other words, underneath the covers is some kind of balanced tree data structure (not just a simple hash table).

In some cases a KVStore may be read-only. Such a KVStore can be wrapped in a MutableView to make it mutable. The MutableView will simply record all of the mutations applied. Then later, you can grab the saved-up mutations and apply them somewhere else, serialize them to send over the network, or whatever.

AtomicKVStore

An AtomicKVStore is a read-only KVStore that, in addition, supports atomic "snapshots" and atomic application of batched mutations. This API closely follows that provided by log-structured stores like LevelDB and RocksDB.

KVDatabase and KVTransaction

A KVDatabase is a key/value database. This API is very simple; the only useful thing you can do with it is create a KVTransaction.

A KVTransaction is just a KVStore that provides the transaction's view into the key/value database, plus a few additional methods like commit() and rollback().

Some types of KVTransactions (especially those based on an underlying AtomicKVStore style technology like Google Spanner) support mutable snapshots via readOnlySnapshot(). This is an efficiently created, semi-permanent "freeze frame" copy of all the data entire transaction. Even when the original transaction is closed, the snapshot can still be accessed. Moreover, accessing it doesn't create conflicts with other transactions.

Some types of KVTransactions also support creating key watches via watchKey(). This basically sets up a "trigger" that outlives the original transaction and fires later when some other transaction commits a change to the specified key. Especially in clustered databases, it's a convenient way to wakeup threads on remote nodes.

KVTransaction Semantics

In general, the precise semantics (e.g., whether ACID-compliant) of a KVTransaction are derived from the underlying technology. In order to support a wide variety of technologies, there are a couple of subtleties built-in to the KVTransaction API you should be aware of.

First, virtually every database technology that supports concurrent transactions has notions of lock timeouts and/or optimistic lock failures. This broad category includes any condition where the database cannot allow a transaction to proceed because doing so would violate some guarantee such as consistency or isolation.

Because these errors happen randomly depending on what other activity happens to be occurring, they are considered transient conditions, and the appropriate response is to simply start the transaction over from the beginning and retry.

In Permazen, retryable errors appear as RetryTransactionException's. You can handle these manually yourself in a retry loop, or for Spring users, the DellRoad Stuff library provides the @RetryTransaction annotation that automatically retries transactions:

    @RetryTransaction
    @Transactional
    public void doSomething() {
        // Do work here ...
    }

Also see Spring Retry which does the same thing.

For some key/value databases, a RetryTransactionException thrown by commit() does not imply that the transaction did not actually get committed. Consider for example a clustered database where the network fails at just the right moment, such that the local node cannot know for sure whether the other nodes "got the message". Because of the uncertainty, the local node is forced to throw the exception.

This fact has an important implication: transactions should be written to be idempotent in case a committed transaction is retried.

Secondly, for some key/value databases, the data seen in a transaction is not guaranteed to be valid unless the transaction is committed - even for read-only transactions. This is typical for databases that are clustered over multiple machines such as Raft and Google Spanner (see below).

MVCC Databases

Permazen includes support for converting any AtomicKVStore into a full-fledged transactional KVdatabase, using a straightforward MVCC versioning scheme that provides linearizable semantics. As you might guess, this involves creating transactions by taking atomic snapshots and wrapping them in MutableView's, applying the batched mutations on transaction commit(), and performing conflict resolution to ensure consistency (see Reads and Writes).

Performance Considerations

When using SQL technologies, it is typical for an entire set of rows to be read at once (rather than one row/column cell one at a time), and it is also common for queries to contain joins that link multiple objects together into one rowset. In effect, the SQL language serves not only to specify what data to fetch, but also how to batch up that data for bulk transport. This means fewer database round trips are required, and therefore transaction performance is less sensitive to query latency. You are in effect sending a short computer program to the database, and the database is executing that program and sending you back the output.

The Permazen key/value API does not (yet) support an analogous way to "batch" query results. Each field access is converted directly into a lookup in the key/value database. As a result, when there is high latency associated with querying the key/value store (such as when using JDBC over a network connection), Permazen is likely to be much slower for bulk reads.

This can be partially mitigated by using the CachingKVDatabase, which will perform read-ahead on key/value pairs using background threads. Note however that if using SQL, some JDBC connections (e.g., MySQL Connector/J), while thread-safe, are not designed for multi-threaded operation. For these the CachingKVDatabase may perform better when configured with a single thread. But even if fully multi-threaded, the caching layer cannot predict queries to disconnected regions of the key space.

Extending the key/value API to support batch queries, and taking advantage of this support in the existing key/value stores (especially SQL ones), is an area of current development. What's needed is the key/value database equivalent of a "short program" that could be sent to the database, with the output returned. An obvious choice of language for the "short program" would Java. In fact, at least one commercial database vendor (VoltDB) is already using Java as its "stored procedure" language.

KVDatabase Implementations

Permazen includes the following key/value implementations and support modules. Feel free to contribute, or write your own!

Module Class Technology Description
kv-array ArrayKVDatabase MVCC using AtomicArrayKVStore ArrayKVStore is a read-only KVStore implementation based on a straightforward linear array of keys and values. AtomicArrayKVStore adds a write-ahead log and a background compaction thread. This is a good choice for a simple application that is not write-intensive and does not need to store a lot of data. Total key and value data must not exceed 2GB (each separately). Supports key watches, snapshots, and hot-copy backups.
kv-bdb BDBKVDatabase Berkeley DB Java Edition Straightforward implementation using BDB. Relatively frequent locking conflicts tend to generate a lot of RetryTransactionExceptions.
kv-caching CachingKVDatabase Background threads A caching and pre-fetching layer for key/value databases that have high latency. Useful with SQL-based databases when the database is located remotely over the network.
kv-cockroach CockroachKVDatabase CockroachDB Straightforward SQL-style implementation on top of CockroachDB
kv-fdb FoundationKVDatabase FoundationDB FoundationDB is back! It was swallowed up by Apple for a few years, but now they've open sourced it again. It's nearly a perfect match for Permazen.
kv-leveldb LevelDBKVDatabase Port of LevelDB to Java Works but not heavily tested or highly maintained. Could use some love.
kv-lmdb LMDBKVDatabase LMDB, specifically the lmdbjava Java port. Seems to work but I haven't played with it much.
kv-mssql MSSQLKVDatabase Microsoft SQL Server SQL-style implementation on top of Microsoft SQL Server
kv-mvstore MVStoreKVDatabase MVStore is a persistent, log structured key-value store. It is used as default storage subsystem of H2. Seems to work but I haven't played with it much. Also includes MVStoreAtomicKVStore
kv-mysql MySQLKVDatabase MySQL SQL-style implementation on top of MySQL
kv-raft RaftKVDatabase Raft Consensus Algorithm expanded into a key/value database Fully linearizable ACID-compliant distributed database that tolerates failure of up to half of the nodes in the cluster. This database is in production as part of a mission-critical commercial application and has been heavily tested. Requires an AtomicKVStore for storing local node state. Supports key watches, snapshots, and dynamically adding/removing nodes.
kv-raft FallbackKVDatabase RaftKVDatabase Adds a "standalone mode" option to RaftKVDatabase, where if a node is disconnected from the majority is can continue to function using its own private copy of the database. Merge hooks are provided to allow the application to reconcile itself when the majority is available again.
kv-rocksdb RocksDBKVDatabase RocksDB's Java API Works but not heavily tested or highly maintained. Could use some love.
kv-simple SimpleKVDatabase In-memory TreeMap Memory-only implementation. Supports concurrent transactions via a straightforward shared/exclusive range locking scheme.
kv-simple XMLKVDatabase In-memory TreeMap backed by a flat XML file A SimpleKVDatabase that persists the entire data to a flat file after each commit.
kv-spanner SpannerKVDatabase Google Spanner Spanner provides an AtomicKVStore type of API, but with the additional feature that Google does the conflict resolution for you when applying mutations. SpannerKVDatabase is a straightforward MVCC-style implementation on top of this. Recommended to also use a CachingKVDatabase layer (or someday this? issue #144183725). Supports snapshots.
kv-sql SQLKVDatabase SQL/JDBC Support class for SQL databases. Defines a single table with key and value binary columns. MySQLKVDatabase, MSSQLKVDatabase, SQLiteKVDatabase, and CockroachKVDatabase are subclasses of SQLKVDatabase. Likely to be slower if queries are going over a network.
kv-sqlite SQLiteKVDatabase JDBC Wrapper for SQLite Straightforward SQL-style implementation on top of the popular SQLite database. Good choice if you want a simple but mainstream solution.
kv-xodus XodusKVDatabase Xodus High performance key/value database from the folks at JetBrains.