Robbie Hanson edited this page Feb 27, 2014 · 18 revisions

This article does a deep-dive into the architecture of YapDatabase.

Overview

YapDatabase is built atop SQLite. It is designed from the ground up to use Write-Ahead Logging (WAL), a feature added to SQLite in version 3.7.0 (available on iOS since 4.3). Here's what the SQLite documentation says about WAL:

  1. WAL is significantly faster in most scenarios.
  2. WAL provides more concurrency as readers do not block writers and a writer does not block readers. Reading and writing can proceed concurrently.

One of the primary concerns of YapDatabase is concurrency, and this article deals extensively with how it is achieved. Internally (at the SQLite level), the WAL architecture allows read operations to execute concurrently with a write operation. The YapDatabase architecture is designed to take full advantage of this functionality.

From the top

In order to use YapDatabase, you start by creating an instance of YapDatabase. The API at this top level is sparse. This is by design. Just as there is only a single sqlite database file, there is only a single instance of this top level database object. You cannot create a second instance of a top level database object that refers to the same database file. (If you try, it won't work and you'll get an error. This is enforced by the internal YapDatabaseManager class.) You gain access to the database via one or more connections (YapDatabaseConnection). More on this later.
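The one-instance-per-file rule can be pictured as a registry of open paths. The following is a minimal sketch in plain C (which also compiles as Objective-C), not the actual implementation of YapDatabaseManager. The names `path_registry_add` and `path_registry_remove` and the fixed-size table are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_OPEN_DATABASES 8
#define MAX_PATH_LENGTH 256

/* Hypothetical registry of database file paths that are currently open.
   Not thread-safe: a real implementation would need locking. */
static char open_paths[MAX_OPEN_DATABASES][MAX_PATH_LENGTH];
static bool slot_in_use[MAX_OPEN_DATABASES];

/* Returns true if the path was registered. Returns false if the path is
   already open, is too long, or the table is full. */
bool path_registry_add(const char *path) {
    assert(path != NULL);
    if (strlen(path) >= MAX_PATH_LENGTH) return false;

    int free_slot = -1;
    for (int i = 0; i < MAX_OPEN_DATABASES; i++) {
        if (slot_in_use[i]) {
            if (strcmp(open_paths[i], path) == 0) return false; /* duplicate */
        } else if (free_slot < 0) {
            free_slot = i;
        }
    }
    if (free_slot < 0) return false; /* table full */

    strcpy(open_paths[free_slot], path);
    slot_in_use[free_slot] = true;
    return true;
}

/* Called when the top-level database object for this path goes away. */
void path_registry_remove(const char *path) {
    assert(path != NULL);
    for (int i = 0; i < MAX_OPEN_DATABASES; i++) {
        if (slot_in_use[i] && strcmp(open_paths[i], path) == 0) {
            slot_in_use[i] = false;
            return;
        }
    }
}
```

In this sketch a second `path_registry_add` call for the same path fails until `path_registry_remove` has been called, which mirrors the error you get when creating a second top-level object for the same file.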

Serialization & Deserialization

When you create the database instance, you specify routines (via blocks) to handle the serialization and deserialization of your objects and metadata. You can specify your own routines, or you can use any of the built-in routines. You can optionally specify separate routines for objects vs metadata.

The ability to specify your own serialization routines is part of the power and flexibility of YapDatabase. For more detailed information, see the Storing Objects wiki article.
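To make the idea of pluggable routines concrete, here is a rough sketch in plain C that uses function pointers where YapDatabase uses blocks. Everything in it (`codec_t`, `user_t`, `user_serialize`, `user_deserialize`) is a hypothetical stand-in and not part of the YapDatabase API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical application object to be stored. */
typedef struct {
    int32_t id;
    char name[16];
} user_t;

/* A serializer turns an object into bytes; a deserializer does the reverse.
   Both return the number of bytes written or consumed, or 0 on failure. */
typedef size_t (*serializer_fn)(const void *object, uint8_t *out, size_t cap);
typedef size_t (*deserializer_fn)(const uint8_t *data, size_t len, void *object);

/* The pair of routines handed to the database when it is created. */
typedef struct {
    serializer_fn serialize;
    deserializer_fn deserialize;
} codec_t;

size_t user_serialize(const void *object, uint8_t *out, size_t cap) {
    assert(object != NULL && out != NULL);
    const user_t *u = object;
    if (cap < sizeof(u->id) + sizeof(u->name)) return 0;
    memcpy(out, &u->id, sizeof(u->id));
    memcpy(out + sizeof(u->id), u->name, sizeof(u->name));
    return sizeof(u->id) + sizeof(u->name);
}

size_t user_deserialize(const uint8_t *data, size_t len, void *object) {
    assert(data != NULL && object != NULL);
    user_t *u = object;
    if (len < sizeof(u->id) + sizeof(u->name)) return 0;
    memcpy(&u->id, data, sizeof(u->id));
    memcpy(u->name, data + sizeof(u->id), sizeof(u->name));
    return sizeof(u->id) + sizeof(u->name);
}

/* One codec per kind of data; objects and metadata could each get their own. */
const codec_t user_codec = { user_serialize, user_deserialize };
```

The storage layer only ever calls through `codec_t`, so swapping in a different encoding means supplying a different pair of functions.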

Connections

From the database instance you use the newConnection method to get a 'connection' to the database. This is where the concurrency comes in:

  • You can have multiple connections.
  • Every connection is thread-safe.
  • You can have multiple read-only transactions simultaneously without blocking.
  • You can have multiple read-only transactions and a single read-write transaction simultaneously without blocking.
  • There can only be a single read-write transaction at a time. Read-write transactions go through a per-database serial queue.
  • There can only be a single transaction per connection at a time. Transactions go through a per-connection serial queue.

In SQLite, a database connection (that is, an instance of sqlite3 *) is needed in order to do pretty much anything. A YapDatabaseConnection is essentially the equivalent, and in fact each connection contains such an ivar.

The YapDatabaseConnection also provides a variety of performance optimizations under the hood. For example, to execute an SQL query, the query string must first be compiled into a byte-code program by SQLite. (That is, to go from a string such as "SELECT object FROM database WHERE key = ?" into an executable routine.) This is done once by the connection (if needed), and then stored in ivars for re-use. You can find several such "compiled statements" (as they're called in sqlite) by searching for sqlite3_stmt *. Each such ivar is private, and has an associated internal method which transactions use to fetch that statement. Thus the statements are created on-demand, and then cached.
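The create-on-demand-then-cache pattern described above can be sketched as follows. This is a simplified model in plain C that does not link against SQLite: `fake_stmt_t` stands in for sqlite3_stmt, `compile` stands in for the real compile step (sqlite3_prepare_v2 in the SQLite C API), and all other names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for sqlite3_stmt. */
typedef struct {
    const char *sql;
} fake_stmt_t;

/* Stand-in for a connection holding one cached statement ivar. */
typedef struct {
    fake_stmt_t storage;            /* backing memory for the statement */
    fake_stmt_t *get_object_stmt;   /* NULL until first use */
    int compile_count;              /* how many times we "compiled" */
} connection_t;

/* Stand-in for the expensive string-to-bytecode compile step. */
static fake_stmt_t *compile(connection_t *conn, const char *sql) {
    conn->compile_count++;
    conn->storage.sql = sql;
    return &conn->storage;
}

/* Accessor used by transactions: compiles on the first call, then reuses. */
fake_stmt_t *connection_get_object_statement(connection_t *conn) {
    assert(conn != NULL);
    if (conn->get_object_stmt == NULL) {
        conn->get_object_stmt =
            compile(conn, "SELECT object FROM database WHERE key = ?");
    }
    return conn->get_object_stmt;
}
```

Calling the accessor twice returns the same statement and leaves `compile_count` at 1, which is the saving the connection gets from caching.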

Additionally, YapDatabaseConnection provides its own internal cache. Caching is an extremely important performance optimization: if you invoke objectForKey: multiple times with the same key, the cache can provide the result without hitting the disk or incurring the deserialization cost. (SQLite provides its own internal cache as well, which helps speed up performance when a trip to the database is needed.) For more information about the built-in cache, see the Cache wiki article.

Transactions

The transaction model is an important part of YapDatabase. SQLite itself is also transactional. That is, within a transaction one has a consistent view of the data. If you start a read-only transaction, then even concurrent modifications to the database by other connections won't change the consistent state of the in-progress read-only transaction.

Transactions are implemented using a block-based architecture:

__block id object = nil; // __block allows the block below to assign to this variable
[databaseConnection readWithBlock:^(YapDatabaseReadTransaction *transaction){
    // Any number of read operations permitted here
    object = [transaction objectForKey:key];
}];

[databaseConnection readWriteWithBlock:^(YapDatabaseReadWriteTransaction *transaction){
    // Any number of read and/or write operations permitted here
    [transaction setObject:updatedObject forKey:key];
}];

As you can see, the API for actually accessing or modifying the database is within the Transaction class. Internally, the transaction object itself is extremely lightweight. It consists primarily of an internal reference to its parent connection, and it is within the connection that it stores all its state.

This architecture is optimal as transactions come and go quickly, whereas connections are generally long-lived. Furthermore it allows for separation of code. The implementation of the connection object is focused on long-lived state and concurrency problems, whereas the implementation of the transaction object is focused on database access and updates.
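The split between a long-lived connection and a short-lived transaction can be modeled roughly as below. This is an illustrative sketch in plain C, with a function pointer in place of an Objective-C block. The names `connection_read`, `transaction_get_value`, and `copy_value_block` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Long-lived object: owns the state. */
typedef struct {
    int cached_value;      /* stands in for caches, statements, and so on */
    int in_transaction;    /* 1 while a transaction is running */
} connection_t;

/* Short-lived object: little more than a pointer back to its connection. */
typedef struct {
    connection_t *connection;
} transaction_t;

typedef void (*transaction_block)(transaction_t *txn, void *context);

/* Database access goes through the transaction, which reads the state
   that lives in the parent connection. */
int transaction_get_value(const transaction_t *txn) {
    return txn->connection->cached_value;
}

/* Creates a transaction on the stack, runs the block, and discards it. */
void connection_read(connection_t *conn, transaction_block block, void *context) {
    assert(conn != NULL && block != NULL);
    transaction_t txn = { conn };
    conn->in_transaction = 1;
    block(&txn, context);
    conn->in_transaction = 0;
}

/* Example block: copies the value out through the context pointer. */
void copy_value_block(transaction_t *txn, void *context) {
    *(int *)context = transaction_get_value(txn);
}
```

Because the transaction carries no state of its own, creating and discarding one per block costs almost nothing.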

Concurrency issues

One of the main problem sets that YapDatabase solves is the issue of concurrency. That is, if there are multiple connections, and any one of them may change the database, how do all the other connections ensure that their cache is consistent with what's on disk? YapDatabase solves this with the use of "snapshots & changesets".

  • A "snapshot" is a 64-bit number that is incremented with every write transaction. (It's only incremented if the write transaction actually made changes. Further, it is reset every time YapDatabase is initialized.)
  • A "changeset" is a data structure that allows other connections to see what changed in a write transaction. (It might contain a set of keys that were removed, or the updates made to various rows.)

The two are tied together. So if a connection is on, say, snapshot #29, then it can inspect the changeset for snapshot #30, and make any needed changes to its cache(s) in order to get up to speed.

When a write transaction finishes, the changeset is automatically forwarded to all sibling connections. The sibling connection processes the changeset as soon as it can, and then updates its snapshot number accordingly.
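As a rough illustration of how a sibling connection might catch up, the sketch below models a connection on snapshot 29 receiving the changeset for snapshot 30. It is written in plain C with hypothetical types and names. It also assumes that changed keys are simply evicted from the cache, which is one possible policy and not necessarily what YapDatabase does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_KEYS 4
#define CACHE_SLOTS 8

/* Hypothetical changeset: the snapshot it produces plus the keys it touched. */
typedef struct {
    uint64_t snapshot;
    const char *changed_keys[MAX_KEYS];
    size_t changed_count;
} changeset_t;

/* Hypothetical per-connection state: a snapshot number and a tiny key cache. */
typedef struct {
    uint64_t snapshot;
    const char *cached_keys[CACHE_SLOTS];   /* NULL means empty slot */
} connection_t;

bool connection_has_cached(const connection_t *conn, const char *key) {
    for (size_t i = 0; i < CACHE_SLOTS; i++) {
        if (conn->cached_keys[i] != NULL &&
            strcmp(conn->cached_keys[i], key) == 0) return true;
    }
    return false;
}

/* Drop a key so that the next read goes back to the database. */
static void cache_evict(connection_t *conn, const char *key) {
    for (size_t i = 0; i < CACHE_SLOTS; i++) {
        if (conn->cached_keys[i] != NULL &&
            strcmp(conn->cached_keys[i], key) == 0) {
            conn->cached_keys[i] = NULL;
        }
    }
}

/* Apply a changeset only if it is the next one in sequence. */
bool connection_process_changeset(connection_t *conn, const changeset_t *cs) {
    assert(conn != NULL && cs != NULL);
    if (cs->snapshot != conn->snapshot + 1) return false; /* stale or out of order */
    for (size_t i = 0; i < cs->changed_count; i++) {
        cache_evict(conn, cs->changed_keys[i]);
    }
    conn->snapshot = cs->snapshot;
    return true;
}
```

Entries for untouched keys stay in the cache, and applying the same changeset a second time is rejected because the connection has already moved past that snapshot.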

This is the idea in concept, put rather simply. But in practice there are a few race conditions.

Imagine there are two connections. Connection A is in the middle of its commit operation. Connection B is just starting its transaction. Will connection B see the changes from connection A? Or will its transaction predate the commit? It's a difficult timing issue. But YapDatabase can handle it just fine.

Every write transaction (that actually makes changes) increments the snapshot number, and writes it to the database. (It writes it to a reserved table named "yap". The main database table is named "database".) Furthermore, just before the write operation begins its sqlite commit operation, it preregisters the changeset with YapDatabase.

Other connections are aware of the situation. So if they start their transaction while a commit may be going on, then they start by reading the snapshot number from disk. If it doesn't match their state, then they simply fetch the preregistered changeset from YapDatabase and process it before continuing.
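The ordering matters here: because the changeset is registered before the commit, any connection that sees the new snapshot number on disk is guaranteed to find the matching changeset already waiting. The sketch below models that sequence in plain C. The types and function names are hypothetical, and the in-memory `disk_snapshot` field stands in for the value stored in the "yap" table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_PENDING 8

/* Changeset payload omitted; only the snapshot number matters here. */
typedef struct {
    uint64_t snapshot;
} changeset_t;

/* Hypothetical shared state owned by the top-level database object. */
typedef struct {
    uint64_t disk_snapshot;             /* what a reader sees on disk */
    changeset_t pending[MAX_PENDING];   /* preregistered, in commit order */
    size_t pending_count;
} database_t;

typedef struct {
    database_t *db;
    uint64_t snapshot;
    int changesets_processed;
} connection_t;

/* Writer, step 1: register the changeset before the commit begins. */
int database_preregister(database_t *db, uint64_t new_snapshot) {
    assert(db != NULL);
    if (db->pending_count == MAX_PENDING) return -1;
    db->pending[db->pending_count++].snapshot = new_snapshot;
    return 0;
}

/* Writer, step 2: the commit makes the new snapshot visible on disk. */
void database_commit(database_t *db, uint64_t new_snapshot) {
    assert(db != NULL);
    db->disk_snapshot = new_snapshot;
}

/* Reader: catch up to whatever snapshot the disk shows, and no further. */
void connection_begin_transaction(connection_t *conn) {
    assert(conn != NULL && conn->db != NULL);
    uint64_t on_disk = conn->db->disk_snapshot;
    for (size_t i = 0; i < conn->db->pending_count; i++) {
        uint64_t s = conn->db->pending[i].snapshot;
        if (s == conn->snapshot + 1 && s <= on_disk) {
            conn->snapshot = s;             /* changeset applied */
            conn->changesets_processed++;
        }
    }
}
```

A reader that begins between the two writer steps still sees the old snapshot on disk and ignores the pending changeset. A reader that begins after the commit sees the new number and applies the changeset before continuing.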

The "snapshot and changeset" architecture is implemented in YapAbstractDatabase, which is the base class of YapDatabase. A snapshot is simply a uint64_t, and a changeset is just a dictionary. The contents of the changeset itself are largely unknown to YapAbstractDatabase. This allows each concrete database class, such as YapDatabase, to structure its part of the changeset however works best for it. Furthermore, extensions can also take advantage of the changeset. (There are reserved top-level keys so databases and extensions can share the changeset dictionary.)

The creation and processing of changesets is the domain of connections. For more information, see the following methods in YapDatabaseConnection:

- (NSMutableDictionary *)changeset
- (void)processChangeset:(NSDictionary *)changeset