
Sharing a cache between multiple databases


Motivation

An application may have multiple WiredTiger databases managed by a single process, without knowing which of those databases will benefit most from a larger cache; the set of databases needing a large cache may also change over time. A shared cache allows the available memory to be moved between databases as demand changes.

Requirements

  • Configuring a cache pool that can be shared by multiple databases.
  • Managing the allocation of the shared pool inside the WiredTiger storage engine.
  • Supporting dynamically adding and removing database connections.

Design

In the following discussion, connection refers to a WT_CONNECTION object opened via wiredtiger_open; it corresponds to a database in the discussion above.

API changes

Add the following new configuration option to wiredtiger_open: shared_cache=()

The shared cache configuration consists of the following sub-settings:

    shared_cache=(
        name=,      /* Use the shared cache with the given name. */
        size=,      /* Set the shared cache pool size. */
        chunk=,     /* Set the granularity with which a connection's cache
                       size is adjusted. */
        reserve=    /* Set the minimum amount of cache a connection can be
                       allocated. Can be different for each entry in the
                       cache pool. */
    )

The first connection will initialize the cache pool.
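As an illustration, a single process might open two databases against the same pool; the paths and sizes here are arbitrary examples, not recommendations:

    #include <wiredtiger.h>

    int
    main(void)
    {
        WT_CONNECTION *conn_a, *conn_b;
        int ret;

        /* The first connection to name the pool creates it. */
        if ((ret = wiredtiger_open("/path/to/db_a", NULL,
            "create,shared_cache=(name=pool,size=200MB,chunk=10MB,reserve=20MB)",
            &conn_a)) != 0)
            return (ret);

        /* Later connections naming the same pool join it. */
        if ((ret = wiredtiger_open("/path/to/db_b", NULL,
            "create,shared_cache=(name=pool,reserve=50MB)",
            &conn_b)) != 0)
            return (ret);

        /* ... run workloads; allocations are rebalanced over time ... */

        (void)conn_b->close(conn_b, NULL);
        (void)conn_a->close(conn_a, NULL);
        return (0);
    }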

Implementation

See the source code for the implementation:

  • https://github.com/wiredtiger/wiredtiger/blob/master/src/include/cache.h
  • https://github.com/wiredtiger/wiredtiger/blob/master/src/conn/conn_cache_pool.c

The first connection to specify a shared cache setting will set up the WT_CACHE_POOL structure. Each time a new connection joins the shared cache, an entry will be added to the shared cache queue.
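A minimal sketch of the shapes involved (the field names are illustrative; the real definitions in src/include/cache.h differ in detail):

    #include <stdint.h>
    #include <sys/queue.h>

    typedef struct __wt_cache_pool_entry {
        uint64_t cache_size;                  /* Current allocation */
        uint64_t pressure;                    /* E.g., recent eviction activity */
        int active;                           /* Cleared when the connection leaves */
        TAILQ_ENTRY(__wt_cache_pool_entry) q; /* Shared cache queue linkage */
    } WT_CACHE_POOL_ENTRY;

    typedef struct __wt_cache_pool {
        const char *name;                     /* Pool name from the configuration */
        uint64_t size, chunk;                 /* Total size and adjustment unit */
        uint64_t currently_used;              /* Sum of the entries' allocations */
        TAILQ_HEAD(, __wt_cache_pool_entry) cache_pool_qh;
    } WT_CACHE_POOL;

    /* A joining connection appends an entry to the shared cache queue. */
    static void
    cache_pool_join(WT_CACHE_POOL *cp, WT_CACHE_POOL_ENTRY *entry)
    {
        entry->active = 1;
        TAILQ_INSERT_TAIL(&cp->cache_pool_qh, entry, q);
    }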

There is a utility function that balances the allocation of the shared resources across connections. It is expected that the balancing will be done regularly (though not necessarily often).

Adding a new connection will trigger a call to the balancing function.

A single balancing pass will:

  • Check for new connections and try to ensure an initial cache size can be found for each of them.
  • Check for and remove inactive connections, returning their resources to the pool.
  • Review the statistics in each of the active entries and reallocate resources amongst the connections if required. The balancing utility updates the WT_CONNECTION_IMPL::cache_size, WT_CACHE::evict_trigger and WT_CACHE::evict_target settings in a connection; each connection's eviction thread is responsible for honouring the allocated resources. (A simplified sketch of such a pass follows this list.)
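Continuing the sketch above, one hypothetical balancing pass might look like the following; the real logic in src/conn/conn_cache_pool.c is statistics-driven and differs in detail:

    static void
    cache_pool_balance(WT_CACHE_POOL *cp)
    {
        WT_CACHE_POOL_ENTRY *busiest, *entry, *lightest;

        busiest = lightest = NULL;
        TAILQ_FOREACH(entry, &cp->cache_pool_qh, q) {
            if (!entry->active) {             /* Return resources to the pool */
                cp->currently_used -= entry->cache_size;
                entry->cache_size = 0;
                continue;
            }
            if (entry->cache_size == 0 &&     /* Initial grant for a newcomer */
                cp->currently_used + cp->chunk <= cp->size) {
                entry->cache_size = cp->chunk;
                cp->currently_used += cp->chunk;
            }
            if (busiest == NULL || entry->pressure > busiest->pressure)
                busiest = entry;
            if (lightest == NULL || entry->pressure < lightest->pressure)
                lightest = entry;
        }

        /*
         * Shift one chunk from the least to the most pressured connection.
         * The real code would then push the new allocation into the
         * connection (cache_size, eviction trigger and target) and leave
         * the connection's eviction thread to honour it.
         */
        if (busiest != NULL && busiest != lightest &&
            lightest->cache_size > cp->chunk) {
            lightest->cache_size -= cp->chunk;
            busiest->cache_size += cp->chunk;
        }
    }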

Open issues

When to run the balancing

There are three options for running the balancing:

  1. Create a new cache pool manager thread that belongs to the WT_CACHE_POOL (i.e., outside the scope of any one connection). The sole job of the thread will be to manage the cache pool, and it will be terminated when the last connection leaves the cache pool. The primary concern with this approach is that there is no obvious session handle to use for error and verbose message logging.

  2. Have the existing eviction threads balance the cache pool. The first connection in the cache pool would be the only one to do any actual work; a call to the cache pool balance function would be made every N iterations of the eviction server. The primary concerns with this approach are that it adds more work for the eviction thread to do, and that balancing will run more often if a connection has a hot cache (because the eviction thread will be woken often).

  3. Have each connection in the cache pool create a thread to manage the cache pool. Only a single thread will do the actual management, but another will be ready to take over if the connection owning the current manager closes. The issue with this approach is that it creates another thread for each connection (each thread will own a session handle); it is not clear whether this is a real problem.

Solution 1) is implemented here: https://github.com/wiredtiger/wiredtiger/pull/376

Solution 2) is implemented here (as an extension to solution 1): https://github.com/wiredtiger/wiredtiger/pull/377

Solution 3) has not been implemented, but is a simple change based on solution 2).
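For concreteness, a hypothetical shape for solution 2) is sketched below; all names here are illustrative, not actual WiredTiger internals:

    #include <stdint.h>

    struct cache_pool;                       /* Opaque for this sketch */
    void cache_pool_balance(struct cache_pool *);

    struct connection {
        int is_pool_first;                   /* First connection in the pool */
        struct cache_pool *cache_pool;
    };

    #define CACHE_POOL_BALANCE_PERIOD 10     /* "Every N iterations" */

    static void
    evict_server_iteration(struct connection *conn)
    {
        static uint64_t iteration = 0;

        /* ... the eviction server's normal work for this connection ... */

        /*
         * Only the first connection balances, and only every Nth wakeup,
         * so a hot cache (frequent wakeups) balances more often, which is
         * the concern noted in option 2 above.
         */
        if (conn->is_pool_first &&
            ++iteration % CACHE_POOL_BALANCE_PERIOD == 0)
            cache_pool_balance(conn->cache_pool);
    }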

Locking

We already have a process wide lock that is used to ensure only a single connection to a database is opened at any one time.

There are two choices:

  1. Use the process lock to control all access to the cache pool, which can potentially block opening a connection.

  2. Use the process lock to control creating and deleting a cache pool, use a new lock to manage access/updates to the cache pool. Having two locks complicates the locking code, but means cache pool usage does not interfere with non-cache pool operations.

The current implementation uses the second option, but could be simplified.
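A hypothetical pthread-based sketch of the two-lock scheme; WiredTiger uses its own locking primitives, so this is illustrative only:

    #include <pthread.h>
    #include <stddef.h>

    struct cache_pool;
    struct cache_pool *cache_pool_create(const char *name);

    static struct cache_pool *process_cache_pool = NULL;
    static pthread_mutex_t process_lock = PTHREAD_MUTEX_INITIALIZER; /* Existing */
    static pthread_mutex_t pool_lock = PTHREAD_MUTEX_INITIALIZER;    /* New */

    /* Creating (or finding) the pool takes the process-wide lock... */
    static struct cache_pool *
    cache_pool_get(const char *name)
    {
        struct cache_pool *cp;

        (void)pthread_mutex_lock(&process_lock);
        if ((cp = process_cache_pool) == NULL)
            cp = process_cache_pool = cache_pool_create(name);
        (void)pthread_mutex_unlock(&process_lock);
        return (cp);
    }

    /*
     * ...while routine access, such as balancing, takes only the pool
     * lock, so it cannot block a concurrent connection open.
     */
    static void
    cache_pool_balance_locked(struct cache_pool *cp)
    {
        (void)pthread_mutex_lock(&pool_lock);
        /* ... reallocate chunks among the pool entries in cp ... */
        (void)pthread_mutex_unlock(&pool_lock);
    }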

Keith: The work done during balancing is minimal enough that I don't think it matters that a connection open would be blocked for such a short period of time. Regardless, I don't feel strongly.

Cache overdraft

At the moment we allocate cache to a connection if we can find another connection that we can take it from. We don't wait for a connection to reduce the amount of cache actually in use before allowing the other connection to begin using its allocated chunk.

It is likely that there will be a period when more than the total cache pool is in use.

This could be a large overhead if many new connections are started in parallel.

We could add code that waits for a connection to give up its share of the pool before proceeding (sketched below), but that would block new connections, meaning it could take an indeterminate amount of time for a connection to open.
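A rough sketch of such a wait, with illustrative names and simple polling:

    #include <stddef.h>
    #include <stdint.h>
    #include <unistd.h>

    struct pool_entry {
        uint64_t cache_size;   /* Allocated to this connection */
        uint64_t bytes_inuse;  /* Actually in use by this connection's cache */
    };

    /*
     * Block until every donor has shrunk its in-use cache below its new
     * allocation; this is the unbounded delay described above.
     */
    static void
    wait_for_donors(struct pool_entry *entries, size_t n)
    {
        size_t i;

    again:
        for (i = 0; i < n; ++i)
            if (entries[i].bytes_inuse > entries[i].cache_size) {
                (void)usleep(10000);  /* Give eviction time to shed pages */
                goto again;
            }
    }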

Keith: I think the chance of many new connections being started, in parallel, after the cache is already established and in use, is minimal -- I don't think we should worry about this until there is a workload we're trying to handle that has a problem with the simpler solution.

Multiple cache structures

Keith: I suggest we split the existing cache structure (btree/bt_cache.c, include/cache.h) into two parts, eviction information and cache information, and move the cache information into the cache pool structure. That way there's a single piece of cache information that may or may not be shared and/or participate in balancing, and a separate piece of eviction information that's owned and operated by the eviction thread. This adds some complexity (don't start the balance thread until there's a named pool), but it simplifies at least as many things as it complicates. Specifically, the cache balance thread "owns" the cache information, and the cache information isn't split between two structures. I would suggest we rename the WT_CACHE_POOL_ENTRY structure to WT_CACHE, and the eviction information from WT_CACHE would become some new structure (WT_EVICT?). This isn't a big change; it's really just a bunch of data shuffling.
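A rough sketch of the proposed split; the fields shown are drawn from the discussion above and are illustrative, not the actual headers:

    #include <stdint.h>

    /* Cache information, owned by the cache pool / balance thread. */
    typedef struct __wt_cache {              /* Renamed from WT_CACHE_POOL_ENTRY */
        uint64_t cache_size;                 /* Current allocation */
        /* ... other accounting that may be shared via the pool ... */
    } WT_CACHE;

    /* Eviction information, owned and operated by the eviction thread. */
    typedef struct __wt_evict {              /* Split out of today's WT_CACHE */
        uint64_t evict_trigger;              /* Fullness that wakes eviction */
        uint64_t evict_target;               /* Fullness eviction works toward */
        uint64_t read_gen;                   /* Read generation (see below) */
        /* ... queues and other eviction-private state ... */
    } WT_EVICT;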

The only question, I think, is where the read-generation value belongs. Right now, read-generations are local to the database, so there's no relationship between read generation values in a shared cache -- I think that's correct and I'd leave it the way it is, but I wanted to raise it as a question.