Live Compaction

In order to use live compaction in Sirius, you must choose a SiriusLog implementation that supports it. Currently, SegmentedUberStore (enabled by setting the config property SiriusConfiguration.LOG_VERSION_ID to "SEGMENTED_1.0") is the only SiriusLog that supports live compaction.

A SegmentedUberStore is very similar to a traditional UberStore, but instead of a single large data/index pair, the backend is split into segments. Only one segment is ever open for writing (called the active segment); when it reaches the maximum number of events, specified by SiriusConfiguration.LOG_EVENTS_PER_SEGMENT, it is closed and a new segment is opened. A newly-opened segment has a numerical "name", which is the previous segment's name + 1, so names increase monotonically.
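
As a rough illustration of the rollover rule, here is a minimal sketch in Java. SegmentRollover and its fields are hypothetical, not the actual SegmentedUberStore internals:

```java
// Rollover rule sketch; names and types here are illustrative only.
final class SegmentRollover {
    private long activeName = 0;             // numerical "name" of the active segment
    private long activeEventCount = 0;
    private final long maxEventsPerSegment;  // SiriusConfiguration.LOG_EVENTS_PER_SEGMENT

    SegmentRollover(long maxEventsPerSegment) {
        this.maxEventsPerSegment = maxEventsPerSegment;
    }

    // Called once per event written to the log: when the active segment is
    // full, it is closed (becoming inactive/read-only) and a new segment
    // named previous + 1 is opened.
    void onWrite() {
        if (++activeEventCount >= maxEventsPerSegment) {
            activeName += 1;
            activeEventCount = 0;
        }
    }
}
```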

The key idea in live compaction is this: since events in the log are idempotent, an event in the WAL overrides any number of previous events with the same key. Thus, the events in a given segment necessarily override any events with matching keys in previous segments. To compact, we "apply" the keys of a segment to previous segments – removing the matching events from those previous segments.

The list of segments is divided into two parts:

- inactive segments, which are the read-only files from closed segments
- the active segment, where new writes are applied; there is only one active segment at a time

Compaction consists of the following (a code sketch follows the list):

- for each inactive segment S, where S.keys-applied == false:
  - gather the keys(S)
  - for each inactive segment S', where S'.name < S.name:
    - build S'.compacting, which consists of all events in S' whose keys
      were not found in keys(S)
  - for each inactive segment S', where there exists S'.compacting
    - replace S' with S'.compacting
  - mark S.keys-applied = true
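
Below is a minimal sketch of this pass in Java, over an in-memory model. Segment and Compactor are illustrative stand-ins; the real SegmentedUberStore operates on data/index files on disk:

```java
import java.util.*;

// In-memory model of the apply-keys pass described above.
final class Segment {
    final long name;                 // numerical segment name
    boolean keysApplied = false;     // S.keys-applied
    List<Map.Entry<String, String>> events = new ArrayList<>(); // (key, body) pairs

    Segment(long name) { this.name = name; }
}

final class Compactor {
    // segments: all inactive segments, ordered by ascending name
    static void compact(List<Segment> segments) {
        for (Segment s : segments) {
            if (s.keysApplied) continue;

            // gather keys(S)
            Set<String> keys = new HashSet<>();
            for (Map.Entry<String, String> e : s.events) keys.add(e.getKey());

            // build S'.compacting for every earlier segment, then replace S' with it
            for (Segment earlier : segments) {
                if (earlier.name >= s.name) break;
                List<Map.Entry<String, String>> compacting = new ArrayList<>();
                for (Map.Entry<String, String> e : earlier.events) {
                    if (!keys.contains(e.getKey())) compacting.add(e);
                }
                earlier.events = compacting;
            }

            s.keysApplied = true;
        }
    }
}
```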

There is no internal compaction of a segment. This may be a future enhancement.

#### Segment Merging

Merging is the process whereby adjacent compacted segments are combined. When two adjacent segments have an aggregate size less than SiriusConfiguration.LOG_EVENTS_PER_SEGMENT, and both have had their keys applied in compaction, they can be merged into one. After a round of compaction has finished, a round of merges is triggered. The time spent merging is included in the "compaction time" metric; it should be small in the general case.
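
In the same spirit as the compaction sketch above (reusing the hypothetical Segment model), the merge condition boils down to:

```java
// Merge rule sketch: two adjacent inactive segments may be merged once both
// have had their keys applied and their combined event count still fits in
// one segment.
final class Merger {
    static boolean canMerge(Segment left, Segment right, long maxEventsPerSegment) {
        return left.keysApplied
            && right.keysApplied
            && left.events.size() + right.events.size() < maxEventsPerSegment;
    }
}
```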

Configuring Live Compaction

Live compaction can be enabled by setting the SiriusConfiguration.COMPACTION_SCHEDULE_MINS parameter to a non-zero value. This parameter specifies the number of minutes between compaction calls. It is set to 0 by default, which effectively turns compaction off. You should also choose a segment size (see below) and configure it with the SiriusConfiguration.LOG_EVENTS_PER_SEGMENT parameter.
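
For example, in Java – a sketch assuming Sirius's usual setProp(key, value) configuration pattern; the parentheses on the keys reflect accessing the Scala constants from Java, and the values are illustrative only:

```java
import com.comcast.xfinity.sirius.api.SiriusConfiguration;

// Example settings; tune LOG_EVENTS_PER_SEGMENT and the schedule for your
// workload as described below.
SiriusConfiguration config = new SiriusConfiguration();
config.setProp(SiriusConfiguration.LOG_VERSION_ID(), "SEGMENTED_1.0");    // use SegmentedUberStore
config.setProp(SiriusConfiguration.LOG_EVENTS_PER_SEGMENT(), 1000000);    // max events per segment
config.setProp(SiriusConfiguration.COMPACTION_SCHEDULE_MINS(), 15);       // run compaction every 15 min
```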

Choosing a Segment Size

There are a number of factors that go into this decision, but first and foremost, you want your choice to satisfy the following statement:

The system should be able to apply the keys of a newly-closed segment in compaction FASTER than a newly-opened segment fills up with new events. In other words, you should choose a segment size such that compaction can keep up with updates.

This depends on a few factors: the number of unique keys in your data, the throughput of updates, and the speed at which events can be processed in compaction. Given the following definitions:

S = max events per segment
N = number of unique keys in WAL
r(w) = rate of writes into system (events/sec)
r(c) = rate of compaction in system: Roughly, speed at which events can be read and then re-written to disk (events/sec).

Applying a newly-closed segment's keys means re-reading and re-writing roughly all N events in the older segments, which takes about N / r(c) seconds, while a new segment fills up in about S / r(w) seconds. Requiring the first to be smaller than the second, you must ensure that:

S > (N * r(w)) / r(c)

For example, consider an application with the following characteristics:

N = 50 million unique keys
r(w) = 250 events/sec # experimentally, this is near the top-end we've seen of Sirius throughput for a 3-node system
r(c) = 35,000 events/sec # estimated from 2KB/event, 70MB/sec spinning disk throughput; expect much higher from SSDs or server-grade hardware

S > (50M * 250) / 35K =~ 360K

Max Segment Size needs to be at least 360,000 or so.
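
A quick check of that arithmetic (the constants mirror the example numbers above; nothing here is Sirius API):

```java
// Check of the segment-size bound S > (N * r(w)) / r(c) with the example numbers.
public class SegmentSizeBound {
    public static void main(String[] args) {
        long n = 50_000_000L;    // N: unique keys in the WAL
        double rw = 250.0;       // r(w): write rate, events/sec
        double rc = 35_000.0;    // r(c): compaction rate, events/sec
        double minS = n * rw / rc;
        System.out.printf("S must exceed ~%.0f events%n", minS); // ≈ 357,143, i.e. roughly 360K
    }
}
```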

Other Factors

Other factors in determining segment size:

- Heap usage during compaction increases with segment size. The exact amount will depend on your data set, but we've seen around 200MB per million events in a segment, so a segment size of 2 million events will cost around 400MB of extra heap during compaction.
- The larger the segment size, the higher the chance of duplicate keys within a segment. Since internal compaction of segments is not implemented, these duplicates will not be compacted away until a later segment applies the same key in compaction. Depending on your update patterns, this may be worth noting.
- The larger the segment size, the quicker the initial compaction of an existing log will be. This is a one-time cost, so it's probably not a good reason to choose any particular value, but it's worth knowing. The initial compaction takes roughly O(N * N/S) time, which for a fixed segment size is O(N^2). Expect it to take a while.

Other factors in determining compaction schedule:

- If a compaction is currently running and another one fires off, the new one will have no effect. It is safe.
- If a compaction fires off and no new segment has been closed since the last compaction run, it will have no effect. It is safe.
- Compaction will use memory according to the segment size and is not very processor-intensive. It should be safe to run at any point.
- Since compaction is generally very safe, a reasonable compaction schedule is probably on the order of 15 minutes rather than 6 hours.