Skip to content

Garbage collector invariants

Stephen Dolan edited this page Jul 30, 2015 · 4 revisions

Garbage collectors work by maintaining quite subtle invariants about how objects are traced and marked. These can be quite difficult to understand by reading the implementation, so this page explains the multicore OCaml collector's invariants and where they come from.

GC Basics

The roots are the values that can be directly accessed by the program: global variables and values on the currently-executing stack. Each domain sees a slightly different set of roots: all domains see the global roots, yet each domain only sees its own stack roots.

A value is live if it is one of the roots, or pointed to by some other live value. Only live values can ever be used again.

The job of the GC is to make the memory occupied by dead (non-live) values free for re-use. It's OK if the GC fails to immediately free memory for a dead value (as long as it is eventually freed), but it must never free memory for a live value.

The GC divides the heap into the major (or shared) heap, and the minor heaps. There is one minor heap per domain. First, here's how the minor heap works (it's simpler than the major).

Generational GC

Most objects are not used for very long. So, we allocate objects at first in a region called the minor heap, using a stupid-but-fast allocator (pointer subtraction). When the minor heap fills, we copy every live object in the minor heap to the major heap.

Since most values in the minor heap are not live, this saves a lot of work by making the effective allocation rate in the major heap much lower. The minor heap acts as a filter, ensuring we only use the machinery of the major heap for the small proportion of values which survive for at least a short while.

Determining exactly which values in the minor heap are live would take far too long. Instead, we over-approximate: we consider as live any root, anything pointed to by a live minor value, and anything pointed to by any major value (live or not).

To determine which minor values are pointed to by major values, we use the remembered set, which is a growable array of pointers (caml_remembered_set). Pointers can only be made from the major heap to the minor heap by mutating some object already in the major heap to point to something in the minor heap. So, we detect this during caml_modify_field and record the pointer in caml_remembered_set (see shared_heap_write_barrier in memory.c)

The collector used to clear out the minor heap is almost identical to that used in stock OCaml.

GC cycles

Collecting the major heap is a more complicated matter. The major heap is too big to collect all at once without introducing a long pause, and the major heap is operated upon by several domains concurrently (unlike the minor heaps, which are private to a particular domain).

The major GC operates in cycles. During a cycle, the GC figures out which memory can be safely reused, and allocations are performed using memory known to be reusable from the previous cycle. Once the cycle completes, a new batch of memory is available for allocations.

All values on the major heap are in one of three states: MARKED, UNMARKED or GARBAGE, labelled by a two-bit field in the value's header word (there is a fourth state NOT_MARKABLE for things on the heap that the GC should ignore entirely, but it's not relevant here). A GC cycle completes when the following three conditions hold:

  1. All roots of all domains are MARKED.
  2. No MARKED value points to an UNMARKED value.
  3. There are no GARBAGE values.

These conditions imply that all UNMARKED objects are dead. When this happens, all domains synchronise and the heap is cycled (see caml_cycle_heap_stw in shared_heap.c), which means:

  • All MARKED objects become UNMARKED
  • All UNMARKED objects become GARBAGE

Rather than iterating over the heap, this is done by relabelling: the bit-pattern previously used for MARKED is now used for UNMARKED, that used for UNMARKED is now used for GARBAGE, and that used for GARBAGE is now used for MARKED. (This technique is from the VCGC collector).

Enforcing condition #3 (that there are no GARBAGE objects) is the job of the sweeper, which finds GARBAGE objects and makes them available for allocation by adding them to freelists. The sweeper is described elsewhere. Here, we're concerned with enforcing conditions #1 and #2.

A very simple GC could enforce conditions #1 and #2 directly, by stopping all domains and marking objects until the conditions become true. Such a collector causes truly obscene pauses, so we complicate matters in the name of performance by relaxing conditions #1 and #2.

Relaxing condition #1 is easy: we can do a final pass of the roots just before completing a GC cycle to restore it. Condition #2 is more complicated.

Incremental GC

Marking all values at once takes far too long, so we break up the work into smaller units. Each domain maintains a mark stack of values for which marking is in-progress. Values on the mark stack have been marked, but may still point to unmarked values. So, we weaken condition #2 above to the following invariant:

2b. All MARKED values which point to UNMARKED values are on the mark stack of at least one domain.

To maintain this invariant, when a value on the major heap is mutated, the new pointee is marked and placed on the mark stack of the domain that did the mutation. This is called darkening the pointee, and is done by caml_darken in major_gc.c. Maintaining invariant #2b in this way is known as the Dijkstra barrier.

This specifies "at least one" instead of "exactly one" domain, so it's possible for an object to be marked twice. This risks a small amount of duplicate work, but won't cause problems since marking is an idempotent operation (it amounts to changing some bits in the header from UNMARKED to MARKED). By allowing this duplicate work, we avoid needing mutual exclusion between domains marking the same object, and so can use unsynchronised operations to do the actual marking. See caml_mark_object in shared_heap.c.

When it is time to complete a GC cycle, each domain first ensures that all of its roots are MARKED (ensuring condition #1). Then, each domain marks objects on its mark stack until its mark stack is empty. Once all domains have empty mark stacks, the original condition #2 holds and the GC cycle completes.

Generally, only a small amount of marking is done at the end of a GC cycle, since most objects will have been marked during the cycle. The above process is more to verify that marking has completed, rather than to do a significant amount of marking.

Right now, domains do not load-balance marking. They should.

Multiple stacks

As well as heavyweight domains corresponding to system threads, multicore OCaml supports lightweight fibers. Fibers have their own stacks, and are subject to garbage collection.

Unlike most other data in OCaml, the stack is continually modified. Every function call pushes and pops values (local variables) to the stack. The write barrier described above, which checks for MARKED to UNMARKED pointers and major to minor pointers is far too expensive to run every time a local variable is initialised.

So, we temporarily suspend invariant #2b for stacks. Stacks may point to UNMARKED values, even though the stack itself is MARKED.

For the currently-running stack, this poses no problem. The values pointed to by the currently-running stack are roots, and so are marked at the end of a GC cycle, restoring invariant #2b.

However, stacks that are not currently running may violate #2b. We call any stack which may violate this invariant dirty. Each domain keeps track of its set of dirty stacks. So, invariant #2b is relaxed to the following:

2c. All MARKED values which point to UNMARKED values are either:

  • on the mark stack of at least one domain, or
  • dirty stacks

Whenever a domain context-switches from one stack to another, if the old stack is on the major heap then it is dirtied. This maintains invariant #2c without needing to instrument accesses to local variables.

Before a GC cycle completes, each domain cleans its dirty stacks, by darkening every value pointed to by one of its dirty stacks. When every domain has no dirty stacks, invariant #2b holds and GC can proceed as before.

Strictly speaking, it is only necessary for a domain to clean its dirty stacks right before the GC cycle completes. Domains actually do this at every minor GC, for several reasons: to prevent the set of dirty stacks growing large, because recently-dirtied stacks are likely in cache and cheap to scan, and to minimise time spent at the completion of a GC cycle.