-
Notifications
You must be signed in to change notification settings - Fork 68
Garbage collector invariants
Garbage collectors work by maintaining quite subtle invariants about how objects are traced and marked. These can be quite difficult to understand by reading the implementation, so this page explains the multicore OCaml collector's invariants and where they come from.
The roots are the values that can be directly accessed by the program: global variables and values on the currently-executing stack. Each domain sees a slightly different set of roots: all domains see the global roots, yet each domain only sees its own stack roots.
A value is live if it is one of the roots, or pointed to by some other live value. Only live values can ever be used again.
The job of the GC is to make the memory occupied by dead (non-live) values free for re-use. It's OK if the GC fails to immediately free memory for a dead value (as long as it is eventually freed), but it must never free memory for a live value.
The GC divides the heap into the major (or shared) heap, and the minor heaps. There is one minor heap per domain. First, here's how the minor heap works (it's simpler than the major).
Most objects are not used for very long. So, we allocate objects at first in a region called the minor heap, using a stupid-but-fast allocator (pointer subtraction). When the minor heap fills, we copy every live object in the minor heap to the major heap.
Since most values in the minor heap are not live, this saves a lot of work by making the effective allocation rate in the major heap much lower. The minor heap acts as a filter, ensuring we only use the machinery of the major heap for the small proportion of values which survive for at least a short while.
Determining exactly which values in the minor heap are live would take far too long. Instead, we over-approximate: we consider as live any root, anything pointed to by a live minor value, and anything pointed to by any major value (live or not).
To determine which minor values are pointed to by major values, we use the remembered set, which is a growable array of pointers (caml_remembered_set
). Pointers can only be made from the major heap to the minor heap by mutating some object already in the major heap to point to something in the minor heap. So, we detect this during caml_modify_field
and record the pointer in caml_remembered_set
(see shared_heap_write_barrier
in memory.c
)
The collector used to clear out the minor heap is almost identical to that used in stock OCaml.
Collecting the major heap is a more complicated matter. The major heap is too big to collect all at once without introducing a long pause, and the major heap is operated upon by several domains concurrently (unlike the minor heaps, which are private to a particular domain).
The major GC operates in cycles. During a cycle, the GC figures out which memory can be safely reused, and allocations are performed using memory known to be reusable from the previous cycle. Once the cycle completes, a new batch of memory is available for allocations.
All values on the major heap are in one of three states: MARKED
, UNMARKED
or GARBAGE
, labelled by a two-bit field in the value's header word (there is a fourth state NOT_MARKABLE
for things on the heap that the GC should ignore entirely, but it's not relevant here). A GC cycle completes when the following three conditions hold:
- All roots of all domains are
MARKED
. - No
MARKED
value points to anUNMARKED
value. - There are no
GARBAGE
values.
These conditions imply that all UNMARKED
objects are dead. When this happens, all domains synchronise and the heap is cycled (see caml_cycle_heap_stw
in shared_heap.c
), which means:
- All
MARKED
objects becomeUNMARKED
- All
UNMARKED
objects becomeGARBAGE
Rather than iterating over the heap, this is done by relabelling: the bit-pattern previously used for MARKED
is now used for UNMARKED
, that used for UNMARKED
is now used for GARBAGE
, and that used for GARBAGE
is now used for MARKED
. (This technique is from the VCGC collector).
Enforcing condition #3 (that there are no GARBAGE
objects) is the job of the sweeper, which finds GARBAGE
objects and makes them available for allocation by adding them to freelists. The sweeper is described elsewhere. Here, we're concerned with enforcing conditions #1 and #2.
A very simple GC could enforce conditions #1 and #2 directly, by stopping all domains and marking objects until the conditions become true. Such a collector causes truly obscene pauses, so we complicate matters in the name of performance by relaxing conditions #1 and #2.
Relaxing condition #1 is easy: we can do a final pass of the roots just before completing a GC cycle to restore it. Condition #2 is more complicated.
Marking all values at once takes far too long, so we break up the work into smaller units. Each domain maintains a mark stack of values for which marking is in-progress. Values on the mark stack have been marked, but may still point to unmarked values. So, we weaken condition #2 above to the following invariant:
2b. All MARKED
values which point to UNMARKED
values are on the mark stack of at least one domain.
To maintain this invariant, when a value on the major heap is mutated, the new pointee is marked and placed on the mark stack of the domain that did the mutation. This is called darkening the pointee, and is done by caml_darken
in major_gc.c
. Maintaining invariant #2b in this way is known as the Dijkstra barrier.
This specifies "at least one" instead of "exactly one" domain, so it's possible for an object to be marked twice. This risks a small amount of duplicate work, but won't cause problems since marking is an idempotent operation (it amounts to changing some bits in the header from UNMARKED
to MARKED
). By allowing this duplicate work, we avoid needing mutual exclusion between domains marking the same object, and so can use unsynchronised operations to do the actual marking. See caml_mark_object
in shared_heap.c
.
When it is time to complete a GC cycle, each domain first ensures that all of its roots are MARKED
(ensuring condition #1). Then, each domain marks objects on its mark stack until its mark stack is empty. Once all domains have empty mark stacks, the original condition #2 holds and the GC cycle completes.
Generally, only a small amount of marking is done at the end of a GC cycle, since most objects will have been marked during the cycle. The above process is more to verify that marking has completed, rather than to do a significant amount of marking.
Right now, domains do not load-balance marking. They should.
As well as heavyweight domains corresponding to system threads, multicore OCaml supports lightweight fibers. Fibers have their own stacks, and are subject to garbage collection.
Unlike most other data in OCaml, the stack is continually modified. Every function call pushes and pops values (local variables) to the stack. The write barrier described above, which checks for MARKED
to UNMARKED
pointers and major to minor pointers is far too expensive to run every time a local variable is initialised.
So, we temporarily suspend invariant #2b for stacks. Stacks may point to UNMARKED
values, even though the stack itself is MARKED
.
For the currently-running stack, this poses no problem. The values pointed to by the currently-running stack are roots, and so are marked at the end of a GC cycle, restoring invariant #2b.
However, stacks that are not currently running may violate #2b. We call any stack which may violate this invariant dirty. Each domain keeps track of its set of dirty stacks. So, invariant #2b is relaxed to the following:
2c. All MARKED
values which point to UNMARKED
values are either:
- on the mark stack of at least one domain, or
- dirty stacks
Whenever a domain context-switches from one stack to another, if the old stack is on the major heap then it is dirtied. This maintains invariant #2c without needing to instrument accesses to local variables.
Before a GC cycle completes, each domain cleans its dirty stacks, by darkening every value pointed to by one of its dirty stacks. When every domain has no dirty stacks, invariant #2b holds and GC can proceed as before.
Strictly speaking, it is only necessary for a domain to clean its dirty stacks right before the GC cycle completes. Domains actually do this at every minor GC, for several reasons: to prevent the set of dirty stacks growing large, because recently-dirtied stacks are likely in cache and cheap to scan, and to minimise time spent at the completion of a GC cycle.