Skip to content

Major heap allocation and sweeping

Stephen Dolan edited this page Jul 30, 2015 · 1 revision

This page describes how the allocator and sweeper for the major heap works. Allocation and sweeping are very much intertwined in the design of multicore OCaml, so are best described together.

Design and goals

Since all values allocated in OCaml are immediately initialised, a performance goal for the allocator is to make the cost of allocation roughly proportional to the cost of initialisation. In other words, very small objects (lists, options, etc.) must be allocated quickly, while it is acceptable for larger allocations to be slower.

Allocation is very, very frequent in OCaml, and would easily become a bottleneck if it involved heavy contention. So, it is important that allocations by one domain synchronise with other domains as little as possible.

Stock OCaml does not do incremental compaction, and neither does the multicore GC. Stock OCaml supports periodic stop-the-world compaction, which could be (but currently isn't) implemented for multicore. So, it is important that the allocator avoid fragmentation as much as possible.

Heap layout

The allocator is based roughly on the Streamflow allocator. Allocations are divided into large and small allocations. Any allocation of up to 128 words is considered small (that's 1KiB on 64-bit systems, and 512B on 32-bit ones).

Small allocations are divided into sizeclasses in such a way that any small allocation fits into some sizeclass with at most 10% wasted space. The optimal sizeclasses are determined by gen_sizeclasses.ml.

For small allocations, memory is allocated in fixed-size pools of 4096 words (32/16 KiB, according to wordsize). Each pool is carved up into equal-sized slots, and allocates values of only one sizeclass.

Pools which contain at least one value are owned by a particular domain, and only the owning domain may allocate from a given pool. When a pool becomes empty, it is returned to a central, shared list of free pools (although each domain maintains a small cache of free pools to cut down on synchronisation).

Allocation and sweeping

Within a pool, the free slots are kept in a singly-linked list. To allocate a small object, the allocator first checks if it has a partially-full pool of the correct sizeclass. If so, the next free slot from this pool is returned, and otherwise a free pool is acquired and carved up into slots.

Using the next free slot in a partially-filled pool is very cheap, while acquiring a free pool is more expensive. There are roughly 4096/n slots in a pool of sizeclass n, and so a free pool must be acquired once every 4096/n allocations of a given sizeclass. So, the probability of hitting the slower path of allocation is linear in the size of the allocation, making the expected cost of allocation proportional to allocation size.

A pool is swept by iterating over each of its slots, and adding any slots containing GARBAGE values to the list of free slots. Pools may be swept in any order, but only by their owning domain.

For each sizeclass, the allocator maintains four linked lists of pools:

  • avail_pools: Partially filled, swept pools
  • full_pools: Fully filled, swept pools
  • unswept_avail_pools: Partially filled but not swept pools
  • unswept_full_pools: Fully filled but not swept pools

To allocate an object of a given sizeclass, the allocator must find a non-full pool of this sizeclass. First, it looks at avail_pools. If this is empty, it sweeps pools in unswept_avail_pools and unswept_full_pools, adding them to avail_pools and full_pools as appropriate. If this fails to produce a non-full pool, a new pool is acquired from the central store (or the OS) and carved up into appropriate-sized slots. Only this final case of acquiring a new pool requires any mutexes or other synchronisation.

So, instead of proceeding strictly in address order like many collectors (and stock OCaml), the multicore OCaml GC sweeps pools on-demand. When an allocation of 5 words is required, the allocator sweeps pools of 5-word objects in order to find a slot that can be reused.

As well as being driven by allocations, the sweeper can be invoked explicitly as caml_sweep. This is done to ensure that all GARBAGE values are swept before the completion of the GC cycle.

Large values

Large allocations (of over 128 words) are allocated using system malloc, with a small header. The header is used to place large allocations in a per-domain linked list, to allow sweeping and freeing. This is much more expensive than the pool allocator, but as described above performance is less critical for large allocations.