Skip to content
Stephen Dolan edited this page Sep 7, 2018 · 2 revisions

At all times, every word on the major heap is one one of the states MARKED, UNMARKED, GARBAGE or FREE. During a GC cyle, the marker turns blocks from UNMARKED to MARKED, the sweeper turns blocks from GARBAGE to FREE, and the allocator turns blocks from FREE to MARKED.

Write unmarked[n], marked[n], garbage[n], free[n] for the number of words with the given status at the start of major GC cycle n. Write unmarked'[n], marked'[n], garbage'[n], free'[n] for the number of words with the given status at the end of major GC cycle n.

At the end of a GC cycle, all garbage has been collected (and so become free), so garbage'[n] = 0.

GC cycling means that:

unmarked[n+1] = marked'[n]
garbage[n+1] = unmarked'[n+1]
marked[n+1] = 0
free[n+1] = free'[n]

Write live[n] for the number of words reachable from the roots at the start of cycle n. Write alloc[n] for the number of words allocated on the major heap during cycle n.

Marking operates on the new allocations and the blocks not collected in the previous cycle, giving:

marked'[n] + unmarked'[n] = unmarked[n] + alloc[n]

Since we have a snapshot-at-the-beginning (Yuasa-barrier) collector, the blocks that end up marked are those that were live, plus any new allocations (which are always marked), so we can split the above into:

marked'[n] = live[n] + alloc[n]
unmarked'[n] = unmarked[n] - live[n]

The blocks that are allocated (and unmarked) at the start of the cycle are those that are live, as well as any that became dead during the last cycle but have not been collected yet. The former will be marked during the current cycle, but the latter will not.

The statistics computed by the allocator will tell us, at the end of every GC cycle, how many words are in the heap and how many of those are in use. That is, we get (since garbage'[n] = 0):

words_in_heap'[n] = free'[n] + marked'[n] + unmarked'[n]
words_in_use'[n] = marked'[n] + unmarked'[n]

Combining these, we get:

words_in_use'[n] = marked'[n] + unmarked'[n]
                 = unmarked[n] + alloc[n]
                 = marked'[n-1] + alloc[n]
                 = live[n-1] + alloc[n-1] + alloc[n]

That is, at the end of cycle n the words in use are whatever was live at the start of cycle (n-1) (which is now MARKED, if was still alive at the start of cycle n, or UNMARKED, if it died during cycle (n-1)), plus whatever was allocated during cycle (n-1) (ditto), plus whatever was allocated during cycle n (which is unconditionally MARKED).