1 Introduction
To the best of our knowledge, this paper presents the first completed formal verification of a generational garbage collector. However, the CertiCoq project [1] appears to be in the process of verifying a generational garbage collector as well.
We present a pragmatic approach to dealing with mutable state, such as ML-style references and arrays, in the context of implementation and verification of a generational garbage collector. Mutable state adds a layer of complexity since generational collectors need to treat pointers from old data to new data with special care. The CertiCoq project does not include mutable data, i.e. their setting is simpler than ours in this respect.
We describe how the generational algorithm can be verified separately from the concrete implementation. Furthermore, we show how the proof can be structured so that it follows the intuition of informal explanations of the form: a partial collection cycle in a generational collector is the same as running a full collection on part of the heap if one views pointers to old data as non-pointers.
This paper provides more detail than any previous CakeML publication on how algorithm-level proofs can be used to write and verify concrete implementations of garbage collectors for CakeML, and how these are integrated into the full CakeML compiler and runtime. The updated in-logic bootstrapped compiler comes with new command-line arguments that allow configuration of the generational garbage collector.
2 Approach
The intuition behind copying garbage collection is important for understanding this paper. Section 3.1 explains the basic Cheney copying collector algorithm. Section 3.2 continues with how the basic algorithm can be modified to run as a generational collector, and describes how we deal with mutable state such as ML-style references and arrays.
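As a point of reference for the explanations that follow, the core of a Cheney-style copying collector can be sketched as a small executable model. This is our own simplification in Python (pointers as tagged tuples, addresses as integers), not the paper's HOL formalisation:

```python
def copy_collect(roots, heap):
    """Minimal model of a Cheney copying collector.
    heap: dict from address to cell; a cell is a list of fields,
    where a field is ("ptr", address) or any non-pointer value."""
    new_heap, forward, free = {}, {}, [0]

    def move(v):
        if not (isinstance(v, tuple) and v[0] == "ptr"):
            return v                      # non-pointer: keep as is
        a = v[1]
        if a not in forward:              # first visit: copy cell to to-space
            forward[a] = free[0]
            new_heap[free[0]] = heap[a]   # fields are fixed up by the scan
            free[0] += 1
        return ("ptr", forward[a])        # already moved: follow forwarding

    new_roots = [move(r) for r in roots]
    scan = 0
    while scan < free[0]:                 # unscanned to-space acts as queue
        new_heap[scan] = [move(f) for f in new_heap[scan]]
        scan += 1
    return new_roots, new_heap
```

The loop exploits Cheney's observation that the unscanned suffix of to-space doubles as the work queue, so no auxiliary stack is needed; unreachable cells are simply never copied.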
Section 3.3 describes how the algorithm has been modelled as HOL functions. These algorithm-level HOL functions model memory abstractly, in particular we use HOL lists to represent heap segments. This representation neatly allows us to avoid awkward reasoning about potential overlap between memory segments in the algorithm-level proofs. It also works well with the separation logic we use later to map the abstract heaps to their concrete memory representations, in Sect. 4.2.
Section 3.4 defines the main correctness property, \(\textsf {gc\_related}\), that any garbage collector must satisfy: for every pointer traversal that exists in the original heap from some root (i.e. program variable), there must be a similar pointer traversal possible in the new heap.
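In the simplified tagged-tuple heap model used above, the essence of \(\textsf {gc\_related}\) can be read as: there is an injective map f from old addresses to new addresses under which every pointer traversal from the old roots is mirrored in the new heap. A hypothetical checker for this reading (our own sketch, not the HOL definition) might be:

```python
def gc_related(f, old_roots, old_heap, new_roots, new_heap):
    """Check that f (a map from old to new addresses) is injective and
    mirrors every pointer traversal from the old roots in the new heap."""
    if len(set(f.values())) != len(f):
        return False                         # f must be injective

    def field_ok(old_v, new_v):
        if isinstance(old_v, tuple) and old_v[0] == "ptr":
            return old_v[1] in f and new_v == ("ptr", f[old_v[1]])
        return new_v == old_v                # non-pointers are unchanged

    pending = list(zip(old_roots, new_roots))
    seen = set()
    while pending:
        old_v, new_v = pending.pop()
        if not field_ok(old_v, new_v):
            return False
        if isinstance(old_v, tuple) and old_v[0] == "ptr" and old_v[1] not in seen:
            seen.add(old_v[1])
            old_cell = old_heap[old_v[1]]
            new_cell = new_heap[f[old_v[1]]]
            if len(old_cell) != len(new_cell):
                return False
            pending.extend(zip(old_cell, new_cell))
    return True
```

Note that the property only constrains cells reachable from the roots; the collector is free to discard everything else.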
A generational collector can run either a partial collection, which collects only some part of the heap, or a full collection of the entire heap. We show that the full collection satisfies \(\textsf {{gc\_related}}\). To show that a run of the partial collector also satisfies \(\textsf {{gc\_related}}\), we exploit a simulation argument that allows us to reuse the proofs for the full collector. Intuitively, a run of the partial collector on a heap segment h simulates a run of the full collector on a heap containing only h. Section 3.4 provides some details on this.
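The "view pointers to old data as non-pointers" step behind this simulation can be sketched on the same simplified heap model (the "old" tag is our own invention): pointers that leave the segment are temporarily masked as opaque data, so a full collector run on the segment alone neither follows nor rewrites them, and they are unmasked afterwards.

```python
def mask_old_pointers(segment_addrs, cell):
    """Replace pointers that leave the segment by opaque non-pointer
    values, so a collector treats them like any other data word."""
    return [("old", v[1])
            if isinstance(v, tuple) and v[0] == "ptr"
               and v[1] not in segment_addrs
            else v
            for v in cell]

def unmask_old_pointers(cell):
    """Restore the masked pointers after the partial collection cycle."""
    return [("ptr", v[1]) if isinstance(v, tuple) and v[0] == "old" else v
            for v in cell]
```

Since masked pointers are inert data, running a full collection on the masked segment coincides with a partial collection of that segment within the larger heap.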
The CakeML compiler goes through several intermediate languages on the way from source syntax to machine code. The garbage collector is introduced gradually in the intermediate languages DataLang (abstract data), WordLang (machine words, concrete memory, but abstract stack) and StackLang (more concrete stack).
The verification of the compiler phase from DataLang to WordLang specifies how abstract values of DataLang are mapped to instantiations of the heap types that the algorithm-level garbage collection operates over (Sect. 4.1). We prove that \(\textsf {{gc\_related}}\) implies that, from DataLang’s point of view, nothing changes when a garbage collector is run.
For the verification of the DataLang to WordLang compiler, we also specify how each instantiation of the algorithm-level heap types maps into WordLang’s concrete machine words and memory (Sect. 4.2). Here we implement and verify a shallow embedding of the garbage collection algorithm. This shallow embedding is used as a primitive by the semantics of WordLang.
Further down in the compiler, the garbage collection primitive needs to be implemented by a deep embedding that can be compiled with the rest of the code. This happens in StackLang, where a compiler phase attaches an implementation of the garbage collector to the currently compiled program and replaces all occurrences of \(\textsf {{Alloc}}\) by a call to the new routine. Implementing the collector in StackLang is tedious because StackLang is very low-level: it comes after instruction selection and register allocation. However, the verification proof is relatively straightforward, since it only needs to show that the StackLang deep embedding computes the same function as the shallow embedding mentioned above.
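The shape of that StackLang phase can be sketched as a toy rewriting pass. This is a Python model over a hypothetical instruction representation (the routine name and stub body are our assumptions); CakeML's actual pass operates on StackLang abstract syntax:

```python
# Stub body standing in for the attached collector implementation.
GC_ROUTINE = ("gc_routine", [("RunCollector",)])

def attach_gc(program):
    """Attach a collector routine to the program and turn every Alloc
    into a call to it. program: list of (name, instruction list) pairs;
    Alloc carries the requested allocation size, which the call keeps."""
    def rewrite(instr):
        if instr[0] == "Alloc":
            return ("Call", "gc_routine", instr[1])
        return instr
    return [(name, [rewrite(i) for i in body])
            for name, body in program] + [GC_ROUTINE]
```

The verification burden then reduces to showing that the attached routine computes the same function as the shallow embedding.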
Finally, the CakeML compiler’s in-logic bootstrap needs updating to work with the new garbage collection algorithm. The bootstrap process itself does not need much updating, illustrating the resilience of the bootstrapping procedure to such changes. We extend the bootstrapped compiler to recognise command-line options specifying which garbage collector is to be generated: --gc=none for no garbage collector; --gc=simple for the previous non-generational copying collector; and --gc=gen size for the generational collector described in the present paper. Here size is the size of the nursery generation in number of machine words. With these command-line options, users can generate a binary with a specific instance of the garbage collector installed.
The sources described in this paper can be found in the CakeML repository: the algorithm-level formalisation is in compiler/backend/gc; the first implementation at the word level, i.e. the shallow embedding, is in compiler/backend/proofs/word_gcFunctionsScript.sml and its verification is in compiler/backend/proofs/data_to_word_gcProofScript.sml; the StackLang deep embedding is in compiler/backend/stack_allocScript.sml; its verification is in compiler/backend/proofs/stack_allocProofScript.sml.
3 Algorithm Modelling and Verification
3.1 Intuition for Basic Algorithm
3.2 Intuition for Generational Algorithm
3.3 Formalisation
3.4 Verification
3.5 Combining the Partial and Full Collectors
4 Implementation and Integration into the CakeML Compiler
4.1 Representing Values in the Abstract Heap
Block values with no arguments (such as [], NONE and ()) are represented in a word wrapped as \(\textsf {{Data}}\). All other \(\textsf {{Block}}\) values are represented as \(\textsf {{DataElement}}\)s that carry the name \( n \) of the constructor that they represent. Constructor names are numbers at this stage of compilation.
4.2 Data Refinement Down to Concrete Memory
We define the separating conjunction \( * \), and use \(\textsf {{fun2set}}\) to turn a memory function \( m \) and its domain set \( dm \) into something we can write separation logic assertions about. The relevant definitions are:
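As an informal, set-based reading of these definitions (a Python sketch under our own naming, not the HOL text): fun2set pairs each address in the domain with its contents, and the separating conjunction asserts that this set splits into two disjoint parts satisfying the two conjuncts:

```python
from itertools import chain, combinations

def fun2set(m, dm):
    """Pair every address in the domain dm with its contents under the
    memory function m (an informal reading of fun2set (m, dm))."""
    return frozenset((a, m(a)) for a in dm)

def star(p, q):
    """Separating conjunction: the heap set splits into two disjoint
    parts, one satisfying p and the other satisfying q."""
    def holds(s):
        parts = chain.from_iterable(
            combinations(s, r) for r in range(len(s) + 1))
        return any(p(frozenset(part)) and q(s - frozenset(part))
                   for part in parts)
    return holds
```

Because the split must be exact and disjoint, an assertion built with star pins down the entire heap it is applied to, which is what lets the later proofs avoid reasoning about overlapping memory segments.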
4.3 Implementing the Garbage Collector
BYTES_IN_WORD is a target-specific constant specifying the number of bytes in a machine word: for 32-bit architectures it is 4, and for 64-bit architectures it is 8. In HOL, the corresponding constant, \(\textsf {{bytes\_in\_word}}\), is a constant whose value depends on its type, e.g. \(\textsf {{bytes\_in\_word}}\):\(\textsf {32\;\textsf {word}}\)\(=\) 4 and \(\textsf {{bytes\_in\_word}}\):\(\textsf {64\;\textsf {word}}\)\(=\) 8. The actual deep embedding as StackLang code is the following: Here \(0w\) denotes the machine word where all bits are 0, and \(1w\) the machine word where all bits are 0 except the least significant bit, which is 1.
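The width dependence of these constants can be illustrated with a small sketch (Python, with the word width passed explicitly, since Python values do not carry a word type):

```python
def bytes_in_word(width_bits):
    """Value of bytes_in_word at a given word width: 32 -> 4, 64 -> 8."""
    return width_bits // 8

def w(n, width_bits):
    """The machine word n at a given width, i.e. n reduced mod 2**width.
    w(0, _) models 0w (all bits clear); w(1, _) models 1w (only LSB set)."""
    return n % (1 << width_bits)
```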
5 Timing the GC
The results range from imp-for, where the generational GC performs worse by two orders of magnitude, to smith-normal-form, where the generational GC performs approximately 4 times better.
| benchmark | copying | no-gen | small-gen | large-gen |
|---|---|---|---|---|
| merge | 7520 (63%) | 6830 (61%) | 11,300 (74%) | 6700 (62%) |
| imp-for | 8.7 (0.08%) | 9.7 (0.08%) | 5810 (34%) | 2360 (17%) |
| life | 12.5 (0.08%) | 13 (0.08%) | 33 (0.2%) | 17 (0.1%) |
| pidigits | 10 (0.09%) | 11 (0.1%) | 75 (0.7%) | 37 (0.34%) |
| logic | 7570 (33%) | 5700 (27%) | 10,300 (41%) | 5700 (28%) |
| mpuz | 50.4 (0.37%) | 56.3 (0.41%) | 98.3 (0.74%) | 51.6 (0.39%) |
| smith-normal-form | 69.0 (0.47%) | 69.0 (0.47%) | 22.4 (0.15%) | 19.6 (0.13%) |
imp-for consists of 7 nested for-style loops that allocate around 100 million references in total, and uses very little immutable data. That a program with such an allocation pattern does not benefit from our generational garbage collection scheme is hardly surprising: when most data is mutable, running collection cycles on only the immutable data is a waste of time. The other example where generational collection underperforms is pidigits, which is based on an encoding of lazy lists as a function for producing the tail; more research is needed to determine why the generational GC would be bad for such an application.

smith-normal-form has several features that seem a good fit for generational collection. It uses references, albeit more sparingly than imp-for: it represents \(35\texttt {*}35\) matrices as int arrays, so collecting the references themselves would be of relatively little use. The fact that the references point at integers that change frequently means high turnover of data and a small footprint of live data at all times. Hence partial cycles are likely to be quick (not much more to see after following each reference to an integer) and free up plenty of space (1225 integers is very little live data).
| benchmark | copying | no-gen | small-gen | large-gen |
|---|---|---|---|---|
| depth5 | 1840 (21%) | 1920 (22%) | 30.8 (0.46%) | 16.9 (0.25%) |
| depth6 | 1810 (21%) | 1930 (22%) | 39.4 (0.59%) | 20.5 (0.31%) |
| depth7 | 1820 (21%) | 1930 (22%) | 54.5 (0.81%) | 29.0 (0.44%) |
| depth8 | 1820 (21%) | 1910 (22%) | 86.5 (1.29%) | 45.4 (0.68%) |
| depth9 | 1820 (21%) | 1920 (22%) | 146 (2.1%) | 78.5 (1.2%) |
| depth10 | 1820 (21%) | 1910 (22%) | 266 (3.9%) | 145 (2.1%) |
| depth11 | 1820 (21%) | 1910 (22%) | 504 (7.1%) | 278 (4.0%) |
| depth12 | 1980 (23%) | 2090 (24%) | 988 (13%) | 543 (7.6%) |
| depth13 | 1970 (23%) | 2090 (24%) | 2020 (23%) | 1130 (14%) |
| depth14 | 1970 (23%) | 2080 (24%) | 4020 (38%) | 2270 (25%) |
| depth15 | 1970 (22%) | 2080 (24%) | 7630 (53%) | 4970 (43%) |
| depth16 | 6420 (49%) | 6470 (49%) | 12,600 (65%) | 10,600 (61%) |