Dieses Kapitel bietet eine eingehende Untersuchung dynamischer Verifikationstechniken für OCaml-Software, wobei der Schwerpunkt auf der Gospel-Spezifikationssprache und dem Ortac / QCheck-STM-Tool liegt. Es beginnt mit der Einführung von OCaml, einer Programmiersprache mit mehreren Paradigmen, die funktionale und zwingende Funktionen kombiniert und einzigartige Herausforderungen für das Testen und Verifizieren darstellt. Das Kapitel vertieft sich dann in das Gospel-Projekt, das OCaml mit einer Verhaltensspezifikationssprache ausstattet, und in Ortac, ein Framework zur automatischen Überprüfung der Laufzeitbehauptung. Der Kern des Kapitels ist dem QCheck-STM-Plugin gewidmet, das modellbasierte Zustandsmaschinentests in der Black Box ermöglicht. Die Implementierung von Ortac / QCheck-STM wird gründlich erklärt, mit einer detaillierten Vorgehensweise, wie es Code für das Testen veränderlicher Datenstrukturen generiert. Das Kapitel enthält auch eine praktische Demonstration einer einfachen Array-Bibliothek, die die Übersetzung von Spezifikationen und den generierten Code zeigt. Die Evaluierungsergebnisse unterstreichen die Effektivität des Tools beim Aufdecken von Fehlern und Inkonsistenzen in bestehenden OCaml-Bibliotheken. Das Kapitel schließt mit einer Diskussion über verwandte Arbeiten und zukünftige Richtungen, wobei das Potenzial von Ortac / QCheck-STM in kontinuierlichen Integrationspipelines und seine Rolle bei der Verifizierung von OCaml-Software hervorgehoben werden.
KI-Generiert
Diese Zusammenfassung des Fachinhalts wurde mit Hilfe von KI generiert.
Abstract
This paper introduces the QCheck-STM plugin for Ortac, a framework for dynamic verification of OCaml code. Ortac/QCheck-STM consumes OCaml module signatures annotated with behavioural specification contracts expressed in the Gospel language, extracts a functional model of a mutable data structure from it, and generates code for automated runtime assertion checking. We report on the implementation of the tool, the structure of the generated code, and on errors found in established OCaml libraries.
1 Introduction
OCaml is an industrial strength, multi-paradigm programming language. While fundamentally functional at its core, OCaml includes many imperative features, such as references, mutable arrays, I/O, and exceptions. These pose unique challenges when trying to test and verify programs written in it.
While OCaml has been used as a platform for the implementation of various code analysis and verification tools, e.g., the interactive theorem prover Coq [44] and the C code analysis framework Frama-C [15], there is a lack of general purpose tools for the verification of OCaml programs themselves. In order to remedy this void, the Gospel project [9] equips OCaml with its own behavioural specification language.
Anzeige
The Gospel language is tool-agnostic, it only offers a way of expressing formal contracts, which can be leveraged by separate tools in order to perform analysis and verification tasks. Different such tools have been developed, including Cameleer [39], a deductive verification tool, and gospel2cfml1, a translator of annotated OCaml module signatures into separation logic terms embedded in Coq. In this paper, we focus on another tool consuming Gospel annotations called Ortac. Ortac provides a framework for automated runtime assertion checking, and is therefore a member of the family of dynamic verification tools. Ortac offers a modular architecture, where analysis and verification tasks are implemented as plugins. This paper highlights the QCheck-STM plugin, which focuses on black-box, model-based state-machine testing in the style of QuickCheck [3, 26].
Given its multi-paradigm nature, OCaml is naturally suited to a number of different verification strategies. While it is possible to test purely functional code with Ortac/QCheck-STM, its strength lies in the verification of specifications relating to mutable data structures. Therefore, this paper will put emphasis on such programs. It is customary to refer to such a data structure as the System Under Test (SUT), and the functions provided to work with it as its Application Programming Interface (API).
This paper provides an overview of how Ortac/QCheck-STM is implemented, and how it may be used as a dynamic verification tool. To do so, we demonstrate the translation of specifications for a simple array library. The OCaml interface for the library is introduced in Section 2, the Gospel contracts for it in Section 3. After a short overview of Ortac and its plugin structure in Section 4 we showcase the generated code for the array example in Section 5. In Section 6 we evaluate the approach and share examples of bugs found in existing OCaml libraries. Finally, we discuss related work in Section 7, before we remark on future work and conclude in Section 8.
Ortac is an open-source project and its source code is available from the following URL:
For illustration, we will test a library providing mutable arrays, which is an excerpt from OCaml’s standard library
module:
For those unfamiliar with the syntax of OCaml, the code above defines a
representing arrays of a parametric type
. In addition, this lists the type signatures of four OCaml functions. The first function
creates a fresh array from a given size and initialisation element. Function
accepts an array parameter and returns the size of it. Finally,
and
both take an array parameter and return and modify the element of an array at a given index, respectively.
3 Gospel by Example
In order to test the array library, the type and function signatures need to be annotated with Gospel specifications. To start, the type must be annotated with a model. In the tradition of other specification languages [6, 29], annotations are added as special comments:
An array can conceptually be thought of as a fixed capacity container. This logical model can be directly translated to Gospel by annotating
with two model fields, one for the immutable
and one for the mutable
of the array. The types
and
are part of the Gospel standard library and describe arbitrary precision integers and lists of values of type
, respectively.
The
function creates new array instances given a size and an initial element:
The
clause introduces a pre-condition that must hold at function entry. The two
clauses express that the resulting array has the expected size and that all entries are initialised to the given element
. In addition to a
clause, Gospel also offers a
clause. Unlike a
clause, with the above
clause the behaviour of
is well-defined in case the pre-state does not meet the condition, as it means that the function raises an
exception in that case. The function
is again part of the Gospel standard library.
The
function changes the value at a given position in the array:
Again, the function
is part of the Gospel standard library. The specification of
checks if the given index
is within the array bounds and modifies the contents of the argument array, which is indicated by the
clause. For each model field marked as modified, the user needs to provide a corresponding
clause specifying how to construct the modified model. Note that it is implicitly assumed that in case the
fails, the argument SUT remains unchanged. It is possible to define custom exceptions and give equations for the model state after such an exception has been raised. For further information about other Gospel features, the interested reader is referred to the documentation2.
For brevity, we will not explain all function contracts here, as both
and
follow analogously. The full specification of the example array library is shown below:
4 Ortac
Gospel itself does not perform any kind of verification. It is the job of other tools to take the provided specifications and perform further analysis.
The Ortac [18] tool provides functions for converting the given annotations into OCaml code. Ortac is extensible through plugins, which can make use of these functions to check specifications. Currently, three different plugins are implemented: Wrapper, Monolith, and QCheck-STM. The original Ortac prototype was developed as part of Clément Pascutto’s PhD work [38].
Ortac/Wrapper generates a wrapper module from an annotated module signature, which instruments each function with assertions on the argument and result values according to the given specifications. Ortac/Monolith [35] generates code to interface with Monolith [40], a fuzzing tool for OCaml. However, both the Wrapper and Monolith plugins currently do not support the definition of models. Therefore, they are of limited use when testing mutable data structures. The new QCheck-STM plugin lifts this limitation, by translating Gospel specifications into tests using the QCheck-STM framework. The basic idea behind Ortac/QCheck-STM is to extract a purely functional model of the SUT from the provided Gospel specifications and compare the behaviour of both while running random call sequences of the associated API.
5 Implementation
In this section, we describe the overall architecture of Ortac/QCheck-STM. We then describe how to generate code for testing a single SUT, and finally, how to generalise this approach to support multiple SUTs.
Fig. 1.
Inner architecture of Ortac/QCheck-STM.
5.1 Ortac Underneath the Hood
In order to understand the working of Ortac/QCheck-STM, it is insightful to look at its constituent parts, as illustrated in Figure 1. At its core, it uses the QCheck library3, which offers utilities for randomised property-based testing. If a user provides a random generator for a type
and a property as a function
, QCheck can then run a sequence of tests by randomly generating test inputs of the given type and checking that the property holds for each of them. In case of a violation, QCheck automatically shrinks the given test input to show a minimised counterexample to the user.
QCheck-STM [32] builds upon this functionality by providing a framework for testing a mutable data structure (the SUT) against an immutable reference model, as illustrated in Figure 2. It generates random API call sequences, and then checks the property that the behaviour of the SUT and its functional model coincide for each such test input. In case of a violation of that property, QCheck-STM also reports minimised call sequence traces.
Fig. 2.
Comparing the behaviour of the SUT and model on randomly generated API call sequences.
To test a particular data structure and its API with QCheck-STM, the user has to write the model manually. Ortac/QCheck-STM offers the ability to generate the functional model automatically from the Gospel specification. It uses the
annotations in order to generate the functional model, and the
and
clauses to update the model on each function call.
5.2 Generated code
To illustrate the working of Ortac/QCheck-STM, we will show parts of the generated code from the array specifications introduced in Section 2. First, however, Ortac/QCheck-STM needs another input besides the annotated interface file, which is the configuration of the SUT:
A minimal configuration needs to provide the
and a value of that type named
. As the name suggests,
defines the particular type we would like to test, which needs to be fully instantiated. E.g., here we instantiate the polymorphic array type to
. The
value specifies the initial SUT value from which to start each test. If the Gospel annotated interface is in a file array.mli and the configuration in config.ml, Ortac/QCheck-STM can be invoked as follows:
The generated code consists of multiple functor specifications and error reporting information. We will simplify the shown code examples to aid conciseness. As a starting point, the SUT and model are defined:
The astute reader will have realised that the definitions of
and
are taken from the provided configuration module (
is turned into a function here, as the generated test executable will run multiple instances of random API sequences, for which fresh initial SUT values need to be created). The
defines the two model fields from the specification of
shown in Section 2, where the type variable
has been instantiated to
according to the configuration. The initial state value has been synthesised from the given specification of the
function by comparing its signature to the
function from the configuration module. It does not need to be turned into a function, since it is immutable. The functions
and
are provided by the Gospel standard library.
Next, the type of available function calls (i.e., commands) is defined:
Each constructor carries argument values according to the formal arguments of the respective signature. Notice, that the SUT argument is missing, as it is not randomly generated, but rather kept and updated by the testing runtime. Furthermore, the
function is not present, as it returns a new SUT. This will be amended in Section 5.3.
In order to perform randomised property-based testing with QCheck-STM, a QCheck generator for the
type is defined:
This definition needs some explanation. The function
takes as the only input the current state (i.e., the functional model). This is currently unused, but might be used in the future to define smarter random command generators (see Section 8).
creates new instances of random generators for a given type by taking both a function that can print values (here defined by
which is left out for brevity) and a generator which can create random values of that type. The
module provides basic generators and combinators to define new ones.
randomly selects from a list of generators.
always returns its argument value. For simple cases like
, this is enough, however, some command constructors carry fields for their argument values, which need to be provided by random generators as well. For example, the
constructor needs an integer for the index it shall fetch from the array. The infix operator
can be used to turn a function generator into a generator of its return type by providing a generator of its argument type. This is done by using the provided generators of base types such as
and
.
Next, we generate a function that can run a command on a given SUT:
The function
matches on the current command and calls the respective array function. The result constructor
is provided by QCheck-STM and carries as fields the returned value and a pretty-printer for its respective type (e.g.,
and
). The function
turns functions raising exceptions into functions returning values of type
(in OCaml all exceptions are part of the extensible variant type
).
As the SUT value is mutable, its internal state will change in-place during the execution of
. The functional model of the SUT has to be updated separately. Therefore, a function
is defined:
Functions
and
do not mutate the array, and therefore they return the argument state unchanged. When setting a value at a particular index, the next state depends on if the index is within the array bounds. If so, the new state has the same size as the old one, and the contents are the same besides at the given index (
is again part of the Gospel standard library). If the check fails, the underlying array stays unchanged. The individual cases within
are extracted from the
clauses of the respective function contract in Section 3. This is why each field that is marked as modified needs to provide a corresponding post-condition describing the model of the post-state.
Finally, a post-condition function is generated:
The function
takes the current command, the current state, and the result after calling
as input, and returns an optional error report. As all post-conditions refer to the state after executing a given command, it first defines the new state by calling
(Gospel offers the
operator to refer to the pre-state, which we utilise in the specification of the
function). We only show the post-conditions for the
and
functions, the others follow analogously.
In the case of
we expect the returned integer to coincide with the
field of
. For
the returned value depends on if the given index was within bounds. If it was, the returned character should be the same as the one taken from the functional model, and otherwise we expect an
exception.
We have left out the actual error reporting mechanism for brevity. The careful reader may have realised that there are in fact two different categories of post-conditions. In the case of
and
, the
clauses define the model after executing the respective function, which is used in
. For
and
they state a property of the return value, which can be checked in
.
5.3 Functions returning and consuming multiple SUTs
Thus far, all testable functions only took one SUT argument as input, and did not return a SUT value as an output (recall that the
function was temporarily omitted). Let us extend the available array API with another function
from the OCaml standard library’s
module. The function
takes two arrays, and returns a fresh array with the contents of both arguments appended. The specification is straightforward, when utilising the sequence concatenation operator
from the Gospel standard library:
In order to allow testing functions that take multiple arguments of the SUT type, or likewise return a value of that type, the generated code is adapted:
The
is not a single SUT any more, it represents a (mutable) stack of SUT values (
is part of the OCaml standard library). Correspondingly,
must describe a functional model of the SUT stack, as is done here through a list of model elements.
The variable
is defined as the maximum number of SUT arguments ever needed by any single function of the API under test. For our running array example,
is 2, as
expects two arguments of type
. This number can easily be determined during code generation. By starting with a SUT stack already filled with the maximum number of initial elements ever needed by any API call, each available function can immediately be used.
The generator for arbitrary commands now includes the full API:
Notice, that the size argument
to
is not provided by the
generator. Ortac/QCheck-STM uses a simple heuristic of classifying functions that produce a SUT but take no SUT argument as initialisation functions, i.e., functions that create new SUT instances. For these functions, any
generator is automatically changed to
in order to keep the runtime of the generated test executable low.
Fig. 3.
Function call with arguments and return value on the SUT stack.
For brevity, we will not show all the adapted code examples from the previous section again, but only describe the overall behaviour. When a function requires multiple SUT arguments, the required amount of SUTs is popped from the stack, the function is run, and the SUTs pushed back onto the stack in reverse order (this allows to capture changes to the argument SUTs in the post-condition). If a function returns a SUT, this value is pushed on the stack as well (so that the post-condition has access to it). This is illustrated in Figure 3 for the
function.
Fig. 4.
Number of API functions covered by the generated code.
6 Evaluation
We have used Ortac/QCheck-STM to test 6 OCaml modules as summarised in Figure 4, including the
,
,
, and
modules from the OCaml standard library. We have found 5 crashing bugs, and 1 function from the standard library needing a documentation fix (the source code for these tests, including the respective Gospel contracts, is available online [25]). Resolving the reported issues later revealed 2 additional unexpected exceptions in one of the tested modules. We will elaborate further on these findings later in this section.
Figure 4 displays the number of API functions covered by the generated testing code. With Ortac/QCheck-STM version 0.3 without the extension described in Section 5.3, this covers 11% (7/65) to 58% (11/19) of the module APIs. Ortac/QCheck-STM version 0.4 adds support for testing functions that take multiple SUT arguments and return new SUT values, as described in Section 5.3. Doing so increases the module API coverage further to 24% (11/46) to 74% (14/19). Interestingly, 7 out of 8 of the reported errors in this paper are all due to functions that were not testable with version 0.3. Ortac automatically skips a function for which no suitable Gospel annotation is provided. The biggest remaining contributor to untested functions is higher-order functions, such as
and
. We expand on lifting this restriction in Section 8.
While no bugs were found within the standard library modules, one curiosity was discovered: The function
is documented in the following way4:
A natural Gospel specification would therefore be to model the hash table type as an association list (similar to dictionaries in other languages). The specification of the
function could then look similar to the following:
Running the test generated by Ortac/QCheck-STM reveals the following:
It turns out, that the initial guess can be negative, in which case it has the same effect as providing zero. After reporting this, the documentation for the
function has been revised in OCaml 5.35.
Outside of the standard library, 2 libraries from the official OCaml package repository have been tested as well, which revealed errors in both of them:
The
library6 is a mature, 25-year-old OCaml library implementing mutable bit-vectors of arbitrary but fixed length, with an API very similar to arrays. In three of its functions (
,
, and
) Ortac/QCheck-STM tests discovered that their index-bound checking could lead to an integer overflow, resulting in a segmentation fault on at least one tested platform (the usage of unsafe language features or external code can lead to diverging behaviour on different operating systems or processor architectures). The issue has been reported and fixed7 consequently. Furthermore, Ortac/QCheck-STM found two cases of unexpected exceptions being raised when trying to rotate a zero-length vector. The issue has been reported8 and fixed9 as well.
The
library10 implements extensible arrays. It uses an intricate data structure [22] along with some unsafe OCaml tricks in order to obtain good amortised performance, of which many can lead to crashing programs if used incorrectly. Such a crashing scenario was found when starting from an empty array and then adding and removing an element from different ends of the array11. Anecdotally, tests of the
library have been part of the Ortac code-base almost from the initial release for internal testing. These discovered an initial bug early on12. The latest error, however, was only recently discovered when Ortac/QCheck-STM was extended to cover functions returning SUT values as described in Section 5.3.
Given the automated nature of the code generated by Ortac/QCheck-STM, it is particularly suited to be included in a Continuous-Integration (CI) pipeline, as is demonstrated by the CI used in Ortac’s GitHub repository13. So far, all observed test runs have shown runtimes in the range of hundreds of milliseconds, even with the extension described in Section 5.3. Despite these tests still not reaching full API coverage, we believe this highlights the tool’s usability in CI.
7 Related Work
Fundamental ideas within modern verification can be traced back to Hoare. This is the case for invariants and pre- and post-conditions as found in Hoare logic triples [23] as well as proving an implementation correct with respect to a model [24]. Meyer later put the concepts into use in the design-by-contract methodology of the Eiffel programming language [31]. The require(s) and ensure(s) keywords of modern specification languages such as Gospel thus have roots in Eiffel.
The Gospel specification language for OCaml follows a line of specification languages such as the Java Modeling Language (JML) for Java programs [29], the ANSI/ISO C Specification Language (ACSL) for C programs [6], and Spec# for C# programs [5].
Software engineering tools to validate program specifications can roughly be divided into two categories: One group of tools works by dynamic runtime assertion checking, whereas another group of tools performs static verification. The JML-consuming ESC/Java tool [21] targets both of these categories. The ACSL-consuming Frama-C tool [15] targets the latter. The Gospel-consuming Ortac tool targets the former, but other Gospel tools (under development) [39] target the latter.
Whereas the above specification languages and verification tools have been developed for existing programming languages a posteriori, a newer class of programming languages have verification fundamentally built in from the beginning. This is the case for Lean [33], Why3 [19], F* [43], and Dafny [30], among others.
Another approach to formally verified software development is correct-by-construction techniques. Filliâtre et al. [17] describe a development process that builds upon both Gospel and Why3 as part of the VOCAL [10] project. The user provides an OCaml module signature with Gospel annotations, which is translated to a WhyML specification. The module is then implemented in WhyML, proven correct with respect to the translated specification by Why3, and automatically translated back to OCaml code.
The term property-based testing was introduced by Fink and Bishop [20] originally. It became popular with QuickCheck, an embedded domain-specific language for the functional programming language Haskell [12], and has since been ported to numerous other languages, including OCaml [46]. QuickCheck introduced modular combinators for building up generators of complex test inputs, how each input is tested on properties in the form of Boolean-valued functions, and test input shrinking when finding a counterexample. Whereas the original QuickCheck formulation targeted purely functional code, in follow-up work Claessen and Hughes presented extensions to target monadic, effectful Haskell code, including the idea of model-based testing [13]. While QuickCheck was primarily conceived as a testing tool, the method has since been introduced in various interactive theorem provers in order to quickly provide counterexamples during the proof development process. Examples of this can be found in Isabelle [7, 8], Coq [37], and Agda [16].
With roots in model-based testing from outside the functional programming language community [45], the Gast framework for the Clean programming language offered state machines to specify the intended behaviour of stateful reactive systems [27]. The design of the state-machine framework for the commercial Erlang QuickCheck [3, 26] has since influenced framework ports for other languages, e.g., for Scala [34] and QCheck-STM for OCaml as used in this work [32]. The state-machine approach furthermore extends to testing stateful code for race conditions under concurrent usage [14].
In an impressive feat, Arts et al. [4] have developed state-machine models of the AUTOSAR specification to test automotive software. The range of defects found in doing so underlines the usefulness of the approach. Other successful QuickCheck applications include testing of telecommunication software [3], data structures [2], election software [28], computational geometry algorithms [41], compilers [36], and run-time systems [32].
8 Conclusion and Future Work
In this paper, we have presented Ortac/QCheck-STM, a tool that consumes behavioural contracts expressed in the Gospel specification language, and generates code to automatically test a given OCaml module against a functional reference model derived from these contracts. Despite being a relatively young tool with the first version released in October 2023, Ortac/QCheck-STM has already proven useful in finding bugs in established OCaml libraries, as well as pointing out inconsistencies in documentation. We expect to find more errors with it as we continue annotating more libraries with Gospel contracts.
While this paper focuses on Ortac/QCheck-STM, previous work [42] has investigated verification of OCaml code by leveraging both static and dynamic verification tools for Gospel, including Ortac. In that work, the authors remark on various limitations of Ortac, of which many have been lifted (including, for example, the verification of functions taking multiple SUT arguments, or returning SUT values). However, Ortac/QCheck-STM still has various limitations, which we would like to lift in future work:
Given its nature as a dynamic verification tool, Ortac is inherently restricted to the executable fragment of the Gospel specification language. At times, this results in contracts that are not as natural as their purely logical counterpart. By extending the accepted syntactic forms, contracts could be written in ways that would make them more amenable to other forms of verification (and their respective tools) as well.
By design, the Gospel specification language implicitly requires that mutable arguments do not alias, i.e., that they occupy separate memory locations in the OCaml heap [9]. Therefore, Ortac/QCheck-STM does not attempt to generate calls with aliased SUTs. Once Gospel is enhanced to express aliasing properties, we would like to extend Ortac/QCheck-STM accordingly to exercise such specifications.
The majority of functions in Figure 4 currently not covered by the generated code comprise idiomatic higher-order functions such as
,
, and
. Gospel currently does not have a way of specifying effectful function arguments, whereas it is possible to specify the behaviour of functions accepting pure function arguments [9]. As a first step, Ortac should be able to lift a restriction for the latter, e.g., using Claessen’s approach to function generation [11]. Secondly, once Gospel has set on a way for specifying effectful function arguments, we hope to extend Ortac to cover such API calls as well.
Currently, preconditions introduced by
clauses are not considered during the generation of arbitrary command lists on which to test the SUT and the model. During the execution of each test case, if a given pre-state does not fulfil the stated pre-condition for a particular command, it is simply skipped. The random generator could be extended to take
clauses into account while generating random sequences of commands, which would increase the efficiency of the tool.
QCheck-STM was originally developed to test the new multicore runtime arriving with OCaml 5 [32]. It can therefore also produce parallel sequences of random API calls, and test if the observed behaviour is sequentially consistent by reconciling each run with a sequential execution of a given model. By extending Ortac to use the parallel test generator, it would be possible to also test concurrent data-structures (as for example done by Artho et al. [1]).
Acknowledgments
This work was funded by ANR grant ANR-22-CE48-0013 and Tarides.
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.