Dieses Kapitel behandelt die entscheidende Notwendigkeit robuster formaler Methoden in der Systementwicklung und konzentriert sich auf die Transformation von Formeln der Mission-time Linear Temporal Logic (MLTL) zu regulären Ausdrücken. Es unterstreicht die Bedeutung dieser Transformation für die Überprüfung der Korrektheit von Werkzeugen, die das Verhalten des Systems rechtfertigen und zertifizieren. Das Kapitel baut auf bestehenden MLTL-Bibliotheken in Isabelle / HOL auf, um formal die Richtigkeit des WEST-Algorithmus zu beweisen, der MLTL-Formeln in reguläre Ausdrücke konvertiert. Diese Formalisierung stellt sicher, dass die erzeugten regulären Ausdrücke logisch äquivalent zu den eingegebenen MLTL-Formeln sind, was den Spezifikationsengpass in formalen Methoden adressiert. Das Kapitel behandelt auch die Implementierung des WEST-Algorithmus, einschließlich eines Vereinfachungsschrittes zur Leistungsoptimierung, und bietet eine detaillierte experimentelle Bewertung, die die verifizierte Implementierung mit bestehenden Tools vergleicht. Die Ergebnisse zeigen die praktische Durchführbarkeit des formalisierten Ansatzes und bieten eine verifizierte und effiziente Methode zur Validierung der MLTL-Spezifikation. Darüber hinaus untersucht das Kapitel das Potenzial zukünftiger Arbeiten zur Optimierung der Äquivalenzprüfung regulärer Ausdrücke und zur weiteren Verbesserung der Leistung der verifizierten Implementierung.
KI-Generiert
Diese Zusammenfassung des Fachinhalts wurde mit Hilfe von KI generiert.
Abstract
Mission-time Linear Temporal Logic (MLTL), a widely used subset of popular specification logics like STL and MTL, is often used to model and verify real world systems in safety-critical contexts. As the results of formal verification are only as trustworthy as their input specifications, the WEST tool was created to facilitate writing MLTL specifications. Accordingly, it is vital to demonstrate that WEST itself works correctly. To that end, we verify the WEST algorithm, which converts MLTL formulas to (logically equivalent) regular expressions, in the theorem prover Isabelle/HOL. Our top-level result establishes the correctness of the regular expression transformation; we then generate a code export from our verified development and use this to experimentally validate the existing WEST tool. To facilitate this, we develop some verified support for checking the equivalence of two regular expressions.
1 Introduction
As formal methods tools become increasingly integrated into system development life cycles, it is necessary to offer stronger demonstrations of their correct implementation than piecemeal code analysis and experimental validation. After all, these are the tools justifying and verifying, e.g., the certification of systems; these tools must obey a higher standard for correctness. This starts with their input languages and specification validation.
Many formal methods tools, such as model checkers and runtime verification engines, reason over behavior specifications in LTL or related linear-time logics that extend LTL, e.g., to add intervals on the temporal operators like Signal Temporal Logic (STL) [32], Metric Temporal Logic (MTL) [1], and Metric Interval Temporal Logic (MITL) [2]. Mission-time Linear Temporal Logic (MLTL) [30, 40] represents a commonly used subset of these timed logics, and has a conversion to LTL [30]. Several tools use MLTL as a core specification language; these include the Formal Requirements Elicitation Tool (FRET) [4, 19, 34], the Realizable Responsive Unobtrusive Unit (R2U2) [23, 40, 43], and the Ogda runtime monitoring tool [35‐37]. Popular symbolic model checker nuXmv [9] supports a subset of MLTL [25] by allowing bounds on the Globally and Future operators (but not on Until or Release). The WEST tool [17, 51] transforms MLTL formulas into logically equivalent (and easier to analyze) regular expressions and facilitates the validation of MLTL specifications with an interactive GUI. Since WEST validates specifications, which are the fundamental basis for formal verification, it is especially critical to rigorously establish its correctness.
Anzeige
The research community has long recognized that specification is the biggest bottleneck in formal methods [42]; to that end LTL is formalized in Coq [14], in PVS [38], and in Isabelle/HOL [47], along with many algorithms for its use in formal verification [18, 44‐46, 48]. Libraries for related linear-time logics were inspired by, or directly built upon those for LTL, including formalizations of MTL in Coq [10] and PVS [12, 50]; a PVS formalization of MITL [41]; and Isabelle formalizations of the 3-valued variant LTL3 [3] and MLTL [27]. Further, the importance of ensuring correctness of formal methods tools naturally prompts using these formalizations to generate tools. For instance, an Isabelle/HOL formalization of the VeriMon tool for monitoring metric first-order temporal logic (MFOTL) generates (via code export) VeriMon’s codebase [8]. An Isabelle/HOL formalization of a metric dynamic logic (MDL) runtime monitoring algorithm also generated the Vydra tool [39].1 In Coq, a formalization of monitoring past-time MTL generates an OCaml monitoring engine [11].
We enrich this space by formalizing the WEST algorithm for specification validation. Building on an existing MLTL library in Isabelle/HOL [26, 27], we formally prove that the WEST algorithm generates regular expressions that are logically equivalent to the input MLTL formulas, filling in details omitted from the original tool’s correctness proofs. From our formalized algorithms, we generate a new implementation of WEST to validate the (unverified) implementations of WEST: the proof-of-concept original [17] and a highly optimized refactoring [51]. As WEST validates other MLTL tools, most notably the runtime verification engine R2U2 [40], our work helps to foster trust in a safety-critical space. Our experiments also show that our Isabelle-generated code is (in aggregate) close in performance to the optimized, unverified version of WEST.
Section 2 recaps the existing Isabelle/HOL MLTL library [26, 27], introduces the trace regular expressions fundamental to the WEST algorithm, and sets up the definitions underlying our formalization. Section 3 presents our formalization of the WEST algorithm. Section 4 gathers our formalization insights to inform future efforts that build on our contributions. Section 5 experimentally evaluates the new version of WEST generated via Isabelle’s code export utility in comparison with two previous, hand-coded versions [17, 51], while Section 6 concludes with a discussion. Our formalization (totaling \(\approx \) 7400 lines of code) is available on the Archive of Formal Proofs (AFP) [52].
2 MLTL and Regular Expressions
In this section, we present the syntax and semantics of MLTL and explain our formalization of the WEST regular expressions used by the WEST tool, highlighting some key datatypes; when appropriate, we intersperse mathematical definitions with Isabelle/HOL code. We also introduce some useful functions that are important in the correctness proofs later on.
Anzeige
Other works formalize regular expressions in different contexts. An algorithm for matching extended regular expressions via symbolic derivatives was formalized in Lean [56], and the Myhill-Nerode theorem was restated in Isabelle/HOL using regular expressions (instead of automata, which is more common) [54]. There has also been work formalizing decision procedures to check equivalence of regular expressions in Rocq [13] and Isabelle/HOL [29]. The latter is particularly relevant; we are interested in potentially incorporating it in future work to improve our (currently naive) regular expression checking procedure.
2.1 Syntax and Semantics of MLTL
Let \(\texttt {AP}\) be a finite set of atomic propositions. Let \(\texttt {p} \in \texttt {AP}\) be an atomic proposition, and \(a, b \in \mathbb {N}\) be natural numbers such that \(a \le b\); MLTL formulas are defined by the following grammar; the temporal operators \(\texttt {F}, \texttt {G}, \texttt {U}, \texttt {R}\) denote “Future”, “Globally”, “Until”, and “Release”, respectively.
A trace\(\pi \) is a finite sequence \(\pi = \pi [0], \pi [1], \ldots \) of sets of atomic propositions, where \(\pi [i] \subseteq \texttt {AP}\) for all i. We refer to the i-th element of a trace \(\pi \) as the i-th state of the trace, and intuitively interpret \(\pi [i]\) as the set of propositions that are true at time i. We denote the length of a trace \(\pi \) by \(|\pi |\), and the suffix of a trace \(\pi \) starting at time i by \(\pi _i\); that is, \(\pi _i = \pi [i], \pi [i+1], \ldots \) and \(\pi _0 = \pi \). The existing MLTL library in Isabelle/HOL [26] encodes a trace as a list of sets of natural numbers; each set represents the atomic propositions that are true at each timestep. For example, the trace \(\pi = \{p_0, p_1\}, \{p_0\}\) is encoded in Isabelle as the [{0, 1}, {0}], which has type nat set list.
A trace \(\pi \) satisfies an MLTL formula \(\phi \), denoted \(\pi \models \phi \), as follows [30, 40], where \(\psi \) is another MLTL formula:
Fig. 1.
For \(\texttt {AP} = \{p_0, p_1\}\), the bit string of trace \(\{p_0\}, \{p_0, p_1\}, \{\}, \{p_1\}\) is 10,11,00,01 (following the source material [17], we use commas to separate timesteps for readability) which is encoded in Isabelle as [[1,0], [1, 1], [0, 0], [0, 1]] (type nat list list).
2.2 Trace Regular Expressions
The WEST algorithm [17] takes an MLTL formula as input and recursively computes a WEST regular expression representing exactly the set of traces that satisfy that formula. Intuitively, we can think of this as happening in two steps. First, we represent traces as bit strings; here, instead of encoding each state in a trace as a set, we encode each state as a bit string of length n (where n is the number of variables in the formula). Next, we define WEST regular expressions (WEST regexes) as a compact way to represent a set of traces.
More precisely, we assume that \(\texttt {AP} = \{p_0, p_1, \ldots , p_{n-1}\}\) and impose (without loss of generality) an ordering on these atomic propositions; we use this ordering to construct the bit string of a trace \(\pi \) of length m as the length mn string of 0’s and 1’s such that the value of atomic proposition \(p_k\) at timestep i corresponds to the \((ni+k)\)-th character of the bit string [17, Definition 2]. We visualize an example in Fig. 1. We encode bit strings in Isabelle as lists of lists.
In Isabelle/HOL, we obtain an ordering on our set of atomic propositions by constraining them to be natural numbers, of type nat. Following WEST’s implementation [51], we choose not to fix n globally (which we could accomplish using a locale [6, 7]) but instead pass the number of variables as an argument to the helper functions in the WEST algorithm (in the top-level function, we compute the right value to pass to the helper functions).
We then collate these bit string representations in trace regular expressions,2 or trace regexes for short, which are strings consisting of 0, 1, and S, where S is a shorthand for the regular expression \(\texttt {0} | \texttt {1}\). For example, fixing the number of atomic propositions to be \(n = 3\), the trace regex 10S matches only the two bit strings 101 and 100 (each representing a trace of length 1), and the trace regex S00,0S0 matches the four bit strings (each representing a length 2 trace) “100,010”, “100,000”, “000,010”, and “000,000”.
In Isabelle/HOL, trace regexes have type WEST_bit list list, where our custom datatype WEST_bit is comprised by Zero, One, and S. We represent trace regexes with WEST_bit list list and not WEST_bit list because the number of atomic propositions, n, is critical for the interpretation of traces from their bit string representations. We must ensure that each WEST_bit list, referred to as a state regex, has length n in the overall list; having a list of lists facilitates this check. For this, we define the function trace_regex_of_vars which takes as inputs trace regex r and the number of atomic propositions n, and checks that each state regex in r has length n. Here, ! is Isabelle/HOL syntax for the i-th element of L.
Then, we build a list of trace regexes as a WEST_regex of type WEST_bit list list list, the final return type of the WEST algorithm. A WEST regex L is well-defined for n atomic propositions if each trace regex r in L satisfies trace_regex_of_vars r n. We summarize the datatypes of objects in our encoding in Table 1. While the nested lists may seem unwieldy at first glance, they ensure modularity in the implementation and, more crucially, in the correctness proofs. We turn to an example of this modularity now, as we build up to formalizing the notion of a WEST regex matching a trace.
Table 1.
Summary of the datatypes of each object in our encoding.
Terminology
Description
Isabelle Type
WEST bit
Custom Isabelle datatype
WEST_bit
state regex
List of WEST bits that encodes states as bit strings
WEST_bit list
trace regex
List of WEST states that represents
sets of traces compactly as regular expressions
WEST_bit list list
WEST regex
List of WEST traces that represents the union of
all sets of traces represented by the WEST traces
WEST_bit list list list
2.3 Useful Definitions
The notion of matching is foundational to the WEST algorithm because it is crucial for connecting the semantics of MLTL formulas to the semantics of WEST regexes. We define that a state regex rmatches a state if r equals the bit string representation of the state or if r generalizes the bit string by replacing some characters in the bit string with S’s. This notion lifts to traces: a trace regex r matches a trace \(\pi \) iff r matches the bit string representation of \(\pi \). Furthermore, we may lift this to WEST regexes. For trace regexes \(r_1, r_2, ..., r_k\), we can combine them by alternations as \(r_1 | r_2 | ... | r_k\); we abbreviate this as the WEST regex \(L = [r_1, r_2, ..., r_k]\), and define that L matches a trace \(\pi \) iff some \(r_i\) matches \(\pi \).
We contribute a formal mathematical definition of the notion of matching, which previous work [17] supplied only an intuition for. We do this in three steps. First, we define matching a state regex (of type WEST_bit list) to a state in a trace (of type nat set) in the definition match_timestep:
This definition checks that for all i, r!i equaling One implies the i-th atomic proposition \(p_i\) holds at the input state (i.e., \(p_i \in {\texttt {\textit{state}}}\)), and r!i equaling Zero implies \(p_i\) does not hold at this state. If r!i is S, then \(p_i\) can be either true or false at this state. For example, the state regex [0, 1, S] matches \(\{1\}\) and \(\{1, 2\}\).
Next we define matching a trace regex (of type WEST_bit list list) to a trace (of type nat set list) in the definition match_regex:
This definition takes as input a trace \(\pi \) and a trace regex r, and checks that match_timestep holds for all regex states in trace (i.e., for all r ! time) on the corresponding state in the trace (\(\pi \)! time). It also checks that the length of \(\pi \) is at least the length of r (a well-definedness condition, as we need to access \(\pi \)! time for all time up to the length of r).
Finally, we define matching a WEST regex (of type WEST_bit list list list) to a trace (of type nat set list) in the definition match:
This definition checks that match_regex holds for some trace regex L ! i in L and the trace \(\pi \). We may intuitively view WEST regexes as compactly representing the behavior of a set of traces; then, the WEST algorithm transforms a given MLTL formula into a WEST regex that captures the set of satisfying traces.
Another important function, WEST_num_vars, counts the number of atomic propositions in a given MLTL formula by recursively computing the maximum number of atomic propositions in all subformulas. For example, WEST_num_vars of an atomic proposition p is p+1 (as atomic propositions are indexed from 0), and WEST_num_vars of And_mltl\(\phi \)\(\psi \) is the maximum of WEST_num_vars\(\phi \) and WEST_num_vars\(\psi \). This function is used frequently in our correctness results.
Fig. 2.
High-level overview of key components in our formalization of the WEST algorithm.
3 Formalizing the WEST Algorithm
Intuitively, the WEST algorithm recursively computes a list of trace regexes for the subformulas of an MLTL formula, and then combines these lists using the WEST_and and WEST_or operations for taking intersections and unions of sets of traces. The finite semantics of MLTL formulas ensures that all existential and universal quantifiers can be translated to a finite number of WEST_and and WEST_or operations on trace regexes; thus the WEST algorithm directly defines the temporal operators in terms of WEST_and and WEST_or. For these temporal operators, we also need a shifting operation, shift, which the source material [17] implicitly uses but does not explicitly define. Intuitively, shift ensures that we are analyzing the locations in the trace specified by the temporal operators; we will see this in an example in Sect. 3.2. Fig. 2 visualizes the overall structure of the WEST algorithm.
We first discuss our formalization of the core operators WEST_and and WEST_or along with our formalization of an important simplification step in Sect. 3.1. Then, we present how the temporal operators are built on top of these core operators in Sect. 3.2, using the shift operation. Finally, we discuss the top-level WEST algorithm WEST_reg and our overall correctness result in Sect. 3.3.
3.1 The Core Operations of WEST
The WEST_or operation simply combines two WEST regexes (i.e., lists of trace regexes) into one WEST regex. We implement this in Isabelle/HOL using the built-in @ operator for list concatenation. The top-level correctness theorem shows that for two WEST regexes L1 and L2, L1 matches a trace \(\pi \) or L2 matches \(\pi \) iff L1@L2 matches \(\pi \). We formally state this as the WEST_or_correct lemma.
Next, the WEST_and operation takes as input two lists of trace regexes and computes a list of trace regexes representing the intersection of the sets of traces represented by the input lists. We visualize the intended semantics of this operation in Fig. 3. One notable point here is that WEST_and Zero One is None, because it is impossible for a bit in a trace regex to simultaneously equal Zero and One. In Isabelle/HOL, we formalize WEST_and in four steps: first we define an operation between two bits, then between two regex states, then between two trace regexes, and finally between two WEST regexes.
The lowest-level operation between two bits (each of type WEST_bit) is defined in the function WEST_and_bitwise as follows:
This operation reflects the desired semantics visualized in Fig. 3 by using option types to return None when the set intersection is empty. For example, WEST_and_bitwise S Zero is Some Zero, while WEST_and_bitwise One Zero is None.
This operation is then lifted to two regex states in WEST_and_state; here, we apply WEST_and_bitwise to each pair of corresponding bits in the two regex states. If None is returned for any pair, then the function returns None for the entire regex state. Note that the lengths of the two regex states must be the same (i.e., equal to n, the number of atomic propositions), and this operation returns None if they are not. Then, we again lift WEST_and_state to operate on two trace regexes in the function WEST_and_trace by applying WEST_and_state to each pair of corresponding regex states in the two trace regexes, returning None if any of the calls to WEST_and_state returns None. The input trace regexes are allowed to have different lengths, and the shorter trace regex is treated as if the missing regex states are all S, following [17, Definition 4, Pad]. The full formal definitions can be found in our formalization [52].
Fig. 3.
Operations table for WEST_and_bit operation for bits (left), and two examples of WEST_and between regex states and traces (middle and right).
To establish the correctness of WEST_and, we prove the following lemma:
This shows that for input WEST regexes L1 and L2, both L1 and L2 match trace \(\pi \) iff the WEST_and of L1 and L2 matches \(\pi \). In other words, the set of traces that the WEST_and of L1 and L2 matches is exactly the intersection between the set of traces that L1 matches and the set of traces that L2 matches. The assumptions on L1 and L2 are well-definedness conditions that ensure all state regexes have length n (the number of atomic propositions), as required by WEST_and_state.
To keep the sizes of WEST regexes small, WEST implements an additional simplification step which collects together related trace regexes. If two trace regexes differ only by a single bit, then they may be combined into one trace regex where the differing bit is S. For example, fixing the number of atomic propositions to \(n=2\), the WEST regex [[[0,0],[0,1]], [[0,0],[0,0]], [[0,1],[0,S]]] may first be reduced (by combining the first two trace regexes) to [[[0,0],[0,S]], [[0,1],[0,S]]], and then to [[[0,S],[0,S]]]. This is crucial for improving the tool performance, as it helps to mitigate blowup in the length of the list of trace regexes during the WEST_and and WEST_or operations [17, Section 4].
The underlying idea is straightforward: greedily simplify pairs of regexes until no more pairs can be simplified; we implement this in the WEST_simp function. It is crucial that the simplification step does not change the set of traces that a WEST regex matches. The following lemma shows that, for a well-defined WEST regex L, a trace \(\pi \) matches L iff \(\pi \) matches the simplification of L:
Finally, we define the functions WEST_and_simp and WEST_or_simp by passing the output of WEST_and and WEST_or (respectively) to WEST_simp. The correctness of WEST_and_simp and WEST_or_simp follows directly from the correctness results for WEST_and, WEST_or, and WEST_simp.
3.2 Temporal Operators
Our formalization of the temporal operators in the WEST algorithm uses the WEST_and_simp and WEST_or_simp operators. It also uses an operation to shift regular expressions to later timesteps, which we call shift. Though the source material never explicitly defines this shift operation, it uses it implicitly and defines an analogous operation [17, Definition 5]. We formalize shift as follows:
Here, we refer to a state regex of all S’s as an arbitrary state, and we refer to a trace regex of all arbitrary states as an arbitrary trace [17, Section 6]. In this snippet, arbitrary_trace n t constructs an arbitrary trace regex containing t arbitrary states of length n. Then, shift takes as input a WEST regex L, and appends an arbitrary trace of t arbitrary states to the front of each trace regex in L. As intuitively named, shift shifts all trace regexes in L by t timesteps.
For example, fixing the number of atomic propositions at \(n = 2\), the WEST regex L = [[[1,1]], [[0,0], [0,0]]] captures that either \(p_0\) and \(p_1\) both need to be true at timestep 0, or \(p_0\) and \(p_1\) both need to be false at timesteps 0 and 1. If instead we want to delay this behavior for \(p_0\) and \(p_1\) by 3 timesteps, we can compute shift L 2 3, which returns [[[S,S],[S,S],[S,S],[1,1]], [[S,S],[S,S],[S,S],[0,0],[0,0]]]. The following lemma formalizes the connection between the shift operation for WEST regexes and the suffix of a trace:
More precisely, shift_match_property establishes that a sufficiently long trace \(\pi \) matches a WEST regex L shifted by t timesteps iff the suffix of \(\pi \) with t states removed, denoted drop t\(\pi \), matches L.
Now, we demonstrate how the temporal operators are built on top of the core WEST operators. We provide for an example WEST_global, defined as follows:
WEST_global takes as input a WEST regex L, lower and upper interval bounds a and b, and the number of atomic propositions n. WEST_global then uses the shift operation to shift the input regex L by b timesteps, and computes the WEST_and of the shifted L and WEST_global with b-1. Intuitively, L captures a set of traces specifying some behavior at timestep 0, and the successive shift and WEST_and operations ensures that L’s behavior happens at all timesteps between a and b. The remaining temporal operators are defined in a similar manner, using shift and the core WEST operators.
We establish the correctness of the WEST_global operator as follows:
This lemma says that for a WEST regex L over n variables (assumption L_vars) that captures the semantics of an MLTL formula \(\phi \) of at most n variables (assumption semantics_\(\phi \) and \(\phi \)_vars), and a trace \(\pi \) of sufficient length, WEST_global\(\phi \)a b n matches \(\pi \) iff \(\pi \) satisfies the semantics of Global_mltl\(\phi \)a b (representing the formula \(\texttt {G}_{[a,b]} \phi \)).
Likewise, each of the remaining temporal operators has a correctness lemma that establishes the connection between the WEST regex it computes and its corresponding temporal operator. The correctness lemmas for the temporal operators totaled about 850 lines of code.
3.3 Top-Level Algorithm and Correctness
The WEST algorithm takes as input an MLTL formula \(\phi \) in negation normal form (NNF) and recursively computes the WEST regex representing the set of traces with length at least the computation length of \(\phi \) that satisfy the formula. The existing Isabelle/HOL MLTL library [27] already formalizes the computation length3 of \(\phi \), denoted complen(\(\phi \)), which intuitively measures how much time is needed to decide the satisfiability of \(\phi \) [17, 24, 27].
We formalize the WEST algorithm in the function WEST_reg as follows:
Although input formulas to the WEST algorithm must be in NNF, we allow formulas of all shapes as input and apply the convert_nnf function from the existing MLTL formalization [27] to transform the input formula to NNF. The resultant NNF formula nnf_\(\phi \) and the number of atomic propositions, computed as WEST_num_vars\(\phi \), are then passed to the auxiliary function WEST_reg_aux. This auxiliary function takes two inputs (a nat mltl formula \(\phi \) and a natural number n for the number of atomic propositions) and cases on the structure of \(\phi \) to apply the appropriate core operators and return a WEST regex.
We consider here a few representative cases: True, Prop_mltl, And_mltl, and Global_mltl (corresponding to the cases of True, an atomic proposition, a conjunction, and the global operator). Mathematically, these cases are defined in the source material as follows [17]: \(\texttt {reg}(\texttt {True}) = S^n\), \(\texttt {reg}(p_k) = S^{k}1S^{n-k-1}\), and \(\texttt {reg}(\phi \wedge \psi ) = \texttt {reg}(\phi ) \wedge \texttt {reg}(\psi )\). The global operation, \(\texttt {reg}(\texttt {G}_{[a,b]} \phi )\) computes (recursively) the WEST_and of \(\texttt {reg}(\phi )\) shifted by i timesteps for all i with \(a \le i \le b\) (note this is essentially what WEST_global computes). In Isabelle/HOL, we have:
Here, map f L applies a function f on every element of a list L, so the base case for True_mltl creates a WEST regex containing a trace regex of all S’s. In the case Prop_mltl p, the map function takes as input j and returns One if the propositional variable p equals the index j, and otherwise S. In And_mltl, we directly call the WEST_and operator; likewise in Global_mltl.
Top-Level Correctness. A central contribution of our work is proving (and even slightly generalizing) the correctness of the WEST_reg_aux function and elucidating many of the details omitted in the original proof of correctness. Theorem 2 in the source material states the correctness result as follows: for a MLTL formula \(\phi \) in negation normal form, a trace \(\pi \) with length \(\texttt {complen}(\phi )\) satisfies \(\phi \) iff \(\pi \) matches \(\texttt {reg}(\phi )\) [17]. We formalize this in the theorem WEST_reg_aux_correct:
This theorem states that for MLTL formula \(\phi \) in NNF (assumption is_nnf) with at most n variables (assumption \(\phi \)_nv) and well-defined interval bounds (assumption intervals_welldef\(\phi \)) and a trace \(\pi \) of length at least complen(\(\phi \)) (assumption \(\pi \)_long_enough), the trace \(\pi \) satisfies \(\phi \) iff the trace \(\pi \) matches the WEST regex computed by WEST_reg_aux\(\phi \)n. Here, the functions convert_nnf, complen_mltl, and intervals_welldef are from the existing MLTL formalization [26]. The \(\phi \)_nv is an implicit assumption in the source material, which globally fixes the number of atomic propositions.4 We slightly generalize the original correctness result, as our formal result holds for all traces of length at least the computation length of \(\phi \) rather than just the traces of length equal to the computation length of \(\phi \).
We prove this by structural induction on the input formula \(\phi \). The is_nnf assumption allows us to use the custom induction rule nnf_induct from the existing MLTL formalizing [26], simplifying the induction proof. The base cases are straightforward, and the inductive cases are proven by applying the inductive hypothesis on the subformulas and using the correctness lemmas for the core WEST operators. For instance, for input formula \(\phi \)= Global_mltl\(\psi \)a b (which is \(\texttt {G}_{[a, b]} \psi \)), the inductive hypothesis gives us that the trace \(\pi \) satisfies \(\psi \) iff the WEST regex L computed by WEST_reg_aux\(\psi \)n matches \(\pi \). Next, in order to apply the correctness result of the WEST_global operator, we need to show that L is a WEST regex over n atomic propositions (i.e., each state regex in each trace regex in L is of length n). For this, we prove the lemma WEST_reg_aux_num_vars:
This lemma states that, for a formula \(\phi \) in NNF with at most n atomic propositions, the WEST regex computed by WEST_reg_aux\(\phi \)n is a WEST regex over n atomic propositions. With this, we can apply the correctness result of the WEST_global operator on L and complete the proof of the Global_mltl case.
Finally, we present the top-level correctness result for the WEST algorithm:
This theorem states that for any MLTL formula \(\phi \) with well-defined interval bounds [27] and any trace \(\pi \) of length at least the computation length of \(\phi \), \(\pi \) satisfies \(\phi \) iff the WEST regex WEST_reg\(\phi \) matches \(\pi \). The correctness of the top-level WEST algorithm took about 600 LOC in Isabelle/HOL compared to the 60 or so lines of proof sketches in the source material [17, Appendix III].5
4 Formalization Insights
Retrospectively viewing our formalization at a high level, we highlight a few notable points. First, our modular definitions did considerably streamline our correctness proofs. Many proofs have relatively similar structures, which helped guide the formalization at a high level. However, we also found that relatively short proofs in the source material became lengthy in the formalization, in part because they often split into many subcases. For example, the notion of a WEST regex matching a trace is intuitively simple, but the formalization used several helper functions. As another example, the proof of WEST_and_correct is approximately 15 lines of a proof sketch in the source material [17, Theorem 4]. However, our formal development took approximately 1800 LOC to state and prove this result level by level, starting from the correctness of the and operation on state regexes, then on trace regexes, and finally on WEST regexes. Although these proofs had structural similarities, subtle differences between the operators complicated the low-level details of the proofs; for instance, the option types of WEST_and_state required careful analysis in the correctness proofs.
Second, our formalization makes all details explicit, including details omitted in the source material. Many of our formal proofs are by induction; setting up the “right” inductive structure in a formal setting requires careful analysis that is often glossed over in source material. For instance, the top-level correctness theorem required making a mathematically implicit assumption on num_vars explicit. Setting up this assumption in the wrong way leads to an ineffective inductive structure. As another example, in the proof of WEST_simp_correct, we perform a tricky induction on the difference between the length of the input WEST regex and the output simplified WEST regex. Additionally, we are required to prove that all functions terminate. For many functions, Isabelle/HOL proves this automatically [28], but we occasionally ran into cases where we had to explicitly construct a measure to prove termination. For example, the WEST_reg_aux function and the WEST_simp function required such manual termination proofs. Intuitively, WEST_reg_aux recurses on all subformulas in NNF, converting subformulas to NNF as necessary; accordingly, we use a termination measure that is similar to the number of nodes in the abstract syntax tree (AST) of the formula, but weighs nodes that are not in NNF more heavily. This allows us to prove that WEST_reg_aux terminates, as this measure strictly decreases on every recursive call. Further, for WEST_simp, the length of the input list is not strictly decreasing, but the list of candidate pairs for simplification will be exhausted at some point, so we use a measure that combines the length of the input list with the length of remaining candidate pairs.
Overall, integrating WEST_simp into our formalization was rather involved. Our initial formalization did not include WEST_simp, but we ultimately realized that it is crucial for speed and thus also important for tool validation. While the modular nature of our formalization easily allowed us to add in this function to the algorithm, its correctness proofs were intricate. Similarly to WEST_and, we proved the correctness of WEST_simp level by level, totaling around 1300 LOC.
As a final interesting point, we found during our tool validation that WEST_reg and the (unverified) WEST tool sometimes produce trace regexes that differ only by a string of S’s at the end. In such cases, because these trace regexes have different length, our equivalence checking methods spuriously identify a mismatch. The WEST tool always produces trace regexes that have the same length as the computation length of the input formula, while WEST_reg does not.6 To account for this, we define a function simp_pad_WEST_reg which pads trace regexes to this computation length (and then simplifies). We extend the top-level correctness theorem from WEST_reg to simp_pad_WEST_reg; in our experiments, we work with simp_pad_WEST_reg so as to eliminate these spurious mismatches.
5 Experiments
The functions simp_pad_WEST_reg and naive_equivalence are executable in Isabelle/HOL, and we use Isabelle/HOL’s code generator [20] to export these functions to Haskell.7 We choose Haskell both to facilitate our experimental setup and because the GHC compiler [33] produces reasonably fast native machine code. We use our code export to validate two versions of the WEST tool—the initial version of WEST [17], and also a more recent version that has been highly optimized [51]. We also compare the different implementations for speed. We run all of our experiments in WSL2 on a Windows machine with an 11th generation Intel Core i7 processor and 32GB of RAM. We use an unverified parsing script to transform input MLTL formulas into the format required by our code export.8
Previous Validation Efforts. The most recent (and fastest) version of WEST was validated against several MLTL tools [51]: ➀ the original version of WEST [17], ➁ the runtime verification engine R2U2 [23, 24, 43] ➂ a direct C++ implementation of MLTL semantics [51], and ➃ translating MLTL formulas to propositional logic [21] and applying a BDD based AllSAT solver. The validation works by analyzing, for each formula in the test suite, whether the trace set of regexes produced by WEST is equivalent to the set of satisfying traces produced by other tools. The equivalence checking is a crucial step performed between outputs that can be in different formats (depending on the output format of each tool). The test suite of 1662 MLTL formulas was designed to capture every possible combination of MLTL operators [17].
5.1 Verified Equivalence Checking
Our tool validation is set up to check the outputs of our verified implementation of WEST against the two existing implementations. For this, we need to be able to check equivalences between WEST regexes. It is not always enough to merely check set equality, as implementation differences can lead to different (but logically equivalent) outputs. For instance, the two WEST regexes [[[S, S]]] and [[[S, 1]], [[1, S]], [[0, 0]]] are equivalent, but WEST_simp does not simplify the second into the first. The order in which WEST_simp simplifies pairs of trace regexes within a WEST regex is what causes these differences.
Developing a fully verified and optimized equivalence checking algorithm is out of scope of our work, but we still wanted a lightweight trustworthy implementation of regex equivalence checking. Accordingly, we formalize a naive equivalence checking function for WEST regexes, called naive_equivalence. This function works by explicitly enumerating all the trace regexes that each WEST regex produces and then checking set equality.
We then prove the experimentally relevant direction of correctness: If two WEST regexes are equivalent under our (executable) naive equivalence checking function, then they are indeed equivalent under the (non-executable) mathematical definition. Formally, we have the following lemma:
The proof was approximately 1150 lines of code. Although establishing both directions of equivalence here (i.e, \(\longleftrightarrow \) instead of \(\longrightarrow \)) is theoretically desirable, the direction we verify is the experimentally significant one, since we encounter no instances where naive_equivalence failed in our test suite. More specifically, naive_equivalence holds on all but 4 of the 1662 input formulas and times out (after 4 hours) on the remaining 4 formulas. Often the outputs are identical; for example, the Isabelle implementation and the optimized WEST tool produced identical WEST regexes on 1547 of the formulas. We additionally ran the previous (unverified) equivalence checking procedure, which succeeded on all of the formulas. Collectively, these results establish strong confidence in the correctness of the (unverified) WEST tools [17, 51].
5.2 Speed Comparison
The original C++ version of WEST [17] performed string-based operations, and the optimized version of WEST takes advantage of highly parallelized computations by using bitsets [51]. Although fast performance is not our primary goal, preliminary experiments demonstrate how our formalized code compares to the two unverified versions of WEST. Overall, we find that the optimized version of WEST is fast (as expected).
Fig. 4.
Two cactus plots, each comparing the three WEST implementations on 1000 random formulas of varying nesting depth d, interval bounds b, and number of atomic propositions n. The number of total solved instances is shown on the y-axis, and the cumulative time taken is shown on the x-axis, with the number of timeouts labeled.
Our Isabelle implementation also performs quite respectably; it is, in aggregate, close in performance to the optimized version of WEST. We perform extensive experiments to compare the performance of the three tools on large randomly generated benchmark sets. We use a script to generate random MLTL formulas [51], varying the parameters of the maximum depth and the maximum interval time bounds. Our results are in Fig. 4. As the primary focus of our work is tool validation, we do not envision our contribution as replacing the WEST tool, but its relative efficiency is encouraging nonetheless.
However, we did find that, on individual examples, our code export has somewhat unpredictable behavior (whereas the optimized version of WEST appears to be uniformly fast), and our code export seems to incur timeouts more frequently than the unverified WEST implementations. For example, in Fig. 5, we evaluate the speed of the three tools based on varying values of d, the depth of the formula, while fixing the number of atomic propositions at \(n=5\) and the maximum interval bound at \(b=2\). Here, we observe that the Isabelle implementation begins to timeout much more frequently than the other two tools when \(d=4\) and \(d=5\).
Fig. 5.
Results for \(n=5\), \(b=2\), and varying values of d from 0 to 5, with a batch size of 300 formulas per value of d. The Isabelle implementation is faster than the unoptimized WEST tool on most values of d, but times out on many formulas for \(d=5\).
Additional results, including aggregate cactus plots on easier but larger test suites, an extension of Fig. 5 on higher values of formula depth d, and experiments where we vary the value of maximum interval bound b (instead of d), can be found in Appendix A of the full version of this paper [53].
6 Conclusion
Our work produces a third, open-source, freely available implementation of the WEST algorithm, this time formally verified [52]. Given the popularity of MLTL as a formal specification language for safety-critical applications [5, 15, 16, 24, 31], verifying significant algorithms like WEST, which facilitates MLTL specification, is well-justified. We build on an existing formalization of MLTL in Isabelle/HOL [27] to further develop the library of verified MLTL algorithms and properties, which could help facilitate future verified developments in this space. Our development validates the existing (unverified) WEST tool [17, 51] on benchmarks from the literature, bringing us a step closer to validating other MLTL tools like R2U2 [23, 40]. Though our primary focus was not on speed, the aggregate performance of our Isabelle-generated code is promising, and optimizing our formalization could be interesting future work. It would be particularly beneficial to further optimize (and verify the reverse direction of) our naive WEST regex equivalence checking, possibly using existing work [29] which verifies regex equivalence checking in a general setting. Verified parsing (to transform input formulas into the syntax required by our code export) would also be welcome. Additionally, a deeper analysis of the performance of the WEST tools and of our verified code on different classes of benchmarks could inform future verified tool generation efforts. For example, it would be interesting to experimentally compare a code export to some of the other languages supported by Isabelle/HOL, like SML and OCaml, to see if a different target language could help avoid timeouts. Importantly, our formalization of MLTL rewriting, equivalence checking, and regular expression manipulation could serve as a basis for formalizing similar utilities in logics like MTL and STL that extend MLTL.
Acknowledgments
Thanks to NSF CAREER Award CNS-1552934, NSF CCRI-2016592, and GRFP-2024364991 for supporting this work. We thank the anonymous TACAS reviewers as well as Alec Rosentrater and Laura Gamboa Guzman for their helpful feedback on the paper, and the TACAS artifact evaluators for their time.
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
Note that we crucially assume that the number of variables of \(\phi \) is \(\le n\) instead of \(=n\) in order to satisfy the inductive hypothesis in our (inductive) proof.
This is because we implicitly treat shorter trace regexes to have all S’s at the end (recall our discussion of the WEST_and_trace operator in Sect. 3.1).
Note that, although Isabelle/HOL’s code generator is not yet fully verified, exporting a formalized function is more trustworthy than simply coding a function. Additionally, some work has considered verifying Isabelle’s code generator [22].
There has been some recent work [49] on improving support for verified parsing in Isabelle/HOL, so verifying this parsing step might be an interesting future direction.