Witnessing Secure Compilation

Abstract

Compiler optimizations may break or weaken the security properties of a source program. This work develops a translation validation methodology for secure compilation. A security property is expressed as an automaton operating over a bundle of program traces. A refinement proof scheme derived from a property automaton guarantees that the associated security property is preserved by a program transformation. This generalizes known refinement methods that apply only to specific security properties. In practice, the refinement relations (“security witnesses”) are generated during compilation and validated independently with a refinement checker. This process is illustrated for common optimizations. Crucially, it is not necessary to formally verify the compiler implementation, which is infeasible for production compilers.

Kedar S. Namjoshi, Lucas M. Tabajara

: Backward Context-Sensitive Flow Reconstruction of Taint Analysis Results

Abstract

Taint analysis detects if data coming from a source, such as user input, flows into a sink, such as an SQL query, unsanitized (not properly escaped). Both static and dynamic taint analyses have been widely applied to detect injection vulnerabilities in real world software. A main drawback of static analysis is that it could produce false alarms. In addition, it is extremely time-consuming to manually explain the flow of tainted data from the results of the analysis, to understand why a specific warning was raised. This paper formalizes \(\mathsf {BackFlow}\), a context-sensitive taint flow reconstructor that, starting from the results of a taint-analysis engine, reconstructs how tainted data flows inside the program and builds paths connecting sources to sinks. \(\mathsf {BackFlow}\) has been implemented on Julia’s static taint analysis. Experimental results on a set of standard benchmarks show that, when \(\mathsf {BackFlow}\) produces a taint graph for an injection warning, then there is empirical evidence that such warning is a true alarm. Moreover \(\mathsf {BackFlow}\) scales to real world programs.

Pietro Ferrara, Luca Olivieri, Fausto Spoto

Fixing Code that Explodes Under Symbolic Evaluation

Abstract

Effective symbolic evaluation is key to building scalable verification and synthesis tools based on SMT solving. These tools use symbolic evaluators to reduce the semantics of all paths through a finite program to logical constraints, discharged with an SMT solver. Using an evaluator effectively requires tool developers to be able to identify and repair performance bottlenecks in code under all-path evaluation, a difficult task, even for experts. This paper presents a new method for repairing such bottlenecks automatically. The key idea is to formulate the symbolic performance repair problem as combinatorial search through a space of semantics-preserving transformations, or repairs, to find an equivalent program with minimal cost under symbolic evaluation. The key to realizing this idea is (1) defining a small set of generic repairs that can be combined to fix common bottlenecks, and (2) searching for combinations of these repairs to find good solutions quickly and best ones eventually. Our technique, SymFix, contributes repairs based on deforestation and symbolic reflection, and an efficient algorithm that uses symbolic profiling to guide the search for fixes. To evaluate SymFix, we implement it for the Rosette solver-aided language and symbolic evaluator. Applying SymFix to 18 published verification and synthesis tools built in Rosette, we find that it automatically improves the performance of 12 tools by a factor of 1.1 \(\times \)–91.7 \(\times \), and 4 of these fixes match or outperform expert-written repairs. SymFix also finds 5 fixes that were missed by experts.

Sorawee Porncharoenwase, James Bornholt, Emina Torlak

The Correctness of a Code Generator for a Functional Language

Abstract

Code generation is gaining popularity as a technique to bridge the gap between high-level models and executable code. We describe the theory underlying the PVS2C code generator that translates functional programs written using the PVS specification language to standalone, efficiently executable C code. We outline a correctness argument for the code generator. The techniques used are quite generic and can be applied to transform programs written in functional languages into imperative code. We use a formal model of reference counting to capture memory management and safe destructive updates for a simple first-order functional language with arrays. We exhibit a bisimulation between the functional execution and the imperative execution. This bisimulation shows that the generated imperative program returns the same result as the functional program.

Nathanaël Courant, Antoine Séré, Natarajan Shankar

Leveraging Compiler Intermediate Representation for Multi- and Cross-Language Verification

Abstract

Developers nowadays regularly use numerous programming languages with different characteristics and trade-offs. Unfortunately, implementing a software verifier for a new language from scratch is a large and tedious undertaking, requiring expert knowledge in multiple domains, such as compilers, verification, and constraint solving. Hence, only a tiny fraction of the used languages has readily available software verifiers to aid in the development of correct programs. In the past decade, there has been a trend of leveraging popular compiler intermediate representations (IRs), such as LLVM IR, when implementing software verifiers. Processing IR promises out-of-the-box multi- and cross-language verification since, at least in theory, a verifier ought to be able to handle a program in any programming language (and their combination) that can be compiled into the IR. In practice though, to the best of our knowledge, nobody has explored the feasibility and ease of such integration of new languages. In this paper, we provide a procedure for adding support for a new language into an IR-based verification toolflow. Using our procedure, we extend the SMACK verifier with prototypical support for 6 additional languages. We assess the quality of our extensions through several case studies, and we describe our experience in detail to guide future efforts in this area.

Jack J. Garzella, Marek Baranowski, Shaobo He, Zvonimir Rakamarić

Putting the Squeeze on Array Programs: Loop Verification via Inductive Rank Reduction

Abstract

Automatic verification of array manipulating programs is a challenging problem because it often amounts to the inference of inductive quantified loop invariants which, in some cases, may not even be first-order expressible. In this paper, we suggest a novel verification technique that is based on induction on user-defined rank of program states as an alternative to loop-invariants. Our technique, dubbed inductive rank reduction, works in two steps. Firstly, we simplify the verification problem and prove that the program is correct when the input state contains an input array of length \(\ell _\textsf {B}\) or less, using the length of the array as the rank of the state. Secondly, we employ a squeezing function \(\curlyvee \) which converts a program state \(\sigma \) with an array of length \(\ell > \ell _\textsf {B}\) to a state

https://static-content.springer.com/image/chp%3A10.1007%2F978-3-030-39322-9_6/487236_1_En_6_IEq5_HTML.gif

containing an array of length \(\ell -1\) or less. We prove that when \(\curlyvee \) satisfies certain natural conditions then if the program violates its specification on \(\sigma \) then it does so also on

https://static-content.springer.com/image/chp%3A10.1007%2F978-3-030-39322-9_6/487236_1_En_6_IEq9_HTML.gif

. The correctness of the program on inputs with arrays of arbitrary lengths follows by induction.

We make our technique automatic for array programs whose length of execution is proportional to the length of the input arrays by (i) performing the first step using symbolic execution, (ii) verifying the conditions required of \(\curlyvee \) using Z3, and (iii) providing a heuristic procedure for synthesizing \(\curlyvee \). We implemented our technique and applied it successfully to several interesting array-manipulating programs, including a bidirectional summation program whose loop invariant cannot be expressed in first-order logic while its specification is quantifier-free.

Oren Ish-Shalom, Shachar Itzhaky, Noam Rinetzky, Sharon Shoham

A Systematic Approach to Abstract Interpretation of Program Transformations

Abstract

Abstract interpretation is a technique to define sound static analyses. While abstract interpretation is generally well-understood, the analysis of program transformations has not seen much attention. The main challenge in developing an abstract interpreter for program transformations is designing good abstractions that capture relevant information about the generated code. However, a complete abstract interpreter must handle many other aspects of the transformation language, such as backtracking and generic traversals, as well as analysis-specific concerns, such as interprocedurality and fixpoints. This deflects attention.

We propose a systematic approach to design and implement abstract interpreters for program transformations that isolates the abstraction for generated code from other analysis aspects. Using our approach, analysis developers can focus on the design of abstractions for generated code, while the rest of the analysis definition can be reused. We show that our approach is feasible and useful by developing three novel inter-procedural analyses for the Stratego transformation language: a singleton analysis for constant propagation, a sort analysis for type checking, and a locally-illsorted sort analysis that can additionally validate type changing generic traversals.

Sven Keidel, Sebastian Erdweg

Sharing Ghost Variables in a Collection of Abstract Domains

Abstract

We propose a framework in which we share ghost variables across a collection of abstract domains allowing precise proofs of complex properties.

In abstract interpretation, it is often necessary to be able to express complex properties while doing a precise analysis. A way to achieve that is to combine a collection of domains, each handling some kind of properties, using a reduced product. Separating domains allows an easier and more modular implementation, and eases soundness and termination proofs. This way, we can add a domain for any kind of property that is interesting. The reduced product, or an approximation of it, is in charge of refining abstract states, making the analysis precise.

In program verification, ghost variables can be used to ease proofs of properties by storing intermediate values that do not appear directly in the execution.

We propose a reduced product of abstract domains that allows domains to use ghost variables to ease the representation of their internal state. Domains must be totally agnostic with respect to other existing domains. In particular the handling of ghost variables must be entirely decentralized while still ensuring soundness and termination of the analysis.

Marc Chevalier, Jérôme Feret

Harnessing Static Analysis to Help Learn Pseudo-Inverses of String Manipulating Procedures for Automatic Test Generation

Abstract

We present a novel approach based on supervised machine-learning for inverting String Manipulating Procedures (SMPs), i.e., given an SMP \(p:\bar{\Sigma }\rightarrow \bar{\Sigma }\), we compute a partial pseudo-inverse function \(p^{-1}\) such that given a target string \(t\in \overline{\Sigma }\), if \(p^{-1}(t)\ne \bot \) then \(p(p^{-1}(t))=t\). The motivation for addressing this problem is the difficulties faced by modern symbolic execution tools, e.g., KLEE, to find ways to execute loops inside SMPs in a way which produces specific outputs required to enter a specific branch. Thus, we find ourselves in a pleasant situation where program analysis assists machine learning to help program analysis.

Our basic attack on the problem is to train a machine learning algorithm using (output, input) pairs generated by executing p on random inputs. Unfortunately, naively applying this technique is extremely expensive due to the size of the alphabet. To remedy this situation, we present a specialized static analysis algorithm that can drastically reduce the size of the alphabet \(\Sigma \) from which examples are drawn without sacrificing the ability to cover all the behaviors of the analyzed procedure. Our key observation is that often a procedure treats many characters in a particular uniform way: it only copies them from the input to the output in an order-preserving fashion. Our static analysis finds these good characters so that our learning algorithm may consider examples coming from a reduced alphabet containing a single representative good character, thus allowing to produce smaller models while using fewer examples than had the full alphabet been used. We then utilize the learned pseudo-inverse function to invert specific desired outputs by translating a given query to and from the reduced alphabet.

We implemented our approach using two machine learning algorithms and show that indeed our string inverters can find inputs that can drive a selection of procedures taken from real-life software to produce desired outputs, whereas KLEE, a state-of-the-art symbolic execution engine, fails to find such inputs.

Oren Ish-Shalom, Shachar Itzhaky, Roman Manevich, Noam Rinetzky

Synthesizing Environment Invariants for Modular Hardware Verification

Abstract

We automate synthesis of environment invariants for modular hardware verification in processors and application-specific accelerators, where functional equivalence is proved between a high-level specification and a low-level implementation. Invariants are generated and iteratively strengthened by reachability queries in a counterexample-guided abstraction refinement (CEGAR) loop. Within each iteration, we use a syntax-guided synthesis (SyGuS) technique for generating invariants, where we use novel grammars to capture high-level design insights and provide guidance in the search over candidate invariants. Our grammars explicitly capture the separation between control-related and data-related state variables in hardware designs to improve scalability of the enumerative search. We have implemented our SyGuS-based technique on top of an existing Constrained Horn Clause (CHC) solver and have developed a framework for hardware functional equivalence checking that can leverage other available tools and techniques for invariant generation. Our experiments show that our proposed SyGuS-based technique complements or outperforms existing property-directed reachability (PDR) techniques for invariant generation on practical hardware designs, including an AES block encryption accelerator, a Gaussian-Blur image processing accelerator and the PicoRV32 processor.

Hongce Zhang, Weikun Yang, Grigory Fedyukovich, Aarti Gupta, Sharad Malik

Systematic Classification of Attackers via Bounded Model Checking

Abstract

In this work, we study the problem of verification of systems in the presence of attackers using bounded model checking. Given a system and a set of security requirements, we present a methodology to generate and classify attackers, mapping them to the set of requirements that they can break. A naive approach suffers from the same shortcomings of any large model checking problem, i.e., memory shortage and exponential time. To cope with these shortcomings, we describe two sound heuristics based on cone-of-influence reduction and on learning, which we demonstrate empirically by applying our methodology to a set of hardware benchmark systems.

Eric Rothstein-Morris, Jun Sun, Sudipta Chattopadhyay

Cheap CTL Compassion in NuSMV

Abstract

We discuss expansions of \(\mathsf {CTL}\) with connectives able to express Streett fairness objectives for single paths. We focus on \(\mathsf {(E)SFCTL}\): (Extended) Streett-Fair \(\mathsf {CTL}\) inspired by a seminal paper of Emerson and Lei. Unlike several other fair extensions of \(\mathsf {CTL}\), our entire formalism (not just a subclass of formulas in some canonical form) allows a succinct embedding into the \(\mu \)-calculus, while being able to express concisely all relevant types of path-based fairness objectives. We implement our syntax in the well-known symbolic model checker NuSMV, consequently also implementing \(\mathsf {CTL}\) model checking with “compassion” objectives. Since the \(\mu \)-calculus embedding requires only alternation depth two, the resulting specifications correspond to parity games with two priorities. This allows a comparison of the performance of our NuSMV\(^{\mathsf {sf}}\) with existing parity game solvers (both explicit and symbolic). The advantages of the symbolic approach seem to extend to fair model checking.

Daniel Hausmann, Tadeusz Litak, Christoph Rauch, Matthias Zinner

A Cooperative Parallelization Approach for Property-Directed k-Induction

Abstract

Recently presented, IC3-inspired symbolic model checking algorithms strengthen the procedure for showing inductiveness of lemmas expressing reachability of states. These approaches show an impressive performance gain in comparison to previous state-of-the-art, but also present new challenges to portfolio-based, lemma sharing parallelization as the solvers now store lemmas that serve different purposes. In this work we formalize this recent algorithm class for lemma sharing parallel portfolios using two central engines, one for checking inductiveness and the other for checking bounded reachability, and show when the respective engines can share their information. In our implementation based on the PD-KIND algorithm, the approach provides a consistent speed-up already in a multi-core environment, and surpasses in performance the winners of a recent solver competition by a comfortable margin.

Martin Blicha, Antti E. J. Hyvärinen, Matteo Marescotti, Natasha Sharygina

Generalized Property-Directed Reachability for Hybrid Systems

Abstract

Generalized property-directed reachability (GPDR) belongs to the family of the model-checking techniques called IC3/PDR. It has been successfully applied to software verification; for example, it is the core of Spacer, a state-of-the-art Horn-clause solver bundled with Z3. However, it has yet to be applied to hybrid systems, which involve a continuous evolution of values over time. As the first step towards GPDR-based model checking for hybrid systems, this paper formalizes \(\textsc {HGPDR}\), an adaptation of GPDR to hybrid systems, and proves its soundness. We also implemented a semi-automated proof-of-concept verifier, which allows a user to provide hints to guide verification steps.

Kohei Suenaga, Takuya Ishizawa

Language Inclusion for Finite Prime Event Structures

Abstract

We study the problem of language inclusion between finite, labeled prime event structures. Prime event structures are a formalism to compactly represent concurrent behavior of discrete systems. A labeled prime event structure induces a language of sequences of labels produced by the represented system. We study the problem of deciding inclusion and membership for languages encoded by finite prime event structures and provide complexity results for both problems. We provide a family of examples where prime event structures are exponentially more succinct than formalisms that do not take concurrency into account. We provide a decision algorithm for language inclusion that exploits this succinctness. Furthermore, we provide an implementation of the algorithm and an evaluation on a series of benchmarks. Finally, we demonstrate how our results can be applied to mutation-based test case generation.

Andreas Fellner, Thorsten Tarrach, Georg Weissenbacher

Promptness and Bounded Fairness in Concurrent and Parameterized Systems

Abstract

We investigate the satisfaction of specifications in Prompt Linear Temporal Logic (\({\text {Prompt-LTL}}\)) by concurrent systems. Prompt-LTL is an extension of LTL that allows to specify parametric bounds on the satisfaction of eventualities, thus adding a quantitative aspect to the specification language. We establish a connection between bounded fairness, bounded stutter equivalence, and the satisfaction of \({\text {Prompt-LTL}} {\setminus }\mathbf{X} \) formulas. Based on this connection, we prove the first cutoff results for different classes of systems with a parametric number of components and quantitative specifications, thereby identifying previously unknown decidable fragments of the parameterized model checking problem.

Swen Jacobs, Mouhammad Sakr, Martin Zimmermann

Solving Using Approximations

Abstract

Linear arithmetic with stars, \(\mathrm {LIA} ^\star \), is an extension of Presburger arithmetic that allows forming indefinite summations over values that satisfy a formula. It has found uses in decision procedures for multi-sets and for vector addition systems. \(\mathrm {LIA} ^\star \) formulas can be translated back into Presburger arithmetic, but with non-trivial space overhead. In this paper we develop a decision procedure for \(\mathrm {LIA} ^\star \) that checks satisfiability of \(\mathrm {LIA} ^\star \) formulas. By refining on-demand under and over-approximations of \(\mathrm {LIA} ^\star \) formulas, it can avoid the space overhead that is integral to previous approaches. We have implemented our procedure in a prototype and report on encouraging results that suggest that \(\mathrm {LIA} ^\star \) formulas can be checked for satisfiability without computing a prohibitively large equivalent Presburger formula.

Maxwell Levatich, Nikolaj Bjørner, Ruzica Piskac, Sharon Shoham

Formalizing and Checking Multilevel Consistency

Abstract

Developers of distributed data-stores must trade consistency for performance and availability. Such systems may in fact implement weak consistency models, e.g., causal consistency or eventual consistency, corresponding to different costs and guarantees to the clients. We consider the case of distributed systems that offer not just one level of consistency but multiple levels of consistency to the clients. This corresponds to many practical situations. For instance, popular data-stores such as Amazon DynamoDB and Apache’s Cassandra allow applications to tag each query within the same session with a separate consistency level. In this paper, we provide a formal framework for the specification of multilevel consistency, and we address the problem of checking the conformance of a computation to such a specification. We provide a principled algorithmic approach to this problem and apply it to several instances of models with multilevel consistency.

Ahmed Bouajjani, Constantin Enea, Madhavan Mukund, Gautham Shenoy R., S. P. Suresh

Practical Abstractions for Automated Verification of Shared-Memory Concurrency

Abstract

Modern concurrent and distributed software is highly complex. Techniques to reason about the correct behaviour of such software are essential to ensure its reliability. To be able to reason about realistic programs, these techniques must be modular and compositional as well as practical by being supported by automated tools. However, many existing approaches for concurrency verification are theoretical and focus on expressivity and generality. This paper contributes a technique for verifying behavioural properties of concurrent and distributed programs that makes a trade-off between expressivity and usability. The key idea of the approach is that program behaviour is abstractly modelled using process algebra, and analysed separately. The main difficulty is presented by the typical abstraction gap between program implementations and their models. Our approach bridges this gap by providing a deductive technique for formally linking programs with their process-algebraic models. Our verification technique is modular and compositional, is proven sound with Coq, and has been implemented in the automated concurrency verifier VerCors. Moreover, our technique is demonstrated on multiple case studies, including the verification of a leader election protocol.

Wytse Oortwijn, Dilian Gurov, Marieke Huisman

How to Win First-Order Safety Games

Abstract

First-order (FO) transition systems have recently attracted attention for the verification of parametric systems such as network protocols, software-defined networks or multi-agent workflows like conference management systems. Functional correctness or noninterference of these systems have conveniently been formulated as safety or hypersafety properties, respectively. In this article, we take the step from verification to synthesis—tackling the question whether it is possible to automatically synthesize predicates to enforce safety or hypersafety properties like noninterference. For that, we generalize FO transition systems to FO safety games. For FO games with monadic predicates only, we provide a complete classification into decidable and undecidable cases. For games with non-monadic predicates, we concentrate on universal first-order invariants, since these are sufficient to express a large class of properties—for example noninterference. We identify a non-trivial sub-class where invariants can be proven inductive and FO winning strategies be effectively constructed. We also show how the extraction of weakest FO winning strategies can be reduced to SO quantifier elimination itself. We demonstrate the usefulness of our approach by automatically synthesizing nontrivial FO specifications of messages in a leader election protocol as well as for paper assignment in a conference management system to exclude unappreciated disclosure of reports.

Helmut Seidl, Christian Müller, Bernd Finkbeiner

Improving Parity Game Solvers with Justifications

Abstract

Parity games are infinite two-player games played on node-weighted directed graphs. Formal verification problems such as verifying and synthesizing automata, bounded model checking of LTL, CTL*, propositional \(\mu \)-calculus, ... reduce to problems over parity games. The core problem of parity game solving is deciding the winner of some (or all) nodes in a parity game. In this paper, we improve several parity game solvers by using a justification graph. Experimental evaluation shows our algorithms improve upon the state-of-the-art.

Ruben Lapauw, Maurice Bruynooghe, Marc Denecker

Springer Professional

Verification, Model Checking, and Abstract Interpretation

21st International Conference, VMCAI 2020, New Orleans, LA, USA, January 16–21, 2020, Proceedings

Über dieses Buch

Inhaltsverzeichnis

Frontmatter

Witnessing Secure Compilation

: Backward Context-Sensitive Flow Reconstruction of Taint Analysis Results

Fixing Code that Explodes Under Symbolic Evaluation

The Correctness of a Code Generator for a Functional Language

Leveraging Compiler Intermediate Representation for Multi- and Cross-Language Verification

Putting the Squeeze on Array Programs: Loop Verification via Inductive Rank Reduction

A Systematic Approach to Abstract Interpretation of Program Transformations

Sharing Ghost Variables in a Collection of Abstract Domains

Harnessing Static Analysis to Help Learn Pseudo-Inverses of String Manipulating Procedures for Automatic Test Generation

Synthesizing Environment Invariants for Modular Hardware Verification

Systematic Classification of Attackers via Bounded Model Checking

Cheap CTL Compassion in NuSMV

A Cooperative Parallelization Approach for Property-Directed k-Induction

Generalized Property-Directed Reachability for Hybrid Systems

Language Inclusion for Finite Prime Event Structures

Promptness and Bounded Fairness in Concurrent and Parameterized Systems

Solving Using Approximations

Formalizing and Checking Multilevel Consistency

Practical Abstractions for Automated Verification of Shared-Memory Concurrency

How to Win First-Order Safety Games

Improving Parity Game Solvers with Justifications

Backmatter