2019 | Book

Static Analysis

26th International Symposium, SAS 2019, Porto, Portugal, October 8–11, 2019, Proceedings

About this book

This book constitutes the refereed proceedings of the 26th International Symposium on Static Analysis, SAS 2019, held in Porto, Portugal, in October 2019.
The 20 regular papers presented in this book were carefully reviewed and selected from 50 submissions.
The papers are grouped in topical sections on pointers and dataflow; languages and decidability; numerical; trends: assuring machine learning; synthesis and security; and temporal properties and termination.

Table of Contents

Frontmatter

Invited Contributions

Frontmatter
Rethinking Static Analysis by Combining Discrete and Continuous Reasoning
Abstract
Static analyses predominantly use discrete modes of logical reasoning to derive facts about programs. Despite significant strides, this form of reasoning faces new challenges in modern software applications and practices. These challenges concern not only traditional analysis objectives such as scalability, accuracy, and soundness, but also emerging ones such as tailoring analysis conclusions based on the relevance or severity of particular code changes, and the needs of individual programmers.
We advocate seamlessly extending static analyses to leverage continuous modes of logical reasoning in order to address these challenges. Central to our approach is expressing the specification of the static analysis in a constraint language that is amenable to computing provenance information. We use the logic programming language Datalog as proof-of-concept for this purpose. We illustrate the benefits of exploiting provenance even in the discrete setting. Moreover, by associating weights with constraints, we show how to amplify these benefits in the continuous setting.
We also present open problems in aspects of analysis usability, language expressiveness, and solver techniques. The overall process constitutes a fundamental rethinking of how to design, implement, deploy, and adapt static analyses.
Mayur Naik
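As a rough illustration of the Datalog-based setup described above, the following toy Python fixpoint evaluator (my own sketch, not the paper's system) derives reachability facts from two rules and records, for each derived fact, the rule and premises that produced it; the relations, rule names, and edges are invented for illustration.

    # Toy rules over a relation edge/2:
    #   reach(x, y) :- edge(x, y).                   (rule "base")
    #   reach(x, z) :- reach(x, y), edge(y, z).      (rule "step")
    edges = {("a", "b"), ("b", "c"), ("c", "d")}

    # fact -> provenance: (rule that fired, premises it used)
    facts = {("reach", x, y): ("base", (("edge", x, y),)) for (x, y) in edges}

    changed = True
    while changed:                                   # naive bottom-up fixpoint
        changed = False
        for (_, x, y) in list(facts):
            for (y2, z) in edges:
                if y == y2 and ("reach", x, z) not in facts:
                    facts[("reach", x, z)] = ("step", (("reach", x, y), ("edge", y, z)))
                    changed = True

    # which rule and premises derived reach(a, d)
    print(facts[("reach", "a", "d")])

Attaching a weight to each rule and optimizing over weighted derivations, rather than recording plain set membership, is the kind of step towards the continuous setting that the abstract refers to.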
Static Analysis of Data Science Software
Abstract
Data science software is playing an increasingly important role in every aspect of our daily lives and is even slowly creeping into mission critical scenarios, despite being often opaque and unpredictable. In this paper, we will discuss some key challenges and a number of research questions that we are currently addressing in developing static analysis methods and tools for data science software.
Caterina Urban

Pointers and Dataflow

Frontmatter
Fast and Precise Handling of Positive Weight Cycles for Field-Sensitive Pointer Analysis
Abstract
By distinguishing the fields of an object, Andersen's field-sensitive pointer analysis yields better precision than its field-insensitive counterpart. A typical field-sensitive solution to inclusion-based pointer analysis for C/C++ is to add positive weights to the edges in Andersen's constraint graph to model field access. However, the precise modeling comes at the cost of introducing a new type of constraint cycles, called positive weight cycles (PWCs). A PWC, which contains at least one positive weight constraint, can cause infinite and redundant field derivations of an object unless the number of its fields is bounded by a pre-defined value. PWCs significantly affect analysis performance when analyzing large C/C++ programs with heavy use of structs and classes.

This paper presents Dea, a fast and precise approach to handling PWCs that significantly accelerates existing field-sensitive pointer analyses by using a new field collapsing technique that captures the derivation equivalence of fields derived from the same object when resolving a PWC.

Two fields are derivation equivalent in a PWC if they are always pointed to by the same variables (nodes) in this PWC. A stride-based field representation is proposed to identify and collapse derivation equivalent fields into one, avoiding redundant field derivations with significantly fewer field objects during points-to propagation. We have conducted experiments using 11 open-source C/C++ programs. The evaluation shows that Dea is on average 7.1X faster than Pearce et al.'s field-sensitive analysis (Pkh), obtaining a best speedup of 11.0X while maintaining the same precision.
Yuxiang Lei, Yulei Sui
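For concreteness, here is a small Python sketch (mine, not the Dea implementation) of the notion at the core of the abstract: in a field-sensitive constraint graph, copy edges have weight 0, a field access at offset k adds an edge of weight k, and a PWC is a strongly connected component containing at least one positive-weight edge. The two-variable example and the offset 4 are invented.

    def positive_weight_cycles(nodes, edges):
        """edges: (src, dst, weight) triples.  Returns the strongly connected
        components that contain a positive-weight edge, i.e. the PWCs."""
        adj, radj = {}, {}
        for s, d, _ in edges:
            adj.setdefault(s, []).append(d)
            radj.setdefault(d, []).append(s)
        seen, order = set(), []
        def dfs(graph, v, out):
            seen.add(v)
            for w in graph.get(v, []):
                if w not in seen:
                    dfs(graph, w, out)
            out.append(v)
        for v in nodes:                      # Kosaraju pass 1: finish order
            if v not in seen:
                dfs(adj, v, order)
        seen.clear()
        comp = {}
        for v in reversed(order):            # Kosaraju pass 2: reverse graph
            if v not in seen:
                members = []
                dfs(radj, v, members)
                for m in members:
                    comp[m] = frozenset(members)
        return [c for c in set(comp.values())
                if any(w > 0 and comp[s] == c and comp[d] == c for s, d, w in edges)]

    # p = q (copy edge, weight 0) and q = &(p->f) at offset 4 (weight 4)
    # close a cycle with a positive-weight edge: a PWC over {p, q}.
    print(positive_weight_cycles(["p", "q"], [("q", "p", 0), ("p", "q", 4)]))

Dea's stride-based field collapsing then keeps a single representative for the derivation-equivalent fields produced inside such a cycle; the sketch only shows the detection step.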
Per-Dereference Verification of Temporal Heap Safety via Adaptive Context-Sensitive Analysis
Abstract
We address the problem of verifying the temporal safety of heap memory at each pointer dereference. Our whole-program analysis approach is undertaken from the perspective of pointer analysis, allowing us to leverage the advantages of and advances in pointer analysis to improve precision and scalability. A dereference \(\omega \), say, via pointer q is unsafe iff there exists a deallocation \(\psi \), say, via pointer p such that on a control-flow path \(\rho \), p aliases with q (with both pointing to an object o representing an allocation), denoted \(alias_{\rho }(p,q,o)\), and \(\psi \) reaches \(\omega \) on \(\rho \) via control flow, denoted \(reach_{\rho }(\psi ,\omega )\). Applying directly any existing pointer analysis, which is typically solved separately with an associated control-flow reachability analysis, will render such verification highly imprecise, since \(\bigl (\exists \rho .\, alias_{\rho }(p,q,o)\bigr ) \wedge \bigl (\exists \rho .\, reach_{\rho }(\psi ,\omega )\bigr )\) does not imply \(\exists \rho .\, alias_{\rho }(p,q,o) \wedge reach_{\rho }(\psi ,\omega )\) (i.e., \(\exists \) does not distribute over \(\wedge \)). For precision, we solve \(\exists \rho .\, alias_{\rho }(p,q,o) \wedge reach_{\rho }(\psi ,\omega )\), with a control-flow path \(\rho \) containing an allocation o, a deallocation \(\psi \) and a dereference \(\omega \) abstracted by a tuple of three contexts \((c_{o}, c_{\psi }, c_{\omega })\). For scalability, a demand-driven full context-sensitive (modulo recursion) pointer analysis, which operates on pre-computed def-use chains with adaptive context-sensitivity, is used to infer the existence of such a context tuple, without losing soundness or precision. Our evaluation shows that our approach can successfully verify the safety of 81.3% (or \(\frac{93,141}{114,508}\)) of all the dereferences in a set of ten C programs totalling 1,166 KLOC.
Hua Yan, Shiping Chen, Yulei Sui, Yueqian Zhang, Changwei Zou, Jingling Xue
A Change-Based Heuristic for Static Analysis with Policy Iteration
Abstract
We improve the policy iteration-based algorithm for value set analysis by giving a new heuristic for policy selection based on a local static analysis. In particular, we detect loops in the program and perform an analysis to discover the relative changes of variables in the loop, that is, whether a variable is constant or whether its value rises, falls or both. We use these relative changes to improve the old heuristic, achieving better (that is, smaller) fixed points than the original approach.
Marcus Völker, Stefan Kowalewski
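The "relative change" information can be pictured with a small sketch (my own simplification, not the paper's analysis): collect the signs of the syntactic increments x := x + c applied to each variable in a loop body and classify the variable accordingly. In the paper these labels then guide policy selection; the sketch shows only the classification step.

    def relative_changes(updates):
        """updates: (variable, constant increment) pairs found in the loop body."""
        signs = {}
        for var, c in updates:
            signs.setdefault(var, set()).add(0 if c == 0 else (1 if c > 0 else -1))
        def classify(s):
            if s <= {0}:
                return "constant"
            if s <= {0, 1}:
                return "rising"
            if s <= {0, -1}:
                return "falling"
            return "both"
        return {v: classify(s) for v, s in signs.items()}

    print(relative_changes([("i", 1), ("n", 0), ("x", 1), ("x", -2)]))
    # {'i': 'rising', 'n': 'constant', 'x': 'both'}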
Syntactic and Semantic Soundness of Structural Dataflow Analysis
Abstract
We show that the classical approach to the soundness of dataflow analysis is with respect to a syntactic path abstraction that may be problematic with respect to a semantics trace-based specification. The fix is a rigorous abstract interpretation based approach to formally construct dataflow analysis algorithms by calculational design.
Patrick Cousot

Languages and Decidability

Frontmatter
Abstract Interpretation of Indexed Grammars
Abstract
Indexed grammars are a generalization of context-free grammars and recognize a proper subset of context-sensitive languages. The languages recognized by indexed grammars are called indexed languages and correspond to the languages recognized by nested stack automata. For example, indexed grammars can recognize the language \(\{a^n b^n c^n \mid n \ge 1\}\), which is not context-free, but they cannot recognize \(\{(a b^n)^n \mid n \ge 1\}\), which is context-sensitive. Indexed grammars identify a set of languages that are more expressive than context-free languages, while having decidability results that lie in between the ones of context-free and context-sensitive languages. In this work we study indexed grammars in order to formalize the relation between indexed languages and the other classes of languages in the Chomsky hierarchy. To this end, we provide a fixpoint characterization of the languages recognized by an indexed grammar and we study possible ways to abstract, in the abstract interpretation sense, these languages and their grammars into context-free and regular languages.
Marco Campion, Mila Dalla Preda, Roberto Giacobazzi
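For readers unfamiliar with the formalism, here is a textbook-style indexed grammar for \(\{a^n b^n c^n \mid n \ge 1\}\) (a standard construction, not taken from the paper): nonterminals carry a stack of indices, T pushes one index f per counted symbol, the stack is copied to A, B and C, and the pop rules unwind it.

    \(S \rightarrow T\,g\)            \(T \rightarrow T\,f \;\mid\; A\,B\,C\)
    \(A\,f \rightarrow a\,A\)         \(A\,g \rightarrow a\)
    \(B\,f \rightarrow b\,B\)         \(B\,g \rightarrow b\)
    \(C\,f \rightarrow c\,C\)         \(C\,g \rightarrow c\)

A derivation pushes \(n-1\) copies of f, duplicates the stack, and unwinds it three times: \(S \Rightarrow T[g] \Rightarrow ^{*} T[f^{n-1}g] \Rightarrow A[f^{n-1}g]\,B[f^{n-1}g]\,C[f^{n-1}g] \Rightarrow ^{*} a^n b^n c^n\).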
Language Inclusion Algorithms as Complete Abstract Interpretations
Abstract
We study the language inclusion problem \(L_1 \subseteq L_2\) where \(L_1\) is regular. Our approach relies on abstract interpretation and checks whether an overapproximating abstraction of \(L_1\), obtained by successively overapproximating the Kleene iterates of its least fixpoint characterization, is included in \(L_2\). We show that a language inclusion problem is decidable whenever this overapproximating abstraction satisfies a completeness condition (i.e. its loss of precision causes no false alarm) and prevents infinite ascending chains (i.e. it guarantees termination of least fixpoint computations). Such an overapproximating abstraction function on languages can be defined using quasiorder relations on words, where the abstraction gives the language of all words “greater than or equal to” a given input word for that quasiorder. We put forward a range of quasiorders that allow us to systematically design decision procedures for different language inclusion problems, such as the inclusion of regular languages into regular languages or into trace sets of one-counter nets. In the case of inclusion between regular languages, some of the induced inclusion checking procedures correspond to well-known state-of-the-art algorithms like the so-called antichain algorithms. Finally, we provide an equivalent greatest fixpoint language inclusion check which relies on quotients of languages and, to the best of our knowledge, was not previously known.
Pierre Ganty, Francesco Ranzato, Pedro Valero
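To make the antichain connection concrete, here is a small Python sketch (my own, following the well-known antichain algorithm rather than the paper's quasiorder framework) that checks \(L(A) \subseteq L(B)\) for NFAs by exploring pairs of an A-state and a set of B-states and keeping only subset-minimal B-sets; the example automata are invented.

    from collections import deque

    def post(delta, states, sym):
        """Successor set of NFA states on one symbol."""
        out = set()
        for q in states:
            out |= delta.get((q, sym), set())
        return out

    def included(A, B, alphabet):
        """Antichain-style check of L(A) <= L(B).  An NFA is a triple
        (initial states, final states, delta), delta mapping (state, symbol)
        to a set of states."""
        (ia, fa, da), (ib, fb, db) = A, B
        antichain = {}        # A-state -> subset-minimal frozensets of B-states
        work = deque()

        def add(a, S):
            S = frozenset(S)
            mins = antichain.setdefault(a, [])
            if any(T <= S for T in mins):      # subsumed by a smaller pair
                return
            antichain[a] = [T for T in mins if not S <= T] + [S]
            work.append((a, S))

        for a0 in ia:
            add(a0, ib)
        while work:
            a, S = work.popleft()
            if S not in antichain.get(a, []):  # pruned after being queued
                continue
            if a in fa and not (S & fb):
                return False                   # a word of L(A) escapes L(B)
            for sym in alphabet:
                for a2 in da.get((a, sym), set()):
                    add(a2, post(db, S, sym))
        return True

    # L(A) = {a}, L(B) = {a, aa}: inclusion holds in one direction only.
    A = ({0}, {1}, {(0, "a"): {1}})
    B = ({0}, {1, 2}, {(0, "a"): {1}, (1, "a"): {2}})
    print(included(A, B, ["a"]), included(B, A, ["a"]))   # True False

In the paper's terms, the subset ordering plays the role of one particular quasiorder-induced abstraction; other quasiorders yield the other inclusion checks mentioned in the abstract.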
On the Monniaux Problem in Abstract Interpretation
Abstract
The Monniaux Problem in abstract interpretation asks, roughly speaking, whether the following question is decidable: given a program P, a safety (e.g., non-reachability) specification \(\varphi \), and an abstract domain of invariants \(\mathcal {D}\), does there exist an inductive invariant \(\mathcal {I}\) in \(\mathcal {D}\) guaranteeing that program P meets its specification \(\varphi \)? The Monniaux Problem is of course parameterised by the classes of programs and invariant domains that one considers.
In this paper, we show that the Monniaux Problem is undecidable for unguarded affine programs and semilinear invariants (unions of polyhedra). Moreover, we show that decidability is recovered in the important special case of simple linear loops.
Nathanaël Fijalkow, Engel Lefaucheux, Pierre Ohlmann, Joël Ouaknine, Amaury Pouly, James Worrell

Numerical

Frontmatter
Revisiting Polyhedral Analysis for Hybrid Systems
Abstract
Thanks to significant progress in the adopted implementation techniques, recent years have witnessed a renewed interest in the development of analysis tools based on the domain of convex polyhedra. In this paper we revisit the application of this abstract domain to the case of reachability analysis for hybrid systems, focusing on the lessons learned during the development of the tool PHAVerLite. In particular, we motivate the implementation of specialized versions of several well-known abstract operators, as well as the adoption of a heuristic technique (boxed polyhedra) for the handling of finite collections of polyhedra, showing their impact on the efficiency of the analysis tool.
Anna Becchi, Enea Zaffanella
An Efficient Parametric Linear Programming Solver and Application to Polyhedral Projection
Abstract
Polyhedral projection is a main operation of the polyhedron abstract domain. It can be computed via parametric linear programming (PLP), which is more efficient than the classic Fourier-Motzkin elimination method.
In prior work, PLP was done in arbitrary precision rational arithmetic. In this paper, we present an approach where most of the computation is performed in floating-point arithmetic, then exact rational results are reconstructed.
We also propose a workaround for a difficulty that plagued previous attempts at using PLP for computations on polyhedra: in general the linear programming problems are degenerate, resulting in redundant computations and geometric descriptions.
Hang Yu, David Monniaux
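One simple way to picture the "reconstruct exact rational results" step is continued-fraction rounding, as in the Python sketch below; this is a generic illustration, not the paper's procedure (which also verifies the reconstructed result exactly), and the denominator bound is my own assumption.

    from fractions import Fraction

    def reconstruct(x, max_den=10**6):
        """Map a floating-point value to a nearby small-denominator rational."""
        return Fraction(x).limit_denominator(max_den)

    # a coefficient computed in floating point as 0.3333333333333333
    print(reconstruct(1 / 3))   # Fraction(1, 3)
    print(reconstruct(0.2))     # Fraction(1, 5)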
Analysis of Software Patches Using Numerical Abstract Interpretation
Abstract
We present a static analysis for software patches. Given two syntactically close versions of a program, our analysis can infer a semantic difference, and prove that both programs compute the same outputs when run on the same inputs. Our method is based on abstract interpretation, and is parametric in the choice of an abstract domain. We focus on numeric properties only. Our method is able to deal with unbounded executions of infinite-state programs, reading from infinite input streams. Yet, it is limited to comparing terminating executions, ignoring non-terminating ones.
We first present a novel concrete collecting semantics, expressing the behaviors of both programs at the same time. Then, we propose an abstraction of infinite input streams able to prove that programs that read from the same stream compute equal output values. We then show how to leverage classic numeric abstract domains, such as polyhedra or octagons, to build an effective static analysis. We also introduce a novel numeric domain to bound differences between the values of the variables in the two programs, which has linear cost, and the right amount of relationality to express useful properties of software patches.
We implemented a prototype and experimented on a few small examples from the literature. Our prototype operates on a toy language, and assumes that a joint syntactic representation of the two program versions is given, which distinguishes between their common and distinctive parts.
David Delmas, Antoine Miné
Verifying Numerical Programs via Iterative Abstract Testing
Abstract
When applying abstract interpretation to verification, it may suffer from over-approximations that are too conservative to verify a given target property, while hardly being able to generate counter-examples when the property does not hold. In this paper, we propose iterative abstract testing, to create a property-oriented verification approach based on abstract interpretation. Abstract testing employs forward abstract executions (i.e., forward analysis) together with property checking to mimic (regular) testing, and utilizes backward abstract executions (i.e., backward analysis) to derive necessary preconditions that may falsify the target property and be useful for reducing the input space that needs further exploration. To verify a property, we conduct abstract testing in an iterative manner by utilizing dynamic partitioning to split the input space into sub-spaces such that each sub-space involves fewer program behaviors and may be easier to verify. Moreover, we leverage bounded exhaustive testing to verify bounded small sub-spaces, as a means to complement abstract-testing-based verification. The experimental results show that our approach is comparable in strength to several state-of-the-art verification tools.
Banghu Yin, Liqian Chen, Jiangchao Liu, Ji Wang, Patrick Cousot
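The interplay between forward abstract execution, dynamic partitioning and bounded exhaustive testing can be pictured with a deliberately tiny Python sketch over integer intervals (my own simplification, not the authors' tool; the property, program and threshold are invented).

    def verify(f, f_abs, prop, prop_on_box, lo, hi, small=16):
        """Try to prove prop(f(x)) for every integer x in [lo, hi]."""
        out_lo, out_hi = f_abs(lo, hi)           # forward abstract execution
        if prop_on_box(out_lo, out_hi):          # property proved on the whole box
            return True
        if hi - lo + 1 <= small:                 # bounded exhaustive testing
            return all(prop(f(x)) for x in range(lo, hi + 1))
        mid = (lo + hi) // 2                     # dynamic partitioning
        return (verify(f, f_abs, prop, prop_on_box, lo, mid, small) and
                verify(f, f_abs, prop, prop_on_box, mid + 1, hi, small))

    # f(x) = x*x - 2*x is always >= -1, but plain interval arithmetic cannot
    # show it near x = 1; splitting plus exhaustive testing closes the gap.
    def f(x): return x * x - 2 * x
    def f_abs(lo, hi):
        sq = [lo * lo, lo * hi, hi * hi]         # sound (if coarse) interval square
        return min(sq) - 2 * hi, max(sq) - 2 * lo
    print(verify(f, f_abs, lambda y: y >= -1, lambda l, h: l >= -1, -100, 100))  # True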

Trends: Assuring Machine Learning

Frontmatter
Robustness Verification of Support Vector Machines
Abstract
We study the problem of formally verifying the robustness to adversarial examples of support vector machines (SVMs), a major machine learning model for classification and regression tasks. Following a recent stream of works on formal robustness verification of (deep) neural networks, our approach relies on a sound abstract version of a given SVM classifier to be used for checking its robustness. This methodology is parametric on a given numerical abstraction of real values and, analogously to the case of neural networks, needs neither abstract least upper bounds nor widening operators on this abstraction. The standard interval domain provides a simple instantiation of our abstraction technique, which is enhanced with the domain of reduced affine forms, an efficient abstraction of the zonotope abstract domain. This robustness verification technique has been fully implemented and experimentally evaluated on SVMs based on linear and nonlinear (polynomial and radial basis function) kernels, which have been trained on the popular MNIST dataset of images and on the recent and more challenging Fashion-MNIST dataset. The experimental results of our prototype SVM robustness verifier appear to be encouraging: this automated verification is fast, scalable, and shows high percentages of provable robustness on the MNIST test set, in particular when compared to the analogous provable robustness of neural networks.
Francesco Ranzato, Marco Zanella
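As a minimal instance of the interval-domain idea for the linear-kernel case (my own sketch, not the authors' verifier; the weights and sample are invented), the following soundly checks that a linear SVM keeps its label on an entire \(L_{\infty }\) ball around a sample.

    def robust_linear_svm(w, b, x, eps):
        """Check that sign(w.x + b) is constant on the L-infinity ball of
        radius eps around x, by interval evaluation of the score."""
        lo = b + sum(wi * ((xi - eps) if wi >= 0 else (xi + eps)) for wi, xi in zip(w, x))
        hi = b + sum(wi * ((xi + eps) if wi >= 0 else (xi - eps)) for wi, xi in zip(w, x))
        return lo > 0 or hi < 0      # the score cannot change sign on the ball

    print(robust_linear_svm([0.8, -1.2], 0.1, [1.0, -0.5], eps=0.1))   # True: robust
    print(robust_linear_svm([0.8, -1.2], 0.1, [1.0, -0.5], eps=2.0))   # False: unproven

Nonlinear kernels need the tighter relational abstractions (reduced affine forms) mentioned in the abstract; the interval version above is only the simplest instantiation.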
Analyzing Deep Neural Networks with Symbolic Propagation: Towards Higher Precision and Faster Verification
Abstract
Deep neural networks (DNNs) have been shown to lack robustness: they are vulnerable to small perturbations of their inputs, which has led to safety concerns about applying DNNs in safety-critical domains. Several verification approaches have been developed to automatically prove or disprove safety properties for DNNs. However, these approaches suffer from either the scalability problem, i.e., only small DNNs can be handled, or the precision problem, i.e., the obtained bounds are loose. This paper improves on a recent proposal of analyzing DNNs through the classic abstract interpretation technique, by a novel symbolic propagation technique. More specifically, the activation values of neurons are represented symbolically and propagated forward from the input layer to the output layer, on top of abstract domains. We show that our approach can achieve significantly higher precision and thus can prove more properties than using only abstract domains. Moreover, we show that the bounds derived from our approach on the hidden neurons, when applied to a state-of-the-art SMT-based verification tool, can improve its performance. We implement our approach in a software tool and validate it on several DNNs trained on benchmark datasets such as MNIST.
Jianlin Li, Jiangchao Liu, Pengfei Yang, Liqian Chen, Xiaowei Huang, Lijun Zhang
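A toy example of why symbolic propagation tightens plain interval propagation (my own illustration, not the paper's tool): take an input x in [-1, 1], two hidden neurons that both compute relu(x + 1), and an output equal to their difference, which is exactly 0.

    def interval_out():
        h_lo, h_hi = 0.0, 2.0                  # interval of relu(x + 1) for x in [-1, 1]
        return h_lo - h_hi, h_hi - h_lo        # subtract the two intervals independently

    def symbolic_out():
        h1 = h2 = (1.0, 1.0)                   # affine form 1*x + 1 (relu is the identity here)
        c, k = h1[0] - h2[0], h1[1] - h2[1]    # subtract the symbolic forms instead
        return min(-c, c) + k, max(-c, c) + k  # concretize on x in [-1, 1]

    print(interval_out())    # (-2.0, 2.0): sound but loose
    print(symbolic_out())    # (0.0, 0.0):  exact, the shared dependency on x is kept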

Synthesis and Security

Frontmatter
Sorcar: Property-Driven Algorithms for Learning Conjunctive Invariants
Abstract
We present a new learning algorithm, Sorcar, to synthesize conjunctive inductive invariants for proving that a program satisfies its assertions. The salient property of this algorithm is that it is property-driven and, for a fixed finite set of n predicates, guarantees convergence in 2n rounds, taking only polynomial time in each round. We implement and evaluate the algorithm and show that it compares favorably to the existing Houdini algorithm (which is not property-driven) on a class of benchmarks that prove data race freedom of GPU programs and another class that synthesizes invariants for proving separation logic properties of heap-manipulating programs.
Daniel Neider, Shambwaditya Saha, Pranav Garg, P. Madhusudan
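For orientation, the (non-property-driven) Houdini baseline mentioned above fits in a few lines of Python; the two oracles stand in for SMT queries and are assumptions of this sketch, and Sorcar's property-driven selection of predicates is not shown.

    def houdini(predicates, holds_initially, is_preserved):
        """Largest subset of `predicates` whose conjunction is inductive:
        keep what holds initially, then drop any conjunct that some program
        step fails to preserve, until nothing changes."""
        inv = {p for p in predicates if holds_initially(p)}
        changed = True
        while changed:
            changed = False
            for p in list(inv):
                if not is_preserved(inv, p):   # Hoare triple {inv} step {p} fails
                    inv.discard(p)
                    changed = True
        return inv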
Direct Manipulation for Imperative Programs
Abstract
Direct manipulation is a programming paradigm in which the programmer conveys the intended program behavior by modifying program values at runtime. The programming environment then finds a modification of the original program that yields the manipulated values. In this paper, we propose the first framework for direct manipulation of imperative programs. First, we introduce direct state manipulation, which allows programmers to visualize the trace of a buggy program on an input, and modify variable values at a location. Second, we propose a synthesis technique based on program sketching and quantitative objectives to efficiently find the “closest” program to the original one that is consistent with the manipulated values. We formalize the problem and build a tool JDial based on the Sketch synthesizer. We investigate the effectiveness of direct manipulation by using JDial to fix benchmarks from introductory programming assignments. In our evaluation, we observe that direct state manipulations are an effective specification mechanism: even when provided with a single state manipulation, JDial can produce desired program modifications for 66% of our benchmarks while techniques based only on test cases always fail.
Qinheping Hu, Roopsha Samanta, Rishabh Singh, Loris D’Antoni
Responsibility Analysis by Abstract Interpretation
Abstract
Given a behavior of interest in the program, statically determining the corresponding responsible entity is a task of critical importance, especially in program security. Classical static analysis techniques (e.g. dependency analysis, taint analysis, slicing, etc.) assist programmers in narrowing down the scope of responsibility, but none of them can explicitly identify the responsible entity. Meanwhile, causality analysis is generally not pertinent for analyzing programs, and the structural equations model (SEM) of actual causality misses some information inherent in programs, making its analysis on programs imprecise. In this paper, a novel definition of responsibility based on the abstraction of event trace semantics is proposed, which can be applied in program security and other scientific fields. Briefly speaking, an entity \(E_{R}\) is responsible for behavior \(\mathcal {B}\) if and only if \(E_{R}\) is free to choose its input value, and such a choice is the first one that ensures the occurrence of \(\mathcal {B}\) in the forthcoming execution. Compared to current analysis methods, responsibility analysis is more precise. In addition, our definition of responsibility takes into account the cognizance of the observer, which, to the best of our knowledge, is a novel idea in program analysis.
Chaoqiang Deng, Patrick Cousot
Abstract Semantic Dependency
Abstract
Dependency is a prevalent notion in computer science. There have been numerous informal or formal attempts to define viable syntactic and semantic concepts of dependency in programming languages with subtle variations and limitations. We develop a new value dependency analysis defined by abstract interpretation of a trace semantics. A sound approximate dependency algorithm is formally derived by calculational design. Further abstractions provide information flow, slicing, non-interference, dye, and taint analyses.
Patrick Cousot

Temporal Properties and Termination

Frontmatter
Temporal Verification of Programs via First-Order Fixpoint Logic
Abstract
This paper presents a novel program verification method based on Mu-Arithmetic, a first-order logic with integer arithmetic and predicate-level least/greatest fixpoints. We first show that linear-time temporal property verification of first-order recursive programs can be reduced to the validity checking of a Mu-Arithmetic formula. We also propose a method for checking the validity of Mu-Arithmetic formulas. The method generalizes a reduction from termination verification to safety property verification and reduces the validity of a Mu-Arithmetic formula to the satisfiability of constrained Horn clauses (CHCs), which can then be solved using off-the-shelf CHC solvers. We have implemented an automated prover for Mu-Arithmetic based on the proposed method. By combining the automated prover with a known reduction and with the reduction from first-order recursive programs above, we obtain: (i) for while-programs, an automated verification method for arbitrary properties expressible in the modal \(\mu \)-calculus, and (ii) for first-order recursive programs, an automated verification method for arbitrary linear-time properties expressible using Büchi automata. We have applied our Mu-Arithmetic prover to formulas obtained from various verification problems and obtained promising experimental results.
Naoki Kobayashi, Takeshi Nishikawa, Atsushi Igarashi, Hiroshi Unno
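An illustrative least-fixpoint formula, written in generic first-order fixpoint notation rather than the paper's concrete syntax: termination of the loop while (x > 0) x := x - 1 from every integer state amounts to the validity of

    \(\forall x.\ \bigl (\mu X(x).\ x \le 0 \,\vee \, X(x-1)\bigr )(x)\)

where the least fixpoint X collects exactly the states from which the loop terminates; greatest fixpoints play the dual role for safety-style linear-time properties.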
A Temporal Logic for Higher-Order Functional Programs
Abstract
We propose an extension of linear temporal logic that we call Linear Temporal Logic of Calls (LTLC) for describing temporal properties of higher-order functions, such as “the function calls its first argument before any call of the second argument.” A distinguishing feature of LTLC is a new modal operator, the call modality, that checks if the function specified by the operator is called in the current step and, if so, describes how the arguments are used in the subsequent computation. We demonstrate expressiveness of the logic, by giving examples of LTLC formulas describing interesting properties. Despite its high expressiveness, the model checking problem is decidable for deterministic programs with finite base types.
Yuya Okuyama, Takeshi Tsukada, Naoki Kobayashi
Multiphase-Linear Ranking Functions and Their Relation to Recurrent Sets
Abstract
Multiphase ranking functions (M\(\varPhi \)RFs) are used to prove termination of loops in which the computation progresses through a number of phases. They consist of linear functions \(\langle f_1,\ldots ,f_d \rangle \) where \(f_i\) decreases during the ith phase. This work provides new insights regarding M\(\varPhi \)RFs for loops described by a conjunction of linear constraints (\( SLC \) loops). In particular, we consider the existence problem (does a given \( SLC \) loop admit an M\(\varPhi \)RF). The decidability and complexity of the problem, in the case that d is restricted by an input parameter, have been settled in recent work, while in this paper we make progress regarding the existence problem without a given depth bound. Our new approach, while falling short of a decision procedure for the general case, reveals some important insights into the structure of these functions. Interestingly, it relates the problem of seeking M\(\varPhi \)RFs to that of seeking recurrent sets (used to prove nontermination). It also helps in identifying classes of loops for which M\(\varPhi \)RFs are sufficient, and thus have linear runtime bounds. For the depth-bounded existence problem, we obtain a new polynomial-time procedure that can provide witnesses for negative answers as well. To obtain this procedure we introduce a new representation for \( SLC \) loops, the difference polyhedron replacing the customary transition polyhedron. We find that this representation yields new insights on M\(\varPhi \)RFs and \( SLC \) loops in general, and some results on termination and nontermination of bounded \( SLC \) loops become straightforward.
Amir M. Ben-Amram, Jesús J. Doménech, Samir Genaim
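A textbook example of a loop that needs two phases (not taken from the paper's benchmarks):

    \(\textbf{while}\ (x \ge 0)\ \{\ x := x + y;\ \ y := y - 1\ \}\) with \(\langle f_1, f_2 \rangle = \langle y,\ x \rangle \)

While \(y \ge 0\), it decreases by 1 on every iteration (first phase); once \(y < 0\), the guard keeps \(x \ge 0\) and x decreases by at least 1 on every remaining iteration (second phase), so \(\langle y, x \rangle \) is an M\(\varPhi \)RF of depth 2.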
Backmatter
Metadata
Title
Static Analysis
edited by
Bor-Yuh Evan Chang
Copyright Year
2019
Electronic ISBN
978-3-030-32304-2
Print ISBN
978-3-030-32303-5
DOI
https://doi.org/10.1007/978-3-030-32304-2
