Top

2021 | Book

Read chapter Read first chapter

Static Analysis

28th International Symposium, SAS 2021, Chicago, IL, USA, October 17–19, 2021, Proceedings

Editors: Cezara Drăgoi, Suvam Mukherjee, Dr. Kedar Namjoshi

Publisher: Springer International Publishing

Book Series : Lecture Notes in Computer Science

Part of: Springer Professional "Wirtschaft+Technik" , Springer Professional "Technik" , Springer Professional "Wirtschaft"

About this book

This book constitutes the refereed proceedings of the 28th International Symposium on Static Analysis, SAS 2021, held in Chicago, IL, USA, in October 2021.

The 18 regular and 4 short papers, carefully reviewed and selected from 48 submissions, are presented in this book together with 1-page summaries of the three invited talks. The papers cover topics such as static program analysis, abstract domain, abstract interpretation, automated deduction, debugging techniques, deductive methods, model checking, data science, program optimizations and transformations, program synthesis, program verification, and security analysis.

Frontmatter

Fast and Efficient Bit-Level Precision Tuning

Abstract

In this article, we introduce a new technique for precision tuning. This problem consists of finding the least data types for numerical values such that the result of the computation satisfies some accuracy requirement. State of the art techniques for precision tuning use a trial-and-error approach. They change the data types of some variables of the program and evaluate the accuracy of the result. Depending on what is obtained, they change more or less data types and repeat the process. Our technique is radically different. Based on semantic equations, we generate an Integer Linear Problem (ILP) from the program source code. Basically, this is done by reasoning on the most significant bit and the number of significant bits of the values which are integer quantities. The integer solution to this problem, computed in polynomial time by a classical linear programming solver, gives the optimal data types at the bit level. A finer set of semantic equations is also proposed which does not reduce directly to an ILP problem. So we use policy iteration to find the solution. Both techniques have been implemented and we show that our results encompass the results of state-of-the-art tools.

Assalé Adjé, Dorra Ben Khalifa, Matthieu Martel

Hash Consed Points-To Sets

Abstract

Points-to analysis is a fundamental static analysis, on which many other analyses and optimisations are built. The goal of points-to analysis is to statically approximate the set of abstract objects that a pointer can point to at runtime. Due to the nature of static analysis, points-to analysis introduces much redundancy which can result in duplicate points-to sets and duplicate set union operations, particularly when analysing large programs precisely. To improve performance, there has been extensive effort in mitigating duplication at the algorithmic level through, for example, cycle elimination and variable substitution.

Unlike previous approaches which make algorithmic changes to points-to analysis, this work aims to improve the underlying data structure, which is less studied. Inspired by hash consing from the functional programming community, this paper introduces the use of hash consed points-to sets to reduce the effects of this duplication on both space and time without any high-level algorithmic change. Hash consing can effectively handle duplicate points-to set by representing points-to sets once, and referring to such representations through references, and can speed up duplicate union operations through efficient memoisation. We have implemented and evaluated our approach using 16 real-world C/C++ programs (more than 9.5 million lines of LLVM instructions). Our results show that our approach speeds up state-of-the-art Andersen’s analysis by 1.85\(\times \) on average (up to 3.21\(\times \)) and staged flow-sensitive analysis (SFS) by 1.69\(\times \) on average (up to 2.23\(\times \)). We also observe an average \(\ge \)4.93\(\times \) (up to \(\ge \)15.52\(\times \)) memory usage reduction for SFS.

Mohamad Barbar, Yulei Sui

Backward Symbolic Execution with Loop Folding

Abstract

Symbolic execution is an established program analysis technique that aims to search all possible execution paths of the given program. Due to the so-called path explosion problem, symbolic execution is usually unable to analyze all execution paths and thus it is not convenient for program verification as a standalone method. This paper focuses on backward symbolic execution (BSE), which searches program paths backwards from the error location whose reachability should be proven or refuted. We show that this technique is equivalent to performing k-induction on control-flow paths. While standard BSE simply unwinds all program loops, we present an extension called loop folding that aims to derive loop invariants during BSE that are sufficient to prove the unreachability of the error location. The resulting technique is called backward symbolic execution with loop folding (BSELF). Our experiments show that BSELF performs better than BSE and other tools based on k-induction when non-trivial benchmarks are considered. Moreover, a sequential combination of symbolic execution and BSELF achieved very competitive results compared to state-of-the-art verification tools.

Marek Chalupa, Jan Strejček

Accelerating Program Analyses in Datalog by Merging Library Facts

Abstract

Static program analysis uses sensitivity to balance between precision and scalability. However, finer sensitivity does not necessarily lead to more precise results but may reduce scalability. Recently, a number of approaches have been proposed to finely tune the sensitivity of different program parts. However, these approaches are usually designed for specific program analyses, and their abstraction adjustments are coarse-grained as they directly drop sensitivity elements.

In this paper, we propose a new technique, 4DM, to tune abstractions for program analyses in Datalog. 4DM merges values in a domain, allowing fine-grained sensitivity tuning. 4DM uses a data-driven algorithm for automatically learning a merging strategy for a library from a training set of programs. Unlike existing approaches that rely on the properties of a certain analysis, our learning algorithm works for a wide range of Datalog analyses. We have evaluated our approach on a points-to analysis and a liveness analysis, on the DaCapo benchmark suite. Our evaluation results suggest that our technique achieves a significant speedup and negligible precision loss, reaching a good balance.

Yifan Chen, Chenyang Yang, Xin Zhang, Yingfei Xiong, Hao Tang, Xiaoyin Wang, Lu Zhang

Static Analysis of Endian Portability by Abstract Interpretation

Abstract

We present a static analysis of endian portability for C programs. Our analysis can infer that a given program, or two syntactically close versions thereof, compute the same outputs when run with the same inputs on platforms with different byte-orders, a.k.a. endiannesses. We target low-level C programs that abuse C pointers and unions, hence rely on implementation-specific behaviors undefined in the C standard.

Our method is based on abstract interpretation, and parametric in the choice of a numerical abstract domain. We first present a novel concrete collecting semantics, relating the behaviors of two versions of a program, running on platforms with different endiannesses. We propose a joint memory abstraction, able to infer equivalence relations between little- and big-endian memories. We introduce a novel symbolic predicate domain to infer relations between individual bytes of the variables in the two programs, which has near-linear cost, and the right amount of relationality to express (bitwise) arithmetic properties relevant to endian portability. We implemented a prototype static analyzer, able to scale to large real-world industrial software, with zero false alarms.

David Delmas, Abdelraouf Ouadjaout, Antoine Miné

Verified Functional Programming of an Abstract Interpreter

Abstract

Abstract interpreters are complex pieces of software: even if the abstract interpretation theory and companion algorithms are well understood, their implementations are subject to bugs, that might question the soundness of their computations.

While some formally verified abstract interpreters have been written in the past, writing and understanding them requires expertise in the use of proof assistants, and requires a non-trivial amount of interactive proofs. This paper presents a formally verified abstract interpreter fully programmed and proved correct in the F* verified programming environment. Thanks to F* refinement types and SMT prover capabilities we demonstrate a substantial saving in proof effort compared to previous works based on interactive proof assistants. Almost all the code of our implementation, proofs included, written in a functional style, are presented directly in the paper.

Lucas Franceschino, David Pichardie, Jean-Pierre Talpin

Disjunctive Interval Analysis

Abstract

We revisit disjunctive interval analysis based on the Boxes abstract domain. We propose the use of what we call range decision diagrams (RDDs) to implement Boxes, and we provide algorithms for the necessary RDD operations. RDDs tend to be more compact than the linear decision diagrams (LDDs) that have traditionally been used for Boxes. Representing information more directly, RDDs also allow for the implementation of more accurate abstract operations. This comes at no cost in terms of analysis efficiency, whether LDDs utilise dynamic variable ordering or not. RDD and LDD implementations are available in the Crab analyzer, and our experiments confirm that RDDs are well suited for disjunctive interval analysis.

Graeme Gange, Jorge A. Navas, Peter Schachte, Harald Søndergaard, Peter J. Stuckey

Static Analysis of ReLU Neural Networks with Tropical Polyhedra

Abstract

This paper studies the problem of range analysis for feedforward neural networks, which is a basic primitive for applications such as robustness of neural networks, compliance to specifications and reachability analysis of neural-network feedback systems. Our approach focuses on ReLU (rectified linear unit) feedforward neural nets that present specific difficulties: approaches that exploit derivatives do not apply in general, the number of patterns of neuron activations can be quite large even for small networks, and convex approximations are generally too coarse. In this paper, we employ set-based methods and abstract interpretation that have been very successful in coping with similar difficulties in classical program verification. We present an approach that abstracts ReLU feedforward neural networks using tropical polyhedra. We show that tropical polyhedra can efficiently abstract ReLU activation function, while being able to control the loss of precision due to linear computations. We show how the connection between ReLU networks and tropical rational functions can provide approaches for range analysis of ReLU neural networks. We report on a preliminary evaluation of our approach using a prototype implementation.

Eric Goubault, Sébastien Palumby, Sylvie Putot, Louis Rustenholz, Sriram Sankaranarayanan

Exploiting Verified Neural Networks via Floating Point Numerical Error

Abstract

Researchers have developed neural network verification algorithms motivated by the need to characterize the robustness of deep neural networks. The verifiers aspire to answer whether a neural network guarantees certain properties with respect to all inputs in a space. However, many verifiers inaccurately model floating point arithmetic but do not thoroughly discuss the consequences.

We show that the negligence of floating point error leads to unsound verification that can be systematically exploited in practice. For a pretrained neural network, we present a method that efficiently searches inputs as witnesses for the incorrectness of robustness claims made by a complete verifier. We also present a method to construct neural network architectures and weights that induce wrong results of an incomplete verifier. Our results highlight that, to achieve practically reliable verification of neural networks, any verification system must accurately (or conservatively) model the effects of any floating point computations in the network inference or verification system.

Kai Jia, Martin Rinard

Verifying Low-Dimensional Input Neural Networks via Input Quantization

Abstract

Deep neural networks are an attractive tool for compressing the control policy lookup tables in systems such as the Airborne Collision Avoidance System (ACAS). It is vital to ensure the safety of such neural controllers via verification techniques. The problem of analyzing ACAS Xu networks has motivated many successful neural network verifiers. These verifiers typically analyze the internal computation of neural networks to decide whether a property regarding the input/output holds. The intrinsic complexity of neural network computation renders such verifiers slow to run and vulnerable to floating-point error.

This paper revisits the original problem of verifying ACAS Xu networks. The networks take low-dimensional sensory inputs with training data provided by a precomputed lookup table. We propose to prepend an input quantization layer to the network. Quantization allows efficient verification via input state enumeration, whose complexity is bounded by the size of the quantization space. Quantization is equivalent to nearest-neighbor interpolation at run time, which has been shown to provide acceptable accuracy for ACAS in simulation. Moreover, our technique can deliver exact verification results immune to floating-point error if we directly enumerate the network outputs on the target inference implementation or on an accurate simulation of the target implementation.

Kai Jia, Martin Rinard

Data Abstraction: A General Framework to Handle Program Verification of Data Structures

Abstract

Proving properties on programs accessing data structures such as arrays often requires universally quantified invariants, e.g., “all elements below index i are nonzero”. In this article, we propose a general data abstraction scheme operating on Horn formulas, into which we recast previously published abstractions. We show that our instantiation scheme is relatively complete: the generated purely scalar Horn clauses have a solution (inductive invariants) if and only if the original problem has one expressible by the abstraction.

Julien Braine, Laure Gonnord, David Monniaux

Toward Neural-Network-Guided Program Synthesis and Verification

Abstract

We propose a novel framework of program and invariant synthesis called neural network-guided synthesis. We first show that, by suitably designing and training neural networks, we can extract logical formulas over integers from the weights and biases of the trained neural networks. Based on the idea, we have implemented a tool to synthesize formulas from positive/negative examples and implication constraints, and obtained promising experimental results. We also discuss an application of our method for improving the qualifier discovery in the framework of ICE-learning-based CHC solving, which can in turn be applied to program verification and inductive invariant synthesis. Another potential application is to a neural-network-guided variation of Solar-Lezama’s program synthesis by sketching.

Naoki Kobayashi, Taro Sekiyama, Issei Sato, Hiroshi Unno

Selective Context-Sensitivity for k-CFA with CFL-Reachability

Abstract

k-CFA provides the most well-known context abstraction for program analysis, especially pointer analysis, for a wide range of programming languages. However, its inherent context explosion problem has hindered its applicability. To mitigate this problem, selective context-sensitivity is promising as context-sensitivity is applied only selectively to some parts of the program. This paper introduces a new approach to selective context-sensitivity for supporting k-CFA-based pointer analysis, based on CFL-reachability. Our approach can make k-CFA-based pointer analysis run significantly faster while losing little precision, based on an evaluation using a set of 11 popular Java benchmarks and applications.

Jingbo Lu, Dongjie He, Jingling Xue

Selectively-Amortized Resource Bounding

Abstract

We consider the problem of automatically proving resource bounds. That is, we study how to prove that an integer-valued resource variable is bounded by a given program expression. Automatic resource-bound analysis has recently received significant attention because of a number of important applications (e.g., detecting performance bugs, preventing algorithmic-complexity attacks, identifying side-channel vulnerabilities), where the focus has often been on developing precise amortized reasoning techniques to infer the most exact resource usage. While such innovations remain critical, we observe that fully precise amortization is not always necessary to prove a bound of interest. And in fact, by amortizing selectively, the needed supporting invariants can be simpler, making the invariant inference task more feasible and predictable. We present a framework for selectively-amortized analysis that mixes worst-case and amortized reasoning via a property decomposition and a program transformation. We show that proving bounds in any such decomposition yields a sound resource bound in the original program, and we give an algorithm for selecting a reasonable decomposition.

Tianhan Lu, Bor-Yuh Evan Chang, Ashutosh Trivedi

Reduced Products of Abstract Domains for Fairness Certification of Neural Networks

Abstract

We present Libra, an open-source abstract interpretation-based static analyzer for certifying fairness of ReLU neural network classifiers for tabular data. Libra combines a sound forward pre-analysis with an exact backward analysis that leverages the polyhedra abstract domain to provide definite fairness guarantees when possible, and to otherwise quantify and describe the biased input space regions. The analysis is configurable in terms of scalability and precision. We equipped Libra with new abstract domains to use in the pre-analysis, including a generic reduced product domain construction, as well as search heuristics to find the best analysis configuration. We additionally set up the backward analysis to allow further parallelization. Our experimental evaluation demonstrates the effectiveness of the approach on neural networks trained on a popular dataset in the fairness literature.

Denis Mazzucato, Caterina Urban

A Multilanguage Static Analysis of Python Programs with Native C Extensions

Abstract

Modern programs are increasingly multilanguage, to benefit from each programming language’s advantages and to reuse libraries. For example, developers may want to combine high-level Python code with low-level, performance-oriented C code. In fact, one in five of the 200 most downloaded Python libraries available on GitHub contains C code. Static analyzers tend to focus on a single language and may use stubs to model the behavior of foreign function calls. However, stubs are costly to implement and undermine the soundness of analyzers. In this work, we design a static analyzer by abstract interpretation that can handle Python programs calling C extensions. It analyses directly and fully automatically both the Python and the C source codes. It reports runtime errors that may happen in Python, in C, and at the interface. We implemented our analysis in a modular fashion: it reuses off-the-shelf C and Python analyses written in the same analyzer. This approach allows sharing between abstract domains of different languages. Our analyzer can tackle tests of real-world libraries a few thousand lines of C and Python long in a few minutes.

Raphaël Monat, Abdelraouf Ouadjaout, Antoine Miné

Automated Verification of the Parallel Bellman–Ford Algorithm

Abstract

Many real-world problems such as internet routing are actually graph problems. To develop efficient solutions to such problems, more and more parallel graph algorithms are proposed. This paper discusses the mechanized verification of a commonly used parallel graph algorithm, namely the Bellman–Ford algorithm, which provides an inherently parallel solution to the Single-Source Shortest Path problem.

Concretely, we verify an unoptimized GPU version of the Bellman–Ford algorithm, using the VerCors verifier. The main challenge that we had to address was to find suitable global invariants of the graph-based properties for automated verification. This case study is the first deductive verification to prove functional correctness of the parallel Bellman–Ford algorithm. It provides the basis to verify other, optimized implementations of the algorithm. Moreover, it may also provide a good starting point to verify other parallel graph-based algorithms.

Mohsen Safari, Wytse Oortwijn, Marieke Huisman

Improving Thread-Modular Abstract Interpretation

Abstract

We give thread-modular non-relational value analyses as abstractions of a local trace semantics. The semantics as well as the analyses are formulated by means of global invariants and side-effecting constraint systems. We show that a generalization of the analysis provided by the static analyzer Goblint as well as a natural improvement of Antoine Miné’s approach can be obtained as instances of this general scheme. We show that these two analyses are incomparable w.r.t. precision and provide a refinement which improves on both precision-wise. We also report on a preliminary experimental comparison of the given analyses on a meaningful suite of benchmarks.

Michael Schwarz, Simmo Saan, Helmut Seidl, Kalmer Apinis, Julian Erhard, Vesal Vojdani

Thread-Modular Analysis of Release-Acquire Concurrency

Abstract

We present a thread-modular abstract interpretation (TMAI) technique to verify programs under the release-acquire (RA) memory model for safety property violations. The main contributions of our work are: we capture the execution order of program statements as an abstract domain, and propose a sound upper approximation over this domain to efficiently reason over RA concurrency. The proposed domain is general in its application and captures the ordering relations as a first-class feature in the abstract interpretation theory. In particular, the domain represents a set of sequences of modifications of a global variable in concurrent programs as a partially ordered set. Under the upper approximation, older sequenced-before stores of a global variable are forgotten and only the latest stores per variable are preserved. We establish the soundness of our proposed abstractions and implement them in a prototype abstract interpreter called PRIORI. The evaluations of PRIORI on existing and challenging RA benchmarks demonstrate that the proposed technique is not only competitive in refutation, but also in verification. PRIORI shows significantly fast analysis runtimes with higher precision compared to recent state-of-the-art tools for RA concurrency.

Divyanjali Sharma, Subodh Sharma

Symbolic Automatic Relations and Their Applications to SMT and CHC Solving

Abstract

Despite the recent advance of automated program verification, reasoning about recursive data structures remains as a challenge for verification tools and their backends such as SMT and CHC solvers. To address the challenge, we introduce the notion of symbolic automatic relations (SARs), which combines symbolic automata and automatic relations, and inherits their good properties such as the closure under Boolean operations. We consider the satisfiability problem for SARs, and show that it is undecidable in general, but that we can construct a sound (but incomplete) and automated satisfiability checker by a reduction to CHC solving. We discuss applications to SMT and CHC solving on data structures, and show the effectiveness of our approach through experiments.

Takumi Shimoda, Naoki Kobayashi, Ken Sakayori, Ryosuke Sato

Compositional Verification of Smart Contracts Through Communication Abstraction

Abstract

Solidity smart contracts are programs that manage up to \(2^{160}\) users on a blockchain. Verifying a smart contract relative to all users is intractable due to state explosion. Existing solutions either restrict the number of users to under-approximate behaviour, or rely on manual proofs. In this paper, we present local bundles that reduce contracts with arbitrarily many users to sequential programs with a few representative users. Each representative user abstracts concrete users that are locally symmetric to each other relative to the contract and the property. Our abstraction is semi-automated. The representatives depend on communication patterns, and are computed via static analysis. A summary for the behaviour of each representative is provided manually, but a default summary is often sufficient. Once obtained, a local bundle is amenable to sequential static analysis. We show that local bundles are relatively complete for parameterized safety verification, under moderate assumptions. We implement local bundle abstraction in SmartACE, and show order-of-magnitude speedups compared to a state-of-the-art verifier.

Scott Wesley, Maria Christakis, Jorge A. Navas, Richard Trefler, Valentin Wüstholz, Arie Gurfinkel

Automatic Synthesis of Data-Flow Analyzers

Abstract

Data-flow analyzers (DFAs) are widely deployed in many stages of software development, such as compiler optimization, bug detection, and program verification. Automating their synthesis is non-trivial but will be practically beneficial. In this paper, we propose DFASy, a framework for the automatic synthesis of DFAs. Given a specification consisting of a control flow graph and the expected data-flow facts before and after each of its nodes, DFASy automatically synthesizes a DFA that satisfies the specification, including its flow direction, meet operator, and transfer function. DFASy synthesizes transfer functions by working with a domain-specific language that supports rich data-flow fact extraction operations, set operations, and logic operations. To avoid exploding the search space, we introduce an abstraction-guided pruning technique to assess the satisfiability of partially instantiated candidates and drop unsatisfiable ones from further consideration as early as possible. In addition, we also introduce a brevity-guided pruning technique to improve the readability and simplicity of synthesized DFAs and further accelerate the search. We have built a benchmark suite, which consists of seven classic (e.g., live variable analysis and null pointer detection) and seven custom data-flow problems. DFASy has successfully solved all the 14 data-flow problems in 21.8 s on average, outperforming significantly the three baselines compared. Both DFASy and its associated benchmark suite have been open-sourced.

Xuezheng Xu, Xudong Wang, Jingling Xue

Backmatter

Title: Static Analysis
Editors: Cezara Drăgoi
Suvam Mukherjee
Dr. Kedar Namjoshi
Publisher: Springer International Publishing
Electronic ISBN: 978-3-030-88806-0
Print ISBN: 978-3-030-88805-3
DOI: https://doi.org/10.1007/978-3-030-88806-0

Springer Professional

Static Analysis

28th International Symposium, SAS 2021, Chicago, IL, USA, October 17–19, 2021, Proceedings

About this book

Table of Contents

Frontmatter

Fast and Efficient Bit-Level Precision Tuning

Hash Consed Points-To Sets

Backward Symbolic Execution with Loop Folding

Accelerating Program Analyses in Datalog by Merging Library Facts

Static Analysis of Endian Portability by Abstract Interpretation

Verified Functional Programming of an Abstract Interpreter

Disjunctive Interval Analysis

Static Analysis of ReLU Neural Networks with Tropical Polyhedra

Exploiting Verified Neural Networks via Floating Point Numerical Error

Verifying Low-Dimensional Input Neural Networks via Input Quantization

Data Abstraction: A General Framework to Handle Program Verification of Data Structures

Toward Neural-Network-Guided Program Synthesis and Verification

Selective Context-Sensitivity for k-CFA with CFL-Reachability

Selectively-Amortized Resource Bounding

Reduced Products of Abstract Domains for Fairness Certification of Neural Networks

A Multilanguage Static Analysis of Python Programs with Native C Extensions

Automated Verification of the Parallel Bellman–Ford Algorithm

Improving Thread-Modular Abstract Interpretation

Thread-Modular Analysis of Release-Acquire Concurrency

Symbolic Automatic Relations and Their Applications to SMT and CHC Solving

Compositional Verification of Smart Contracts Through Communication Abstraction

Automatic Synthesis of Data-Flow Analyzers

Backmatter

Premium Partner