Invited Talk

Unleashing the Power of Static Analysis

Abstract

The last few years have seen a surge of activity in the static analysis community on the application of static analysis to program verification and defect detection. Researchers have long believed in the benefit of exposing and fixing potential defects in a program before it is ever run, especially when the program can be made correct by construction, as in the case of compiler-enforced type systems. But every static analysis tool (other than a compiler’s type checker) ever built, no matter how precise, suffers from the same fatal flaw in the eyes of the programmer: Defect reports do not come with known user scenarios that expose the defects. Therefore, programmers have been loath to examine and fix defect reports produced by static analysis tools as a routine part of the software development process. In spite of recent advancements in analysis techniques, there are no papers we are aware of that report programmers fixing more than a few dozen defects.

Manuvir Das

Session 1

Static Analysis in Disjunctive Numerical Domains

Abstract

The convexity of numerical domains such as polyhedra, octagons, intervals and linear equalities enables tractable analysis of software for buffer overflows, null pointer dereferences and floating point errors. However, convexity also causes the analysis to fail in many common cases. Powerset extensions can remedy this shortcoming by considering disjunctions of predicates. Unfortunately, analysis using powerset domains can be exponentially more expensive as compared to analysis on the base domain. In this paper, we prove structural properties of fixed points computed in commonly used powerset extensions. We show that a fixed point computed on a powerset extension is also a fixed point in the base domain computed on an “elaboration” of the program’s CFG structure. Using this insight, we build analysis algorithms that approach path sensitive static analysis algorithms by performing the fixed point computation on the base domain while discovering an “elaboration” on the fly. Using restrictions on the nature of the elaborations, we design algorithms that scale polynomially in terms of the number of disjuncts. We have implemented a light-weight static analyzer for C programs with encouraging initial results.

Sriram Sankaranarayanan, Franjo Ivančić, Ilya Shlyakhter, Aarti Gupta
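
The precision loss that convexity causes, and the way a powerset extension avoids it, can be illustrated on the interval domain. The following is a minimal sketch, not the paper's algorithm: the class `Interval` and the helpers `may_be_zero_convex` and `may_be_zero_disjunctive` are hypothetical names introduced here, and the scenario (a divisor that is either in [-2, -1] or in [1, 2] after a branch) is an assumed example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """A closed integer interval [lo, hi], the simplest convex numerical domain."""
    lo: int
    hi: int

    def join(self, other):
        # Convex join: the smallest interval containing both operands.
        return Interval(min(self.lo, other.lo), max(self.hi, other.hi))

    def contains(self, value):
        return self.lo <= value <= self.hi


def may_be_zero_convex(branches):
    """Merge all branch results into one interval, then ask whether 0 is possible."""
    merged = branches[0]
    for branch in branches[1:]:
        merged = merged.join(branch)
    return merged.contains(0)


def may_be_zero_disjunctive(branches):
    """Keep the branch results as a disjunction (a powerset element) and check each."""
    return any(branch.contains(0) for branch in branches)


# Assumed scenario: the divisor x is in [-2, -1] on one path and in [1, 2] on the other.
branches = [Interval(-2, -1), Interval(1, 2)]
print(may_be_zero_convex(branches))       # True: the join [-2, 2] contains 0 (false alarm)
print(may_be_zero_disjunctive(branches))  # False: neither disjunct contains 0
```

The disjunctive check is more precise here, but the number of disjuncts can grow with every branch, which is the cost the abstract refers to.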

Static Analysis of Numerical Algorithms

Abstract

We present a new numerical abstract domain for the static analysis, by abstract interpretation [3], of the errors introduced when real-number computations are approximated by floating-point arithmetic. This work extends a former domain [4,8] with an implicitly relational domain, based on affine arithmetic [2], for approximating the floating-point values of variables. It allows us to analyze nontrivial numerical computations that no other abstract domain we know of can analyze with such precise results, such as linear recursive filters of different orders, Newton methods for solving non-linear equations, polynomial iterations, and conjugate gradient algorithms.

Eric Goubault, Sylvie Putot

Static Analysis of String Manipulations in Critical Embedded C Programs

Abstract

This paper describes a new static analysis to show the absence of memory errors, especially string buffer overflows, in C programs. The analysis is specifically designed for the subset of C that is found in critical embedded software. It is based on the theory of abstract interpretation and relies on an abstraction of stores that retains the length of string buffers. A transport structure makes it possible to change the granularity of the abstraction and to concisely define several inherently complex abstract primitives such as destructive update and string copy. The analysis integrates several features of the C language such as multi-dimensional arrays, structures, pointers and function calls. A prototype implementation produces encouraging results in early experiments.

Xavier Allamigeon, Wenceslas Godard, Charles Hymans

Session 2

Abstract Regular Tree Model Checking of Complex Dynamic Data Structures

Abstract

We consider the verification of non-recursive C programs manipulating dynamic linked data structures with possibly several next pointer selectors and with finite domain non-pointer data. We aim at checking basic memory consistency properties (no null pointer assignments, etc.) and shape invariants whose violation can be expressed in an existential fragment of a first order logic over graphs. We formalise this fragment as a logic for specifying bad memory patterns whose formulae may be translated to testers written in C that can be attached to the program, thus reducing the verification problem considered to checking reachability of an error control line. We encode configurations of programs, which are essentially shape graphs, in an original way as extended tree automata and we represent program statements by tree transducers. Then, we use the abstract regular tree model checking framework for a fully automated verification. The method has been implemented and successfully applied on several case studies.

Ahmed Bouajjani, Peter Habermehl, Adam Rogalewicz, Tomáš Vojnar

Structural Invariants

Abstract

We present structural invariants (SI), a new technique for incrementally overapproximating the verification condition of a program in static single assignment form by making a linear pass over the dominator tree of the program. The 1-level SI at a program location is the conjunction of all dominating program statements viewed as constraints. For any k, we define a k-level SI by recursively strengthening the dominating join points of the 1-level SI with the (k − 1)-level SI of the predecessors of the join point, thereby providing a tunable selector to add path-sensitivity incrementally. By ignoring program paths, the size of the SI and correspondingly the time to discharge the validity query remains small, allowing the technique to scale to large programs. We show experimentally that even with k ≤ 2, for a set of open-source programs totaling 570K lines and properties for which specialized analyses have been previously devised, our method provides an automatic and scalable algorithm with a low false positive rate.

Ranjit Jhala, Rupak Majumdar, Ru-Gang Xu
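
The definition of the 1-level SI given in the abstract (the conjunction of all dominating statements) can be sketched as a walk up the dominator tree. This is a rough illustration under several assumptions rather than the authors' implementation: the function `one_level_si`, the `idom` and `stmts` maps, the block names, and the encoding of the phi assignment as a disjunction are all hypothetical choices made for this sketch.

```python
def one_level_si(block, idom, stmts):
    """Collect the statements of every block dominating `block` (itself included).

    idom  maps each block to its immediate dominator (None for the entry block).
    stmts maps each block to its SSA statements, written as constraint strings.
    The result lists the conjuncts in dominator order, entry block first.
    """
    chain = []
    current = block
    while current is not None:
        chain.append(current)
        current = idom.get(current)
    constraints = []
    for dominating_block in reversed(chain):
        constraints.extend(stmts[dominating_block])
    return constraints


# Assumed example: a diamond-shaped CFG in SSA form.
idom = {"entry": None, "then": "entry", "else": "entry", "join": "entry"}
stmts = {
    "entry": ["x0 == 0"],
    "then": ["x1 == x0 + 1"],
    "else": ["x2 == x0 - 1"],
    # Assumption of this sketch: the phi at the join is encoded as a disjunction.
    "join": ["x3 == x1 or x3 == x2", "y0 == 2 * x3"],
}

print(one_level_si("join", idom, stmts))
# ['x0 == 0', 'x3 == x1 or x3 == x2', 'y0 == 2 * x3']
```

At the join, neither branch dominates, so `x1` and `x2` are left unconstrained. This is the imprecision that the k-level strengthening described in the abstract is meant to reduce.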

Existential Label Flow Inference Via CFL Reachability

Abstract

In programming languages, existential quantification is useful for describing relationships among members of a structured type. For example, we may have a list in which there exists some mutual exclusion lock l in each list element such that l protects the data stored in that element. With this information, a static analysis can reason about the relationship between locks and locations in the list even when the precise identity of the lock and/or location is unknown. To facilitate the construction of such static analyses, this paper presents a context-sensitive label flow analysis algorithm with support for existential quantification. Label flow analysis is a core part of many static analysis systems. Following Rehof et al., we use context-free language (CFL) reachability to develop an efficient O(n³) label flow inference algorithm. We prove the algorithm sound by reducing its derivations to those in a system based on polymorphically-constrained types, in the style of Mossin. We have implemented a variant of our analysis as part of a data race detection tool for C programs.

Polyvios Pratikakis, Jeffrey S. Foster, Michael Hicks

Session 3

Abstract Interpretation with Specialized Definitions

Abstract

The relationship between abstract interpretation and partial evaluation has received considerable attention and (partial) integrations have been proposed starting from both the partial evaluation and abstract interpretation perspectives. In this work we present what we argue is the first generic algorithm for efficient and precise integration of abstract interpretation and partial evaluation from an abstract interpretation perspective. Taking as starting point state-of-the-art algorithms for context-sensitive, polyvariant abstract interpretation and (abstract) partial evaluation of logic programs, we present an algorithm which combines the best of both worlds. Key ingredients include the accurate success propagation inherent to abstract interpretation and the powerful program transformations achievable by partial deduction. In our algorithm, the calls which appear in the analysis graph are not analyzed w.r.t. the original definition of the procedure but w.r.t. specialized definitions of these procedures. Such specialized definitions are obtained by applying both unfolding and abstract executability. Also, our framework is parametric w.r.t. different control strategies and abstract domains. Different combinations of these parameters correspond to existing algorithms for program analysis and specialization. Our approach efficiently computes strictly more precise results than those achievable by each of the individual techniques. The algorithm is one of the key components of CiaoPP, the analysis and specialization system of the Ciao compiler.

Germán Puebla, Elvira Albert, Manuel Hermenegildo

Underapproximating Predicate Transformers

Abstract

We study the underapproximation of the predicate transformers used to give semantics to the modalities in dynamic and temporal logic. Because predicate transformers operate on state sets, we define appropriate powerdomains for sound approximation. We study four such domains — two are based on “set inclusion” approximation, and two are based on “quantification” approximation — and we apply the domains to synthesize the most precise, underapproximating \(\widetilde{pre}\) and pre transformers, in the latter case, introducing a focus operation. We also show why the expected abstractions of post and \(\widetilde{post}\) are unsound, and we use the powerdomains to guide us to correct, sound underapproximations.

David A. Schmidt

Combining Widening and Acceleration in Linear Relation Analysis

Abstract

Linear Relation Analysis [CH78, Hal79] is one of the first, but still one of the most powerful, abstract interpretations working in an infinite lattice. As such, it makes use of a widening operator to enforce the convergence of fixpoint computations. While the approximation due to widening can be arbitrarily refined by delaying the application of widening, the analysis quickly becomes too expensive with the increase of delay. Previous attempts at improving the precision of widening are not completely satisfactory, since none of them is guaranteed to improve the precision of the result, and they can nevertheless increase the cost of the analysis. In this paper, we investigate an improvement of Linear Relation Analysis consisting in computing, when possible, the exact (abstract) effect of a loop. This technique is fully compatible with the use of widening, and whenever it applies, it improves both the precision and the performance of the analysis.

Laure Gonnord, Nicolas Halbwachs
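
The trade-off described in the abstract (widening converges quickly but loses precision, delayed widening is precise but slow, and an exact loop effect gives both) can be seen on a single counting loop. The sketch below deliberately swaps in the interval domain for the convex polyhedra of Linear Relation Analysis, and the closed form in `accelerate` is derived by hand for this one loop, `i = 0; while i < bound: i += 1`. All function names are hypothetical.

```python
import math


def join(a, b):
    """Interval join on (lo, hi) pairs."""
    return (min(a[0], b[0]), max(a[1], b[1]))


def widen(old, new):
    """Standard interval widening: any unstable bound jumps to infinity."""
    lo = old[0] if new[0] >= old[0] else -math.inf
    hi = old[1] if new[1] <= old[1] else math.inf
    return (lo, hi)


def loop_body(head, bound):
    """Abstract effect of `if i < bound: i += 1` on integer interval `head`."""
    lo, hi = head
    hi = min(hi, bound - 1)          # apply the guard i < bound
    if lo > hi:
        return None                  # guard is unsatisfiable
    return (lo + 1, hi + 1)          # apply i += 1


def analyze(bound, delay):
    """Iterate to a loop-head invariant, widening once `delay` steps have passed."""
    head = (0, 0)                    # i = 0 on loop entry
    steps = 0
    while True:
        out = loop_body(head, bound)
        new = head if out is None else join(head, out)
        if new == head:
            return head, steps
        head = new if steps < delay else widen(head, new)
        steps += 1


def accelerate(init, bound):
    """Exact loop-head invariant for this particular loop, computed in one step."""
    lo, hi = init
    if lo >= bound:
        return init                  # the loop body never runs
    return (lo, max(hi, bound))


print(analyze(100, 0))          # ((0, inf), 1): fast but imprecise
print(analyze(100, 200))        # ((0, 100), 100): precise but 100 iterations
print(accelerate((0, 0), 100))  # (0, 100): precise in a single step
```

In a full analyzer the widened result could be partly recovered by a narrowing pass, which this sketch omits.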

Beyond Iteration Vectors: Instancewise Relational Abstract Domains

Abstract

We introduce a formalism to reason about program properties at an infinite number of runtime control points, called instances. Infinite sets of instances are represented by rational languages. This framework gives a formal foundation to the well known concept of iteration vectors, extending it to recursive programs with any structured control flow (nested loops and recursive calls). We also extend the concept of induction variables to recursive programs. For a class of monoid-based data structures, including arrays and trees, induction variables capture the exact memory location accessed at every step of the execution. This compile-time characterization is computed in polynomial time as a rational function. Applications include dependence and region analysis for array and tree algorithms, array expansion, and automatic parallelization of recursive programs.

Pierre Amiranoff, Albert Cohen, Paul Feautrier

Invited Talk

Separation Logic and Program Analysis

Abstract

Separation logic is a program logic for reasoning about programs that manipulate pointer data structures. It has a strong form of modularity or locality built in, and has led to simpler by-hand proofs of pointer algorithms than was possible in previous formalisms. It is natural to wonder whether, and in what way, its ideas might be used in program analysis.

In this talk I will begin by describing the basics of separation logic, along the way connecting them to concepts from program analysis. I will then describe some initial, unsuccessful attempts at applying the formalism. This is done in an effort to convey that some of the first ideas at application do not work well, and also to help pin down what some of the central outstanding issues are. This will then lead on to, and partially justify, one way of organizing a program analysis, where the abstract domain is built from formulae in separation logic. Finally, I will survey some recent developments and speculate on further possibilities.

Peter W. O’Hearn

Beyond Reachability: Shape Abstraction in the Presence of Pointer Arithmetic

Abstract

Previous shape analysis algorithms use a memory model where the heap is composed of discrete nodes that can be accessed only via access paths built from variables and field names, an assumption that is violated by pointer arithmetic. In this paper we show how this assumption can be removed, and pointer arithmetic embraced, by using an analysis based on separation logic. We describe an abstract domain whose elements are certain separation logic formulae, and an abstraction mechanism that automatically transits between a low-level RAM view of memory and a higher, fictional, view that abstracts from the representation of nodes and multiword linked-lists as certain configurations of the RAM. A widening operator is used to accelerate the analysis. We report experimental results obtained from running our analysis on a number of classic algorithms for dynamic memory management.

Cristiano Calcagno, Dino Distefano, Peter W. O’Hearn, Hongseok Yang

Session 4

Specialized 3-Valued Logic Shape Analysis Using Structure-Based Refinement and Loose Embedding

Abstract

We consider a shape analysis framework based on 3-valued logic, and explore ways for improving its performance and scalability by means of reducing algorithmic overhead and restraining abstract state set inflation. First we propose a new approach to implementing a fast 3-valued logic analyzer, which replaces a general-purpose abstract heap refinement mechanism (accounting for most of the time spent by the reference implementation) with tailored structure-based refinement. We apply our framework to analyze a set of small Java programs manipulating singly- and doubly-linked lists, obtaining results that are comparable to those of the reference implementation, with a process that is 40-85 times faster and uses 2-11 times less memory. We then propose a new definition for partial ordering of abstract heap descriptors (embedding), that trims abstract states representing “special cases” in the presence of a state representing a “general case”. This extension deflates sets of abstract states by a combinatorial factor, resulting in 45-55% fewer structures for the same set of benchmarks. Despite its induced algorithmic overhead per operation, this modification further cuts the analysis time by 17-50%. We argue that improving on these two axes together yields a promise for greater applicability of specialized shape analysis to real-life programs.

Gilad Arnold

Recency-Abstraction for Heap-Allocated Storage

Abstract

In this paper, we present an abstraction for heap-allocated storage, called the recency-abstraction, that allows abstract-interpretation algorithms to recover some non-trivial information for heap-allocated data objects. As an application of the recency-abstraction, we show how it can resolve virtual-function calls in stripped executables (i.e., executables from which debugging information has been removed). This approach succeeded in resolving 55% of virtual-function call-sites, whereas previous tools for analyzing executables fail to resolve any of the virtual-function call-sites.

Gogul Balakrishnan, Thomas Reps

Interprocedural Shape Analysis with Separated Heap Abstractions

Abstract

We describe an interprocedural shape analysis that makes use of spatial locality (i.e. the fact that most procedures modify only a small subset of the heap) in its representation of abstract states. Instead of tracking reachability information directly and aliasing information indirectly, our representation tracks reachability indirectly and aliasing directly. Computing the effect of procedure calls and returns on an abstract state is easy because the representation exhibits spatial locality mirroring the locality that is present in the concrete semantics. The benefits of this approach include improved speed, support for programs that deallocate memory, the handling of bounded numbers of heap cutpoints, and support for cyclic and shared data structures.

Alexey Gotsman, Josh Berdine, Byron Cook

Automated Verification of the Deutsch-Schorr-Waite Tree-Traversal Algorithm

Abstract

This paper reports on the automated verification of the total correctness (partial correctness and termination) of the Deutsch-Schorr-Waite (DSW) algorithm. DSW is an algorithm for traversing a binary tree without the use of a stack by means of destructive pointer manipulation. Prior approaches to the verification of the algorithm involved applications of theorem provers or hand-written proofs. TVLA’s abstract-interpretation approach made possible the automatic symbolic exploration of all memory configurations that can arise. With the introduction of a few simple core and instrumentation relations, TVLA was able to establish the partial correctness and termination of DSW.

Alexey Loginov, Thomas Reps, Mooly Sagiv
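
To make "traversing a binary tree without the use of a stack by means of destructive pointer manipulation" concrete, here is a small sketch of a well-known pointer-rotation variant of stackless traversal that needs no mark bits. It is not necessarily the exact version verified in the paper. The names `Node`, `_SENTINEL` and `rotate_traverse` are hypothetical, and the `visits` list exists only to make the behaviour observable.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right


_SENTINEL = object()  # stands in for "above the root"; never dereferenced


def rotate_traverse(root):
    """Traverse a binary tree in O(1) extra space by rotating pointers.

    Each visit rotates (left, right, parent link) one position, so every node
    is visited exactly three times and has its original pointers back after
    the third visit. Only valid for trees (no sharing, no cycles).
    """
    visits = []                      # instrumentation only, not part of the algorithm
    if root is None:
        return visits
    prev, cur = _SENTINEL, root
    while True:
        visits.append(cur.key)
        nxt = cur.left               # rotate: left <- right <- prev
        cur.left = cur.right
        cur.right = prev
        prev, cur = cur, nxt         # move forward
        if cur is _SENTINEL:
            break                    # rotated back out above the root
        if cur is None:
            cur, prev = prev, None   # empty subtree: bounce back to the same node
    return visits


tree = Node("A", Node("B"), Node("C"))
print(rotate_traverse(tree))         # ['A', 'B', 'B', 'B', 'A', 'C', 'C', 'C', 'A']
print(tree.left.key, tree.right.key) # B C  (the tree is restored)
```

While the traversal is running, the tree is temporarily not a tree, which is what makes properties such as restoration and termination hard to verify.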

Invited Talk

Shape Analysis for Low-Level Code

Abstract

Shape analysis algorithms statically infer deep properties of the runtime heap, such as whether a variable points to a cyclic or acyclic linked list. Previous shape analyses have tended to avoid features of low-level programming languages, such as memory disposal and pointer arithmetic. Yet, these features are used in many important programs, particularly systems programs.

In this talk I will describe how shape analysis for low-level code can be done with separation logic. A crucial element of the approach is the way it negotiates a transit between a low-level RAM view of memory and a higher, fictional, view that abstracts from the representation of nodes and linked structures as certain configurations of the RAM. The analysis algorithm can be seen as conducting a proof search in separation logic, and I will show how this provides a flexible way of exploring non-standard optimizations, while maintaining soundness.

Hongseok Yang

Session 5

Catching and Identifying Bugs in Register Allocation

Abstract

Although there are many register allocation algorithms that work well, it can be difficult to correctly implement these algorithms. As a result, it is common for bugs to remain in the register allocator, even after the compiler is released. The register allocator may run, but bugs can cause it to produce incorrect output code. The output program may even execute properly on some test data, but errors can remain. In this paper, we propose novel data flow analyses to statically check that the output code from the register allocator is correct in terms of its data dependences. The approach is accurate, fast, and can identify and report error locations and types. No false alarms are produced. The paper describes our approach, called SARAC, and a tool, called ra-analyzer, that statically checks a register allocation and reports the errors it finds. The tool has an average compile-time overhead of only 8% and a modest average memory overhead of 85KB.

Yuqiang Huang, Bruce R. Childers, Mary Lou Soffa

Certificate Translation for Optimizing Compilers (Extended Abstract)

Abstract

Certifying compilation provides a means to ensure that untrusted mobile code satisfies its functional specification. A certifying compiler generates code as well as a machine-checkable “certificate”, i.e. a formal proof that establishes adherence of the code to specified properties. While certificates for safety properties can be built fully automatically, certificates for more expressive and complex properties often require the use of interactive code verification. We propose a technique to provide code consumers with the benefits of interactive source code verification. Our technique, certificate translation, extends program transformations by offering the means to turn certificates of functional correctness for programs in high-level languages into certificates for executable code. The article outlines the principles of certificate translation, using specifications written in first order logic. This translation is instantiated for standard compiler optimizations in the context of an intermediate RTL Language.

Gilles Barthe, Benjamin Grégoire, César Kunz, Tamara Rezk

Analysis of Low-Level Code Using Cooperating Decompilers

Abstract

Analysis or verification of low-level code is useful for minimizing the disconnect between what is verified and what is actually executed and is necessary when source code is unavailable or is, say, intermingled with inline assembly. We present a modular framework for building pipelines of cooperating decompilers that gradually lift the level of the language to something appropriate for source-level tools. Each decompilation stage contains an abstract interpreter that encapsulates its findings about the program by translating the program into a higher-level intermediate language. We provide evidence for the modularity of this framework through the implementation of multiple decompilation pipelines for both x86 and MIPS assembly produced by gcc, gcj, and coolc (a compiler for a pedagogical Java-like language) that share several low-level components. Finally, we discuss our experimental results that apply the BLAST model checker for C and the Cqual analyzer to decompiled assembly.

Bor-Yuh Evan Chang, Matthew Harren, George C. Necula

Session 6

Static Analysis for Java Servlets and JSP

Abstract

We present an approach for statically reasoning about the behavior of Web applications that are developed using Java Servlets and JSP. Specifically, we attack the problems of guaranteeing that all output is well-formed and valid XML and ensuring consistency of XHTML form fields and session state. Our approach builds on a collection of program analysis techniques developed earlier in the JWIG and Xact projects, combined with work on balanced context-free grammars. Together, this provides the necessary foundation concerning reasoning about output streams and application control flow.

Christian Kirkegaard, Anders Møller

Cryptographically-Masked Flows

Abstract

Cryptographic operations are essential for many security-critical systems. Reasoning about information flow in such systems is challenging because typical (noninterference-based) information-flow definitions allow no flow from secret to public data. Unfortunately, this implies that programs with encryption are ruled out because encrypted output depends on secret inputs: the plaintext and the key. However, it is desirable to allow flows arising from encryption with secret keys provided that the underlying cryptographic algorithm is strong enough. In this paper we conservatively extend the noninterference definition to allow safe encryption, decryption, and key generation. To illustrate the usefulness of this approach, we propose (and implement) a type system that guarantees noninterference for a small imperative language with primitive cryptographic operations. The type system prevents dangerous program behavior (e.g., giving away a secret key or confusing keys and non-keys), which we exemplify with secure implementations of cryptographic protocols. Because the model is based on a standard noninterference property, it allows us to develop some natural extensions. In particular, we consider public-key cryptography and integrity, which accommodate reasoning about primitives that are vulnerable to chosen-ciphertext attacks.

Aslan Askarov, Daniel Hedin, Andrei Sabelfeld

Proving the Properties of Communicating Imperfectly-Clocked Synchronous Systems

Abstract

Our work aims at certifying that all the executions of several collaborating synchronous systems in a realistic environment follow a given specification. In order to analyze the numerous executions that may happen while considering a set of synchronous systems whose clocks are non-perfect and that communicate through non-instantaneous channels, we define two new abstract domains. The Changes counting domain and the Integral bounding domain gap the imprecisions of the previously defined Constraint domain that occur because of these hardware imprecisions. We define a reduced product between these domains that allows a much more precise though sound analysis than the three analyses that may have been defined in each domain.

Julien Bertrane

Session 7

Parametric and Termination-Sensitive Control Dependence

Abstract

A parametric approach to control dependence is presented, where the parameter is any prefix-invariant property on paths in the control-flow graph (CFG). Existing control dependencies, both direct and indirect, can be obtained as instances of the parametric framework for particular properties on paths. A novel control dependence relation, called termination-sensitive control dependence, is obtained also as an instance of the parametric framework. This control dependence is sensitive to the termination information of loops, which can be given via annotations. If all loops are annotated as terminating then it becomes the classic control dependence, while if all loops are annotated as non-terminating then it becomes the weak control dependence; since in practice some loops are terminating and others are not, termination-sensitive control dependence is expected to improve the precision of analysis tools using it. The unifying formal framework for direct and indirect control dependence suggests also, in a natural way, a unifying terminology for the various notions of control dependence, which is also proposed in this paper. Finally, a worst-case O(n²) algorithm to compute the indirect termination-sensitive control dependence for languages that allow only “structured” jumps (i.e., ones that do not jump into the middle of a different block), such as Java and C#, is given, avoiding the O(n³) complexity of the trivial algorithm calculating the transitive closure of the direct dependence.

Feng Chen, Grigore Roşu

Memory Leak Analysis by Contradiction

Abstract

We present a novel leak detection algorithm. To prove the absence of a memory leak, the algorithm assumes its presence and runs a backward heap analysis to disprove this assumption. We have implemented this approach in a memory leak analysis tool and used it to analyze several routines that manipulate linked lists and trees. Because of the reverse nature of the algorithm, the analysis can locally reason about the absence of memory leaks. We have also used the tool as a scalable, but unsound leak detector for C programs. The tool has found several bugs in larger programs from the SPEC2000 suite.

Maksim Orlovich, Radu Rugina

Path-Sensitive Dataflow Analysis with Iterative Refinement

Abstract

In this paper, we present a new method for supporting abstraction refinement in path-sensitive dataflow analysis. We show how an adjustable merge criterion can be used as an interface to control the degree of abstraction. In particular, we partition the merge criterion with two sets of predicates — one related to the dataflow facts being propagated and the other related to path feasibility. These tracked predicates are then used to guide merge operations and path feasibility analysis, so that expensive computations are performed only at the right places. Refinement amounts to lazily growing the path predicate set to recover lost precision. We have implemented our refinement technique in ESP, a software validation tool for C/C++ programs. We apply ESP to validate a future version of Windows against critical security properties. Our experience suggests that applying iterative refinement to path-sensitive dataflow analysis is both effective in cutting down spurious errors and scalable enough for solving real world problems.

Dinakar Dhurjati, Manuvir Das, Yue Yang
