This short paper is the result of the invited talk I gave at the 2007 Haifa Verification Conference. Its purpose is to briefly summarize the main points of my talk and to provide background references. The original talk abstract was, “Dynamic verification (simulation, emulation) and formal verification often live in separate worlds, with minimal interaction between the two camps, yet both have unique strengths that could complement the other. In this talk, I’ll briefly enumerate what I believe are the best aspects of each verification style, and then explore some possibilities for drawing on the strengths of both camps.”
Simulation test coverage does not scale gracefully with growing system design size. Component interactions grow exponentially with the number of system components, while conventional system test at best can increase coverage as a linear function of allotted test time.
Likewise, capacity limitations are commonly cited as the essential gating factor that restricts the application of automatic formal verification (model checking) to at most a few design blocks.
Nonetheless, abstraction has long been used successfully in pilot projects to apply model checking to entire systems. Abstraction in conjunction with guided-random simulation can be used in the same way to increase coverage for conventional test.
While academic use of abstraction is old, its use in the EDA industry’s commercial tool sets has been very limited, due to a perception that its use entails an unacceptably disruptive methodology change. It is shown here how quite general data-path abstraction incorporated into a hierarchical design flow can be introduced with only a modest change in methodology. This hierarchical design flow supports verification based on either simulation or model checking that can scale gracefully with increasing design complexity.
With the growing maturity in hardware verification methods, there has been great interest in applying them to verification of software programs. Aside from issues of scale and complexity, there are many differences between the two domains in the underlying problem of searching for bugs. In this talk, I will describe our experiences with this transition, with emphasis on methods that worked and those that did not.
A program fails. How can we locate the cause? A new generation of program analysis techniques automatically determines failure causes even in the absence of any specification - in the input, in the set of code changes, or in the program state: ”GCC fails because of a cycle in the abstract syntax tree.” Relying on automated tests and dynamic execution data is just one example of how future program analysis techniques will access and leverage data beyond specs and code; leveraging all available data will result in automated assistance for all developer decisions.
Techniques for checking complex software range from model checking and static analysis to testing. We aim to use the power of exhaustive techniques, such as model checking and symbolic execution, to enable thorough testing of complex software. In particular, we have extended the Java PathFinder model checking tool (JPF)  with a symbolic execution capability [4,2] to enable test case generation for Java programs. Our techniques handle complex data structures, arrays, as well as multithreading, and generate optimized test suites that satisfy user-specified testing coverage criteria.
Modern hardware designs are typically based on multiple clocks. While a singly-clocked hardware design is easily described in standard temporal logics, describing a multiply-clocked design is cumbersome. Thus, it is desirable to have an easier way to formulate properties related to clocks in a temporal logic. In  a relatively simple solution built on top of the traditional ltl semantics was suggested and adopted by the IEEE standard temporal logic psl. The suggested semantics was examined relative to a list of design goals, and it was shown that it answered all requirements except for preserving the least fixed point characterization of the until operator under multiple clocks. In this work we show that with a minor addition to the semantics of  this requirement is met as well.
SystemC is a popular language used in modeling system-on-chip implementations. To support this task at a high level of abstraction, transaction-level modeling (TLM) libraries have been recently developped. While TLM libraries are useful, it is difficult to capture the reactive nature of certain transactions with the constructs currently available in the SystemC and TLM libraries. In this paper, we propose an approach to specify and verify reactive transactions in SystemC designs. Reactive transactions are different from TLM transactions in the sense that a transaction can be killed or reset. Our approach consists of: (1) a language to describe reactive transactions that can be translated to verification monitors, (2) an architectural pattern to implement reactive transactions, and (3) the verification support to verify that the design does not deadlock, allows only legal behaviors and is always responsive. We illustrate our approach through an example of a transactional memory system where a transaction can be killed or reset before its completion. We identify the architectural patterns for reactive transactions. Our results demonstrate the feasibility of our approach as well as support for a comprehensive verification using RuleBase/NuSMV tools.
Frederic Doucet, R. K. Shyamasundar, Ingolf H. Krüger, Saurabh Joshi, Rajesh K. Gupta
The paper presents a new approach to formal verification of generic (i.e. parametrised) hardware designs specified in VHDL. The proposed approach is based on a translation of such designs to counter automata and on exploiting the recent advances achieved in the area of their automated formal verification. We have implemented the proposed translation. Using one of the state-of-the-art tools for verification of counter automata, we were then able to verify several non-trivial properties of parametrised VHDL components, including a real-life one.
This paper studies the efficiency of several probabilistic model checkers by comparing verification times and peak memory usage for a set of standard case studies. The study considers the model checkers ETMCC, MRMC, PRISM (sparse and hybrid mode), YMER and VESTA, and focuses on fully probabilistic systems. Several of our experiments show significantly different run times and memory consumptions between the tools—up to various orders of magnitude—without, however, indicating a clearly dominating tool. For statistical model checking YMER clearly prevails whereas for the numerical tools MRMC and PRISM (sparse) are rather close.
David N. Jansen, Joost-Pieter Katoen, Marcel Oldenkamp, Mariëlle Stoelinga, Ivan Zapreev
Constraint Programming (CP) technology has been extensively used in Random Functional Test Generation during the recent years. However, while the existing CP methodologies are well tuned for traditional combinatorial applications e.g. logistics or scheduling, the problem domain of functional test generation remains largely unexplored by the CP community and many of its domain specific features and challenges are still unaddressed. In this paper we focus on the distinctive features of CP for the random functional test generation domain and show how these features can be addressed using a classical CP engine with custom extensions. We present some modeling and solving problems arising in this context and propose solutions. In particular, we address the way of model building in the problem domain of test generation which we refer to as multi-layer modeling. In this context we introduce constraint patterns of composite variable, implied condition and implied composite variable condition, define their semantics and propose schemes for their CSP modeling. The paper also addresses specific problems arising at the solving stage in the problem domain of random test generation. We propose solutions to these problems by means of custom random search algorithms. This approach is illustrated on the examples of the disjunction constraint and conditional variable instantiation. The latter algorithm addresses the feature of dynamic modeling required in the test generation task. To demonstrate the effectiveness of our approach we present experimental results based on the implementation using ILOG Solver as a CP engine.
Reaching hard-to-reach coverage events is a difficult task that requires both time and expertise. Data-driven Coverage Directed Generation (CDG) can assist in the task when the coverage events are part of a structured coverage model, but is a-priori less useful when the target events are singular and not part of a model. We present virtual coverage models as a mean for enabling data-driven CDG to reach singular events. A virtual coverage model is a structured coverage model (e.g., cross-product coverage) defined around the target event, such that the target event is a point in the structured model. With the structured coverage model around the target event, the CDG system can exploit the structure to learn how to reach the target event from covered points in the structured model. A case study of using CDG and virtual coverage to reach a hard-to-reach event in a multi-processor system demonstrates the usefulness of the proposed method.
Software verification is a hard yet important challenge. In general, the problem is undecidable. Nevertheless, it is still beneficial to look at solutions that either restrict the generality or are heuristic in nature (and do not guarantee to terminate). In this paper, we concentrate on a related problem, that of verifying that a cycle in the flow chart of a program does not terminate. We show some exact and sufficient conditions for cycle nontermination, and provide application for program verification. This allows us to check sequential and concurrent programs against temporal properties, using a truly symbolic approach, and to use temporal logic to guide the selection of test cases in such programs.
This paper presents dynamic testing, a method that exploits automata learning to systematically test (black box) systems almost without prerequisites. Based on interface descriptions, our method successively explores the system under test (SUT), while it at the same time extrapolates a behavioral model. This is in turn used to steer the further exploration process. Due to the applied learning technique, our method is optimal in the sense that the extrapolated models are most concise in consistently representing all the information gathered during the exploration. Using the LearnLib, our framework for automata learning, our method can elegantly be combined with numerous optimizations of the learning procedure, various choices of model structures, and last but not least, with the option to dynamically/interactively enlarge the alphabet underlying the learning process. All these features will be illustrated using as a case study the web application Mantis, a bug tracking system widely used in practice. We will show how the dynamic testing procedure proceeds and how the behavioral models arise that concisely summarize the current testing effort. It has turned out that these models, besides steering the automatic exploration process, are ideal for user guidance and to support analyzes to improve the system understanding.
Harald Raffelt, Bernhard Steffen, Tiziana Margaria
Implementations of computer systems comprise many layers and employ a variety of programming languages. Building such systems requires support of an often complex, accompanying tool chain.
The Verisoft project deals with the formal pervasive verification of computer systems. Making use of appropriate formal specification and proof tools, this task requires (i) specifying the layers and languages used in the implementation, (ii) specifying and verifying the algorithms employed by the tool chain (or, alternatively, validating their actual output), and (iii) proving simulation statements between layers, arguing about the programs residing at the different layers. Combining the simulation statements for all layers should allow to transfer correctness results for top-layer programs to their bottom-layer representation; in this manner, a verified stack can be built.
Maintaining all formal artifacts, the actual system implementation, and the (verification) tool chain is a challenging task. We call sets of tools that help addressing this task system verification environments. In this paper, we describe the structure, contents, and architecture of the system verification environment used in the Verisoft project.
Despite many advances, today’s software model checkers and extended static checkers still do not scale well to large code bases, when verifying properties that depend on complex interprocedural flow of data. An obvious approach to improve performance is to exploit software structure. Although a tremendous amount of work has been done on exploiting structure at various levels of granularity, the fine-grained shared structure among multiple verification conditions has been largely ignored. In this paper, we formalize the notion of shared structure among verification conditions, propose a novel and efficient approach to exploit this sharing, and provide experimental results that this approach can significantly improve the performance of verification, even on path- and context-sensitive and dataflow-intensive properties.
This paper presents an approach to the efficient verification of embedded systems. Such systems usually operate in uncertain environments, giving rise to a high degree of nondeterminism in the corresponding formal models, which in turn aggravates the state explosion problem. Careful handling of nondeterminism is therefore crucial for obtaining efficient model checking tools. Here, we support this goal by developing a formal computation model and an abstraction method, called delayed nondeterminism, which instantiates nondeterministic values only if and when this is required by the application code. It is shown how this technique can be integrated into our CTL model checking tool [mc]square by introducing symbolic abstract states which represent several concrete states. We also give a simulation relation between the concrete and the abstract state space, thus establishing the soundness of delayed nondeterminism with respect to “path-universal” logics such as ACTL and LTL. Furthermore, a case study is presented in which three different programs are used to demonstrate the effectiveness of our technique.
Pushdown systems (PDSs) consist of a stack and a finite state machine and are frequently used to model abstractions of software. They correspond to sequential recursive programs with finite-domain variables. This paper presents a novel algorithm for deciding reachability of particular locations of PDSs. We exploit the fact that most PDSs used in practice are shallow, and propose to use SAT-based Bounded Model Checking to search for counterexamples. Completeness is achieved by computing universal summaries of the procedures in the program.
Gérard Basler, Daniel Kroening, Georg Weissenbacher
A regression bug is a bug which causes a feature that worked correctly to stop working after a certain event (system upgrade, system patching, daylight saving time switch, etc.). Very often an encompassed bug fix included in a patch causes the regression bug. Regression bugs are an annoying and painful phenomena in the software development process, requiring a great deal of effort to find. Many tools have been developed in order to find the existence of such bugs. However, a great deal of manual work is still needed to find the exact source-code location that caused a regression bug.
In this paper we present the CodePsychologist, a tool which assists the programmer to locate source code segments that caused a given regression bug. The CodePsychologist goes beyond current tools, that identify all the lines of code that changed since the feature in question worked properly (with the help of a Source Control Tool). The CodePsychologist uses various heuristics to select the lines most likely to be the cause of the error, from these often large number of lines of code. This reduces the fixing time of regression bugs. It allows a quick response to field errors that need an immediate correction.
Code coverage is often defined as a measure of the degree to which the source code of a program has been tested . Various metrics for measuring code coverage exist. The vast majority of these metrics require instrumenting the source code to produce coverage data. However, for certain coverage metrics, it is also possible to instrument object code to produce coverage data. Traditionally, such instrumentation has been considered inferior to source level instrumentation because source code is the focus of code coverage. Our experience shows that object code instrumentation, specifically post-link instrumentation, can be very useful to users. Moreover, it does not only alleviate certain side-effects of source-level instrumentation, especially those related to compiler optimizations, but also lends itself to performance optimization that enables low-overhead instrumentation. Our experiments show an average of less than 1% overhead for instrumentation at the function level and an average of 4.1% and 0.4% overhead for SPECint2000 and SPECfp2000, respectively, for instrumentation at the basic block level. This paper demonstrates the advantages of post-link coverage and describes effective methodology and technology for applying it.
Unit testing plays a major role in the software development process. It enables the immediate detection of bugs introduced into a unit whenever code changes occur. Hence, unit tests provide a safety net of regression tests and validation tests which encourage developers to refactor existing code. Nevertheless, not all software systems contain unit tests. When changes to such software are needed, writing unit tests from scratch might not be cost effective.
In this paper we propose a technique which automatically generates unit tests for software that does not have such tests. We have implemented GenUTest, a tool which captures and logs inter-object interactions occurring during the execution of Java programs. These interactions are used to generate JUnit tests. They also serve in generating mock aspects – mock object like entities, which assist the testing process. The interactions are captured using the aspect oriented language AspectJ.