Skip to main content
Top

2025 | Book

Tests and Proofs

18th International Conference, TAP 2024, Milan, Italy, September 9–10, 2024, Proceedings

insite
SEARCH

About this book

​This book constitutes the proceedings of the 18th International Conference on Tests and Proofs, TAP 2024. TAP 2024 took place in Milan, Italy, on September 9 and 10, 2024 as part of the Formal Methods symposium (FM 2024), which included four more co-located conferences besides TAP: FMICS (Formal Methods in Industrial Critical Systems), LOPSTR (In ternational Symposium on Logic-based Program Synthesis and Transformation), PPDP (International Symposium on Principles and Practice of Declarative Pro gramming), and FACS (International Conference on Formal Aspects of Compo nent Software.

The 7 full papers together with 1 short paper included in this volume were carefully reviewed and selected from 14 submissions. TAP’s scope encompasses many aspects of verification technology, including foundational work, tool development, and empirical research.

Table of Contents

Frontmatter

Invited Talks

Frontmatter
Efficient Temporal Logic Runtime Monitoring for Tiny Systems
Abstract
While the theory and practice of runtime monitoring are overall well developed, in embedded systems, runtime monitoring is not as common as one would expect. Especially small-scale embedded systems, which are found in many household devices, are often somewhat safety-critical and could benefit from the possibility to detect software or hardware defects that cannot be uncovered with verification alone. While monitoring frameworks such as RTLola and Copilot are available to address this problem, employing them leads to a gap between verification and runtime monitoring by these frameworks having specialized specification languages for monitoring, and what can be expressed in them is incomparable to the capabilities of the temporal logics traditionally employed in formal verification.
This paper discusses how (linear) temporal logic runtime monitoring for small-scale embedded systems can be made more efficient and attractive to the embedded systems practitioner. This includes identifying why monitoring for traditional temporal logics is somewhat inefficient in software, how this problem can be addressed in a low-cost way, and how runtime monitors can become a more useful component of an embedded system. By providing a way to translate a specification to monitor code that also tracks the reason for a specification violation, the overall approach discussed in the paper makes spending developer time on writing a relatively precise specification of a system to build attractive, which also helps paving the way to make formal verification for small-scale embedded systems more common.
Rüdiger Ehlers

Quality of Tests and Proofs

Frontmatter
Is MCDC Really Better? Lessons from Combining Tests and Proofs
Abstract
MCDC, for “Modified Condition/Decision Coverage”, is a test coverage criterion recommended in several important industrial software safety standards. The intuition behind MCDC is that it should be more effective than industry’s default coverage criterion, branch coverage, since it uses a finer-grain decomposition of program conditions.
Manually generating MCDC-compliant test suites is, however, tedious and error-prone; there is, as a result, little empirical evidence of whether that intuition holds up in practice. Does MCDC really provide enough of an advantage over branch coverage to justify the extra work and longer test runs?
To help answer this question, the present work takes advantage of verification technology that combines tests and proofs, based on the AutoProof program verifier.
For each decision point in the program, the approach proceeds in four steps. First, compute a set of conditions that satisfy MCDC. Then, inject into the program, for each combination, an incorrect instruction (a “seeded contradiction” in the terminology of previous work). Next, apply the prover to verify the instrumented program; the prover naturally fails, but in the process it produces a set of counterexamples covering all the combinations. Finally, turn this result into a directly usable test suite—guaranteed by construction to satisfy the MCDC criterion.
The results, on a significant set of example programs, yield insights on how MCDC coverage compares to branch coverage and adaptive random testing when trying to detect faults.
Li Huang, Bertrand Meyer, Manuel Oriol
Refining CEGAR-Based Test-Case Generation with Feasibility Annotations
Abstract
Often, testing is part of software quality assurance, in which testers may use automatic tools to generate test cases. To avoid redundant work, the tools may consider collateral coverage, i.e., use that a test case generated for a particular test goal, e.g., a branch, may cover further goals. We present a novel approach to use collateral coverage during test-case generation with software model checkers that apply counterexample-guided abstraction refinement. For every test goal, those model checkers perform a reachability analysis, which we plan to support with previous test inputs. More concretely, we use a feasibility annotation to map already covered goals to their test inputs. Whenever the reachability analysis detects a syntactical path to an uncovered test goal, we select a test input from the feasibility annotation that is likely a (partial) input for the path and use this input as a potential guide to check whether the path is semantically feasible. So far, we have integrated our approach into https://static-content.springer.com/image/chp%3A10.1007%2F978-3-031-72044-4_3/MediaObjects/558118_1_En_3_Figa_HTML.gif and evaluated it on the Test-Comp tasks. Compared to the original https://static-content.springer.com/image/chp%3A10.1007%2F978-3-031-72044-4_3/MediaObjects/558118_1_En_3_Figb_HTML.gif approach, our approach performs similar or better. In total, we increase the achieved branch coverage by 2%, but may achieve a significantly larger increase for some tasks.
Max Barth, Marie-Christine Jakobs
No Smoke Without Fire: Detecting Specification Inconsistencies with Frama-C/WP
Abstract
Deductive verification provides a proof that, under the provided pre-conditions, each terminating execution of a given function satisfies the stated post-conditions. In general, pre- and post-conditions are expressed in a logical specification language and typically rely on theories including abstract definitions, axioms and lemmas. As they are written by humans, errors may be introduced into specifications. Some errors can be detected when the proof fails, but sometimes, they remain unnoticed due to misleading proofs: most of the program may become dead code under the provided pre-conditions, or the proof may succeed because of inconsistencies in hypotheses and axioms. In this tool paper, we explore how to detect such unwanted situations by using deductive verification techniques and describe the smoke test mechanism in Frama-C/WP, a popular deductive verifier for C code. We show that, while the intuitive idea is simple, making it practical requires optimizations to scale up, and report on experiments with critical industrial code. Although our method is based on proof techniques, it is not complete and is similar to testing. In the end, can we ever be sure that our programs are proved well enough?
Allan Blanchard, Loïc Correnson, Adel Djoudi, Nikolai Kosmatov

Testing and Proving Advanced Properties

Frontmatter
Runtime Verification for High-Level Security Properties: Case Study on the TPM Software Stack
Abstract
The Trusted Platform Module (TPM) is a cryptoprocessor designed to provide hardware-based secure storage and protect integrity of modern computers. Communications with the TPM go through the TPM Software Stack (TSS), a popular implementation of which is the open-source library tpm2-tss. It is thus crucial to ensure that no leak of sensitive data may occur in the TSS during communications between the host platform and the TPM. Recent work on deductive verification of functional and safety properties for this library faced several challenges. The purpose of this case study paper is to focus on high-level security properties, and to propose an alternative validation approach for such properties on the library by using runtime verification. We describe the proposed approach, apply it to specify and verify at runtime some key security properties using the Frama-C verification platform and report on our first evaluation results.
Yani Ziani, Nikolai Kosmatov, Frédéric Loulergue, Daniel Gracia Pérez
Cyclone: A New Tool for Verifying/Testing Graph-Based Structures
Tool Paper
Abstract
We present a new tool called Cyclone that can be used for verifying and testing graph-based models. Different from other verification tools, Cyclone provides users with a unique way of building models and specifying their properties, by explicitly constructing and reasoning about graphs. As graphs can naturally capture both static and dynamic behaviours of a system, this allows users to represent many challenging problems. In this tool paper, we describe Cyclone’s basic features through different examples. Our preliminary evaluation results show that Cyclone has a promising potential in handling verification tasks from various domains and good usability for those who are not familiar with using verification tools.
Hao Wu, Thomas Flinkow, Dominique Méry

Applications of Tests and Proofs

Frontmatter
Model-Based Testing of Quantum Computations
Abstract
Quantum computers are able to effectively solve certain types of problems being intractable by classical computers, including tasks from linear optimization, machine learning and simulation of natural phenomena. The counter-intuitive behavior of quantum algorithms makes it particularly important to develop practical techniques and tools to support the analysis and quality assurance of quantum programs. For classical programs, software testing is today one of the most effective and easy-to-use quality-assurance techniques. However, many postulates of quantum physics like superposition, entanglement and non-cloneability of qubit states as well as the probabilistic outcome and destructive effects of qubit measurements obstruct any straight-forward adaptation of classical testing techniques to quantum programs. Recent works either treat quantum programs as black-box components with classical interfaces to apply end-to-end testing or shift the focus of quality assurance to formal verification of quantum programs. In this paper, we instead propose a model-based framework for testing quantum computations at the level of execution traces. Our approach is independent of the quantum programming language and hardware used and utilizes probabilistic transition systems as abstract test model for both the specification and implementation to be tested. In our model, we carefully distinguish between controllable and observable as well as between nondeterministic and probabilistic test steps. Our framework makes quantum program testing a partly reactive process that can be complemented by statistical approaches for identifying erroneous qubit states by repeated measurements.
Malte Lochau, Ina Schaefer
Affinitree: A Compositional Framework for Formal Analysis and Explanation of Deep Neural Networks
Abstract
We present Affinitree, a compositional framework for analyzing Deep Neural Networks (DNNs) based on three elementary principles: (1) symbolic execution, (2) infeasible path elimination, and (3) abstraction. The combination of these principles allows one to elegantly solve a number of interesting analysis and verification tasks, like traditional verification problems with pre- and post-conditions, model explanations in terms of semantically equivalent decision trees, concolic execution for slice-oriented testing, and visual verification of two-dimensional slices. The paper illustrates the flexibility of Affinitree over three different use cases covering fairness evaluation, adversarial examples, and counterfactuals. Affinitree is available as a modular open source library for replication, experimentation, and extension.
Maximilian Schlüter, Bernhard Steffen
Exploring Loose Coupling of Slicing with Dynamic Symbolic Execution on the JVM
Abstract
Dynamic symbolic execution is an effective technique for fuzzing programs with application to software verification as well if the search terminates during symbolic execution. However, the state space explosion problem often prevents the symbolic search from exploring the entire state space. As static analysis operates on the whole program rather than single execution paths, it can cut out parts of the program that are irrelevant for verifying a property. In this paper, we explore how slicing can be used before dynamic symbolic execution for analyzing assertion reachability in Java programs. We report on our first experience using state-of-the-art open-source Java slicers on the SV-COMP Java benchmark before applying GDart, a dynamic symbolic execution engine for Java. Currently, this experiment has not been successful, as the slicers are not mature enough to reliably produce correct slices of a task in the SV-COMP benchmark. However, in a few examples, the approach delivered promising results that motivate the future development of slicers for Java.
Malte Mues, Julian Rüschoff, Ben Hermann
Backmatter
Metadata
Title
Tests and Proofs
Editors
Marieke Huisman
Falk Howar
Copyright Year
2025
Electronic ISBN
978-3-031-72044-4
Print ISBN
978-3-031-72043-7
DOI
https://doi.org/10.1007/978-3-031-72044-4

Premium Partner