Fundamental Approaches to Software Engineering
28th International Conference, FASE 2025, Held as Part of the International Joint Conferences on Theory and Practice of Software, ETAPS 2025, Hamilton, ON, Canada, May 3–8, 2025, Proceedings
- Open Access
- 2025
- Book
- Editors
- Artur Boronat
- Gordon Fraser
- Book Series
- Lecture Notes in Computer Science
- Publisher
- Springer Nature Switzerland
About this book
This open access book constitutes the proceedings of the 28th International Conference on Fundamental Approaches to Software Engineering, FASE 2025, which was held as part of the International Joint Conferences on Theory and Practice of Software, ETAPS 2025, in Hamilton, Canada, in May 2025.
The 9 full and 2 short papers included in the proceedings, together with one invited keynote paper and 3 tool competition papers, were carefully reviewed and selected from 31 submissions. They deal with up-to-date research in software engineering and its applications, in areas such as quality and testing, foundations for AI-based systems, and requirements engineering.
Table of Contents
-
Capturing System Designs with Formal Executable Specifications
- Open Access
The chapter delves into the transformative potential of formal executable specifications in system design, arguing for their integration from the earliest stages of development. It underscores the limitations of informal specifications, which often lack predictive power and make it challenging to uncover subtle flaws and vulnerabilities. The text presents compelling examples, such as the formal specification of Internet Explorer, the C programming language, and the DNS, to illustrate how formal methods can uncover critical issues that informal methods miss. The chapter also explores the use of formal patterns to enhance modularity and reusability in system design, providing a framework for developing high-quality, verifiable systems. Additionally, it discusses the integration of formal executable specifications with model-based design, highlighting the benefits of combining these approaches to achieve more robust and reliable system designs. The chapter concludes with a vision for the future, emphasizing the potential of formal executable specifications to revolutionize standards, programming languages, and software modeling.
This summary of the content was generated with the help of AI.
Abstract
Basing system designs on informal specifications and applying formal methods after system implementation greatly reduces the benefits that formal methods can provide. Systems of high quality and trustworthiness can be developed in a faster and much more efficient way by capturing system designs with formal executable specifications and subjecting them to automated formal verification from the earliest stages of system design. Even greater benefits can be gained by making such formal designs highly composable and reusable by means of formal patterns. Experience with using the rewriting-logic-based language Maude, its tool environment, and formal patterns for all these purposes is presented and illustrated with concrete examples. The benefits of combining model-based design approaches with the one based on formal executable specifications are also discussed and illustrated with examples.
-
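The core idea of an executable specification, as the abstract describes it, can be illustrated with a minimal Python analogue (this is not Maude, and the mutual-exclusion protocol below is a hypothetical toy, not an example from the chapter): system behavior is captured as rewrite rules over states, and the same rules that execute the specification also drive an exhaustive search that checks an invariant.

```python
from collections import deque

# Illustrative analogue of an executable specification: a two-process
# mutual-exclusion protocol as rewrite rules over states, plus an
# exhaustive reachability search that verifies the mutex invariant.
# State: (status_p, status_q) with statuses in {"idle", "wait", "crit"}.

def rules(state):
    """Yield every state reachable by one rewrite step."""
    p, q = state
    # A process may request entry to the critical section...
    if p == "idle":
        yield ("wait", q)
    if q == "idle":
        yield (p, "wait")
    # ...and may enter only if the other process is not critical.
    if p == "wait" and q != "crit":
        yield ("crit", q)
    if q == "wait" and p != "crit":
        yield (p, "crit")
    # Leaving the critical section.
    if p == "crit":
        yield ("idle", q)
    if q == "crit":
        yield (p, "idle")

def reachable(init):
    """Breadth-first exploration of the rewrite relation."""
    seen, frontier = {init}, deque([init])
    while frontier:
        s = frontier.popleft()
        for t in rules(s):
            if t not in seen:
                seen.add(t)
                frontier.append(t)
    return seen

states = reachable(("idle", "idle"))
mutex_ok = all(s != ("crit", "crit") for s in states)
print(len(states), mutex_ok)
```

Because the rules are executable, the specification doubles as a prototype; because they are formal, the search above is a small-scope verification, which is the workflow Maude's `search` and model-checking tools industrialize.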
Towards Large Language Model Guided Kernel Direct Fuzzing
- Open Access
This chapter explores the integration of large language models (LLMs) with direct kernel fuzzing to enhance the detection of software vulnerabilities in operating systems. The authors present a novel framework, SyzAgent, which extends the capabilities of the existing Syzkaller tool by incorporating real-time feedback from LLMs. This dynamic approach allows the fuzzing process to adapt to changes in the kernel, improving efficiency and coverage. The chapter delves into the architecture of SyzAgent, detailing its components and interactions with Syzkaller. It also provides preliminary experimental results that demonstrate the effectiveness of the LLM-driven method in breaking coverage plateaus and outperforming traditional fuzzing tools. The authors discuss the challenges and insights gained from integrating LLMs with kernel fuzzers, highlighting the potential of this approach for future advancements in software security. The chapter concludes with a discussion on the future directions and enhancements for SyzAgent, emphasizing the need for further research to fully harness the capabilities of LLMs in kernel fuzzing.
This summary of the content was generated with the help of AI.
Abstract
Direct kernel fuzzing is a targeted approach that focuses on specific areas of the kernel, effectively addressing the challenges of frequent updates and the inherent complexity of operating systems, which are critical infrastructure. This paper introduces SyzAgent, a framework integrating LLMs with the state-of-the-art kernel fuzzer Syzkaller, where the LLMs are used to guide the mutation and generation of test cases in real-time. We present preliminary results demonstrating that this method is effective in around 67% of the cases in our benchmark during the experiment.
-
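The "break the coverage plateau" idea can be sketched generically (this is not Syzkaller or SyzAgent code; the target, the coverage metric, and the stand-in oracle are all hypothetical): a fuzzer mutates randomly while coverage grows, and when progress stalls it consults an external oracle, playing the role the LLM plays in SyzAgent, for a targeted mutation.

```python
import random

# Generic coverage-guided fuzzing loop with plateau-triggered guidance.
# The guidance here is a deterministic stub that enumerates values at the
# first diverging input position; in SyzAgent that role is played by an LLM.

TARGET = [3, 1, 4, 1, 5]  # hypothetical input driving the deepest path

def coverage(inp):
    """Toy coverage metric: length of the prefix matching TARGET."""
    n = 0
    for a, b in zip(inp, TARGET):
        if a != b:
            break
        n += 1
    return n

def fuzz(rounds=200, seed=0):
    rng = random.Random(seed)
    best = [0] * len(TARGET)
    best_cov, stall, trial = coverage(best), 0, 0
    for _ in range(rounds):
        if best_cov == len(TARGET):
            break                      # full coverage reached
        cand = list(best)
        if stall > 5:
            # Plateau: oracle-guided mutation of the first diverging position.
            cand[best_cov] = trial % 10
            trial += 1
        else:
            # Normal operation: random mutation.
            cand[rng.randrange(len(cand))] = rng.randrange(10)
        c = coverage(cand)
        if c > best_cov:
            best, best_cov, stall = cand, c, 0
        else:
            stall += 1
    return best_cov

print(fuzz())
```

Random mutation alone makes each additional digit a 1-in-10 lottery, while the plateau hook guarantees progress within ten guided tries, which is the intuition behind using an external model to steer mutations once a fuzzer stops improving.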
DeepCRCEval: Revisiting the Evaluation of Code Review Comment Generation
- Open Access
The chapter explores the evolution of code review practices from formal inspections to modern, streamlined processes integral to both open-source and industrial software development. It highlights the resource-intensive nature of manual code reviews and the shift towards automating code review comments to alleviate labor demands. The text critically examines the reliance on text similarity metrics, such as BLEU and ROUGE, for evaluating code review comment generators, arguing that these metrics fall short of capturing the essence of effective code review comments. Through an empirical study, the chapter analyzes benchmark comments from major datasets, assessing their quality, category, tone, and context. This analysis reveals significant ambiguities and inconsistencies in the reference comments, questioning their reliability as benchmarks. In response, the chapter introduces DeepCRCEval, an innovative evaluation framework that incorporates both human and Large Language Model (LLM) evaluators. This framework moves away from indirect text similarity measures, focusing instead on the intrinsic merits of review comments. Additionally, the chapter proposes LLM-Reviewer, a lightweight, training-free baseline tool that directly addresses the actual aims of code review. The chapter revisits the evaluation of state-of-the-art code review comment generators using DeepCRCEval and LLM-Reviewer, revealing a considerable shortfall in the performance of existing models compared to the new baseline. The findings underscore the need for more nuanced and direct evaluation metrics in code review automation, paving the way for future research and improvements in this critical area of software development.
This summary of the content was generated with the help of AI.
Abstract
Code review is a vital but demanding aspect of software development, generating significant interest in automating review comments. Traditional evaluation methods for these comments, primarily based on text similarity, face two major challenges: inconsistent reliability of human-authored comments in open-source projects and the weak correlation of text similarity with objectives like enhancing code quality and detecting defects. This study empirically analyzes benchmark comments using a novel set of criteria informed by prior research and developer interviews. We then similarly revisit the evaluation of existing methodologies. Our evaluation framework, DeepCRCEval, integrates human evaluators and Large Language Models (LLMs) for a comprehensive reassessment of current techniques based on the criteria set. In addition, we introduce an innovative and efficient baseline, LLM-Reviewer, which leverages the few-shot learning capabilities of LLMs for a target-oriented comparison. Our research highlights the limitations of text similarity metrics, finding that less than 10% of benchmark comments are of high enough quality for automation. In contrast, DeepCRCEval effectively distinguishes between high- and low-quality comments, proving to be a more reliable evaluation mechanism. Incorporating LLM evaluators into DeepCRCEval significantly boosts efficiency, reducing time and cost by 88.78% and 90.32%, respectively. Furthermore, LLM-Reviewer demonstrates the significant potential of focusing comment generation on the real targets of the task.
-
VOCE: A Virtual On-Call Engineer for Automated Alert Incident Analysis Using a Large Language Model
- Open Access
In the realm of online service systems, the detection and analysis of system faults are paramount to maintaining availability and customer satisfaction. This chapter delves into the complexities of alert incident analysis, highlighting the limitations of existing methods that rely on manual intervention and statistical features. It introduces VOCE, a Virtual On-Call Engineer powered by large language models, designed to emulate the analytical processes of operations experts. VOCE's innovative approach involves comprehending local anomalies recorded by alerts, inferring the propagation of these anomalies, and suggesting the originating alert that triggers other anomalies. The chapter provides an in-depth investigation into the key information factors considered during alert incident analysis, such as system layer, impact scope, and severity. It also presents a hierarchical causality mining strategy that breaks down the analysis into manageable sub-tasks, enhancing the accuracy and robustness of the results. Through extensive experiments on real-world alert incidents, the chapter demonstrates VOCE's superior performance in terms of accuracy and efficiency, making it a valuable tool for automated alert incident analysis. The evaluation includes a comparison with other large language models and a detailed case study, showcasing VOCE's ability to accurately identify the originating alert and construct fault propagation graphs. The chapter concludes with a discussion on the limitations and future directions for enhancing VOCE's capabilities, paving the way for more advanced and autonomous operational solutions.
This summary of the content was generated with the help of AI.
Abstract
In a service system, operations engineers generally deploy numerous monitoring mechanisms in system components to detect anomalies caused by system faults and generate alerts, also known as alarms, that record the phenomenon of anomalies. Due to the topological relationships between system components, a system fault in one component may affect other components, resulting in various local anomalies and multiple alerts across different components. Therefore, to facilitate troubleshooting, alerts of the same system fault are usually correlated into one group, called an alert incident. However, although there are existing approaches that can automatically correlate alerts for operations engineers, analyzing alert incidents still relies on manual work. In this paper, we propose an approach, VOCE (Virtual On-Call Engineer). Using the emerging capabilities of a large language model, VOCE can automatically comprehend the anomaly information described by alerts and emulate the process of operations engineers analyzing an alert incident. Extensive experiments conducted on real alert incidents and two popular large language models demonstrate the effectiveness and efficiency of VOCE in automatically analyzing alert incidents.
-
Hybridize Functions: A Tool for Automatically Refactoring Imperative Deep Learning Programs to Graph Execution
- Open Access
The chapter delves into the tension between efficiency and ease of use in deep learning frameworks, highlighting the trade-offs between deferred and eager execution styles. It introduces Hybridize Functions, an open-source tool designed to automatically refactor imperative deep learning code for improved performance. The tool integrates static analyses from WALA and Ariadne within a PyDev Eclipse IDE plug-in, enabling developers to transform Python functions for graph execution safely and efficiently. The chapter discusses the engineering challenges faced during the tool's implementation, including the integration of tensor type inference and the modernization of Ariadne to support contemporary Python constructs and TensorFlow 2 APIs. It also presents an evaluation of the tool on 19 diverse Python deep learning projects, demonstrating an average speedup of 2.16 with negligible semantic differences. The chapter concludes with a discussion on future work, including the potential incorporation of advanced container-based analyses and automatic code splitting to further enhance hybridization opportunities.
This summary of the content was generated with the help of AI.
Abstract
Efficiency is essential to support responsiveness w.r.t. ever-growing datasets, especially for Deep Learning (DL) systems. DL frameworks have traditionally embraced deferred execution-style DL code—supporting symbolic, graph-based Deep Neural Network (DNN) computation. While scalable, such development is error-prone, non-intuitive, and difficult to debug. Consequently, more natural, imperative DL frameworks encouraging eager execution have emerged but at the expense of run-time performance. Though hybrid approaches aim for the “best of both worlds,” using them effectively requires subtle considerations to make code amenable to safe, accurate, and efficient graph execution—avoiding performance bottlenecks and semantically inequivalent results. We discuss the engineering aspects of a refactoring tool that automatically determines when it is safe and potentially advantageous to migrate imperative DL code to graph execution and vice-versa.
-
Compositional Learning for Synchronous Parallel Automata
- Open Access
This chapter explores the advanced technique of active automata learning, which is instrumental in extracting concise and abstract models from black-box systems. The focus is on synchronous parallel composition of Mealy machines, a type of finite-state machine that produces outputs based on current states and inputs. The chapter introduces a groundbreaking synchronous compositional learning algorithm that extends existing methods by learning systems composed of synchronous components without prior knowledge of their structure or synchronizing actions. This algorithm automatically identifies and learns the individual components and their interactions, ensuring output determinism and scalability. The chapter provides a thorough theoretical foundation, including proofs of termination and correctness, and evaluates the algorithm's performance using both synthetic and realistic benchmarks. The results demonstrate the algorithm's superiority in handling large-scale models, making it a valuable contribution to the field of automata learning. The detailed evaluation and experimental results highlight the practical applications and advantages of the proposed algorithm, making it a must-read for anyone interested in advanced automata learning techniques.
This summary of the content was generated with the help of AI.
Abstract
Automata learning is an approach for extracting a model in the shape of an automaton from a black-box system. This approach has recently gained much attention in both industry and academia. In this paper, we introduce a compositional automata learning algorithm for systems comprising synchronous parallel components. Our algorithm assumes no prior knowledge about the number of components, their individual alphabets, or the synchronizing alphabets. The learning process is automatic and figures out the alphabet symbols on the fly while learning the components. We prove that the proposed algorithm terminates and correctly learns the individual components. We use a number of case studies from the industrial automotive domain and synthetic benchmarks to evaluate the performance of the proposed algorithm. The experimental results show that the algorithm requires significantly fewer input symbols and resets to learn the system compositionally.
-
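The composition operator the abstract refers to can be sketched concretely (a minimal assumed formalization, not the paper's learning algorithm): inputs shared by both alphabets move both Mealy machines in lockstep, while other inputs move only the component whose alphabet contains them.

```python
# Synchronous parallel composition of two Mealy machines, built eagerly
# as a product machine. A compositional learner would instead discover
# the components and their synchronizing symbols from queries.

class Mealy:
    def __init__(self, init, delta):
        self.init = init      # initial state
        self.delta = delta    # (state, input) -> (next state, output)

    def run(self, word):
        s, outs = self.init, []
        for a in word:
            s, o = self.delta[(s, a)]
            outs.append(o)
        return outs

def compose(m1, alpha1, m2, alpha2):
    """Product over alpha1 | alpha2, synchronizing on alpha1 & alpha2."""
    states1 = {s for (s, _) in m1.delta} | {ns for (ns, _) in m1.delta.values()}
    states2 = {s for (s, _) in m2.delta} | {ns for (ns, _) in m2.delta.values()}
    delta = {}
    for s1 in states1:
        for s2 in states2:
            for a in alpha1 | alpha2:
                if a in alpha1 and a in alpha2:      # synchronized step
                    n1, o1 = m1.delta[(s1, a)]
                    n2, o2 = m2.delta[(s2, a)]
                    delta[((s1, s2), a)] = ((n1, n2), (o1, o2))
                elif a in alpha1:                    # only component 1 moves
                    n1, o1 = m1.delta[(s1, a)]
                    delta[((s1, s2), a)] = ((n1, s2), o1)
                else:                                # only component 2 moves
                    n2, o2 = m2.delta[(s2, a)]
                    delta[((s1, s2), a)] = ((s1, n2), o2)
    return Mealy((m1.init, m2.init), delta)

# Two toy components synchronizing on "sync"; "a" and "b" are local inputs.
m1 = Mealy("off", {("off", "sync"): ("on", "up"), ("on", "sync"): ("off", "down"),
                   ("off", "a"): ("off", "x"), ("on", "a"): ("on", "y")})
m2 = Mealy(0, {(0, "sync"): (1, "odd"), (1, "sync"): (0, "even"),
               (0, "b"): (0, "p"), (1, "b"): (1, "q")})
whole = compose(m1, {"sync", "a"}, m2, {"sync", "b"})
print(whole.run(["a", "sync", "b", "sync"]))
```

The product's state space is the source of the savings the experiments report: learning the two components separately needs far fewer queries than learning the (potentially exponentially larger) product directly.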
Symbolic State Partitioning for Reinforcement Learning
- Open Access
This chapter delves into the integration of symbolic execution with reinforcement learning to address the challenges of state space partitioning. It introduces SymPar, a method that uses symbolic execution to create adaptive partitions, capturing the semantics of the environment and accommodating non-linear behaviors. Unlike traditional methods, SymPar is offline, automated, and general, making it applicable to a wide range of problems. The chapter presents a thorough evaluation of SymPar against baseline methods, showcasing its ability to improve learning efficiency and policy reliability. It also discusses the limitations and future directions of SymPar, highlighting its potential for online partitioning and handling complex simulators. The chapter provides a detailed overview of the related work, the theoretical foundations of SymPar, and its empirical evaluation, making it a valuable resource for researchers and practitioners in the field of reinforcement learning.
This summary of the content was generated with the help of AI.
Abstract
Tabular reinforcement learning methods cannot operate directly on continuous state spaces. One solution to this problem is to partition the state space. A good partitioning enables generalization during learning and more efficient exploitation of prior experiences. Consequently, the learning process becomes faster and produces more reliable policies. However, partitioning introduces approximation, which is particularly harmful in the presence of nonlinear relations between state components. An ideal partition should be as coarse as possible, while capturing the key structure of the state space for the given problem. This work extracts partitions from the environment dynamics by symbolic execution. We show that symbolic partitioning improves state space coverage with respect to environmental behavior and allows reinforcement learning to perform better for sparse rewards. We evaluate symbolic state space partitioning with respect to precision, scalability, learning agent performance and state space coverage for the learned policies.
-
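The mechanics of predicate-based partitioning can be illustrated with a small sketch (the predicates below are hand-written for a hypothetical 1-D environment; SymPar derives such path conditions automatically by symbolically executing the environment dynamics): each continuous state maps to the index of the first predicate it satisfies, and a tabular method then learns over those finitely many cells.

```python
# Path conditions a symbolic executor might extract from the dynamics of a
# toy 1-D environment (illustrative, not from the paper's benchmarks):
predicates = [
    lambda x: x < 0.0,   # off the left edge (terminal failure)
    lambda x: x < 4.5,   # safe walking zone
    lambda x: x < 5.0,   # near the goal boundary
]                        # else: goal region

def cell(x):
    """Map a continuous state to the index of its symbolic partition."""
    for i, p in enumerate(predicates):
        if p(x):
            return i
    return len(predicates)

# With finitely many cells, a plain tabular Q-table becomes applicable:
Q = {c: {"left": 0.0, "right": 0.0} for c in range(len(predicates) + 1)}

# States far apart in value but satisfying the same path condition share a
# cell, so experience gathered at one generalizes to the other.
print(cell(-1.0), cell(1.0), cell(4.2), cell(4.7), cell(6.0))
```

Unlike a uniform grid, the cell boundaries sit exactly where the environment's behavior changes, which is why a semantics-derived partition can stay coarse without merging states that behave differently.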
Formal Architectural Patterns for Adaptive Robotic Software
- Open Access
The chapter delves into the critical need for robots to adapt to unexpected environmental changes while maintaining safe and effective operation. It introduces the MAPLE-K architecture, an extension of the MAPE-K framework, which includes an additional legitimisation step to ensure that adaptations are safe before execution. This extension is particularly important when using neural networks in the analysis or planning steps, as they cannot always guarantee compliance with safety requirements. The chapter also presents a model-driven engineering approach using RoboArch, RoboChart, and AADL to ensure traceability and verifiability from high-level architectural designs to implementation. This approach allows for formal verification of the architecture and automatic code generation, ensuring that the MAPLE-K loop, including the legitimisation component, is maintained throughout the development process. The chapter demonstrates this approach using a robot navigation scenario, showcasing the practical application of the MAPLE-K architecture in handling LiDAR sensor occlusions and adapting to faults. The chapter also discusses related work in modelling MAPE-K architectures and highlights the unique contributions of the presented approach in providing a pathway from high-level descriptions to deployment architectures.
This summary of the content was generated with the help of AI.
Abstract
It is often the case that a robot must adapt to unexpected changes in its environment. It is, however, important that these changes can be demonstrated to maintain the safe operation of the robot. The adaptive systems community has developed the MAPE-K pattern as a widely recognised conceptual architecture. We propose extending MAPE-K to incorporate runtime verification, resulting in an architecture we call MAPLE-K. In this paper, we capture and formalise both the MAPE-K and MAPLE-K architectures using a domain-specific language. Additionally, we provide support for translation from architectural models to software models and code to facilitate the deployment of verified applications. MAPE-K is rarely maintained at the implementation level, but our work ensures traceability between the code and its design, enabling the use of architectural information to verify the correctness of the software.
-
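The shape of the MAPLE-K loop can be sketched as follows (a hypothetical toy navigation scenario, not the RoboChart/AADL models from the chapter; all names and thresholds are illustrative): the standard MAPE-K stages run as usual, and a legitimisation stage vetoes any proposed plan that would violate a safety requirement before it reaches the executor.

```python
# Sketch of one iteration of a MAPLE-K loop: Monitor, Analyze, Plan,
# Legitimise, Execute, over shared Knowledge. The legitimisation stage is
# the runtime-verification step MAPLE-K adds to MAPE-K.

MIN_CLEARANCE = 0.5  # safety requirement: keep >= 0.5 m from obstacles

def monitor(sensors):
    return {"obstacle_dist": sensors["lidar_min"]}

def analyze(knowledge):
    return knowledge["obstacle_dist"] < 1.0  # is adaptation needed?

def plan(knowledge):
    # A (possibly learned) planner proposes a speed; it may be unsafe.
    return {"speed": 1.0 if knowledge["obstacle_dist"] > 0.2 else 0.8}

def legitimise(proposal, knowledge):
    """Accept the plan only if it demonstrably keeps the clearance
    requirement over the next control period; else fall back to a stop."""
    predicted = knowledge["obstacle_dist"] - proposal["speed"] * 0.1
    return proposal if predicted >= MIN_CLEARANCE else {"speed": 0.0}

def maple_k_step(sensors):
    knowledge = monitor(sensors)
    if not analyze(knowledge):
        return {"speed": 1.0}  # no adaptation needed
    return legitimise(plan(knowledge), knowledge)

print(maple_k_step({"lidar_min": 5.0}))  # clear path: no adaptation
print(maple_k_step({"lidar_min": 0.9}))  # adaptation, plan legitimised
print(maple_k_step({"lidar_min": 0.3}))  # unsafe plan vetoed, safe stop
```

The point of the extra stage is exactly what the summary notes about neural planners: the plan step may be unverifiable, but the simple, verifiable legitimisation check still bounds the system's behavior.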
RoboScene: Notation for Formal Verification of Human-Robot Interaction
- Open Access
The chapter addresses the challenge of capturing assumptions for mathematical reasoning about human behaviors in human-robot interaction (HRI) systems. It introduces RoboScene, a UML-based notation designed to be accessible to human-behavior experts, enabling them to model assumptions about human behavior in a precise and formal manner. RoboScene diagrams can be connected to models of system software, hardware, and operational scenarios, providing a holistic view of the robotic system. The notation includes novel constructs to capture nondeterministic timed interactions, human decisions, and different human traits, such as 'tired' or 'untrained'. The chapter presents a user-needs analysis that confirms the necessity for such a notation and identifies its requirements. It also demonstrates the use of RoboScene through a search-and-rescue drone example, derived from a Hierarchical Task Analysis (HTA) carried out by Thales. The chapter describes the formal semantics of RoboScene in tock-CSP, a timed variant of CSP, and presents an automated engineering process enabled by RoboScene. It highlights the gaps identified in the HTA and proves properties about the time taken to complete a search. The chapter also discusses related work, comparing RoboScene with other formal approaches to reason about HRI and works that formalize UML-like sequence diagrams. It concludes by outlining future work, including evaluation through additional industrial case studies and the development of runtime verifiers based on RoboScene diagrams.
This summary of the content was generated with the help of AI.
Abstract
Proving properties about robotic systems with humans-in-the-loop relies on assumptions about human behaviour. Existing technologies require expertise not reasonably expected from psychologists and human-factors engineers, for instance. A user-needs analysis of industrial design techniques for human-robot interaction has identified a lack of a standardised approach. We present RoboScene, a notation based on UML sequence diagrams that can be used to capture assumptions derived from human-factors artefacts, through novel constructs enabling consideration of stakeholders with different traits. We describe a tock-CSP semantics for RoboScene, and show how we can connect (mathematically) RoboScene diagrams to platform-independent software models. This is applied in the context of a Human-Centered Engineering process, demonstrated via an industrial case study.
-
Stochastic Timed Graph Transformation Systems
- Open Access
This chapter delves into the intricacies of modeling and analyzing large-scale distributed embedded real-time systems, focusing on the challenges posed by complex coordinating behavior and stochastic failures. It begins by discussing the limitations of existing model-driven approaches, particularly Probabilistic Timed Graph Transformation Systems (PTGTSs), which lack support for stochastic behavior. The core innovation presented is the introduction of Stochastic Timed Graph Transformation Systems (STGTSs), an extension of PTGTSs that incorporates stochastic behavior through the use of Cumulative Distribution Functions (CDFs). This extension enables the modeling of systems with probabilistic and stochastic failures, such as those found in automotive and railway transportation systems. The chapter provides a detailed syntax and semantics of STGTSs, exemplifying their capabilities through a running example of a shuttle system navigating a track topology with stochastic traffic light failures. Additionally, it outlines a model checking approach for STGTSs, adapting existing techniques to handle the new stochastic variables and their sampling. The chapter concludes with a discussion on future work, including the formalization and implementation of the model checking algorithm and the extension of an existing simulator to support STGTSs. This comprehensive exploration of STGTSs offers a significant advancement in the field of model-driven analysis, providing a more accurate and robust framework for understanding and improving the behavior of complex distributed embedded real-time systems.
This summary of the content was generated with the help of AI.
Abstract
The correct operation of safety-critical distributed embedded systems is crucial. Following a model-driven approach, the relevant system aspects must be captured, and rigorous, ideally fully automatic, analysis of (probabilistic) timed safety properties must be supported. Probabilistic Timed Graph Transformation Systems (PTGTSs) support the modeling of such systems and the analysis of such properties via model checking or simulation. However, they only support the modeling of probabilistic choice with a fixed number of outcomes and non-deterministic timing delays via timing constraints, limiting applicability (descriptive expressiveness) and usefulness (precision of analysis results). To remedy this drawback of PTGTSs, we (a) extend PTGTSs to Stochastic Timed Graph Transformation Systems (STGTSs), integrating discrete/continuous random variables to capture stochastic behavior, and (b) outline an adaptation of PTGTS model checking for STGTSs to enable analysis w.r.t. probabilistic timed safety properties. Relying on a running example in which shuttles navigating on a track topology must avoid derailing, we exemplify STGTS support for modeling and analysis.
-
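The stochastic ingredient STGTSs add can be illustrated in isolation (the numbers and the exponential failure model below are hypothetical, not taken from the paper's shuttle example): instead of a nondeterministic delay within clock bounds, a rule's firing time is drawn from a continuous distribution, and properties become probabilities that Monte Carlo simulation can estimate.

```python
import math
import random

# Sketch: a traffic light's time-to-failure is a continuous random variable
# (exponentially distributed here); the hazard is that it fails while a
# shuttle traverses the guarded track section.

rng = random.Random(42)

MEAN_FAILURE = 50.0   # mean time (s) until a light failure, illustrative
CROSSING_TIME = 5.0   # time the shuttle needs to clear the section

def shuttle_run():
    """One simulation run: sample the failure time from the distribution
    and check whether it lands inside the crossing window."""
    failure_at = rng.expovariate(1.0 / MEAN_FAILURE)
    return failure_at < CROSSING_TIME  # hazardous run?

runs = 100_000
hazard = sum(shuttle_run() for _ in range(runs)) / runs

# For the exponential CDF the exact probability is 1 - exp(-t / mean):
exact = 1 - math.exp(-CROSSING_TIME / MEAN_FAILURE)
print(round(hazard, 3), round(exact, 3))
```

The closed form exists here only because the toy model is a single exponential; in an STGTS the interaction of many sampled delays with timed rules is what makes the adapted model-checking and simulation machinery necessary.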
Prove your Colorings: Formal Verification of Cache Coloring of Bao Hypervisor
- Open Access
This chapter explores the formal verification of cache coloring in the Bao hypervisor, a critical process for ensuring secure and efficient resource allocation in embedded systems. Cache coloring is essential for maintaining memory isolation and preventing side-channel attacks, particularly in systems with multiple virtual machines (VMs) sharing hardware resources. The Bao hypervisor implements a sophisticated cache coloring mechanism that segments memory based on cache sets, ensuring that VMs do not compete for the same cache resources. The chapter delves into the implementation details of cache coloring in Bao, highlighting the challenges posed by bit-level operations, complex arithmetic, and nested loops. It provides a comprehensive overview of the verification process using the Frama-C platform, including the identification and correction of subtle bugs in the existing implementation. The authors present a detailed specification and verification approach, demonstrating the use of predicates, ghost code, and loop invariants to ensure the correctness of the cache coloring mechanism. The chapter also discusses potential optimizations and the broader implications of formal verification for critical embedded systems, making it a valuable resource for those interested in advanced software engineering and security.
This summary of the content was generated with the help of AI.
Abstract
Hypervisors allow sharing of computing resources between applications, possibly of various levels of criticality, which makes them increasingly relevant for modern embedded systems. In this context, memory isolation properties (including low-level cache isolation) are essential to guarantee. This paper presents a case study on formal verification of the cache coloring mechanism implemented in the Bao hypervisor. This mechanism implements an original technique for coloring memory pages and assigning to each virtual machine only pages of certain colors, aimed at providing strong isolation guarantees. The implementation presents several challenges for formal verification, such as bit-level operations, complex arithmetic operations, multiple levels of nested loops, and linked lists. We identify two subtle bugs in the existing implementation breaking the expected guarantees, and propose bug fixes. We provide formal specifications for the key functions of the mechanism and verify their (fixed) versions in the Frama-C verification platform, with a few lemmas proved in the Coq proof assistant. We present our specification choices, verification approach, and obtained results. Finally, we outline possible optimizations of the current implementation.
-
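The arithmetic behind page coloring can be sketched briefly (the cache geometry below is illustrative and this is not Bao's C code, which additionally handles color bitmaps, memory regions, nested loops, and linked lists): a page's color is determined by which cache sets its addresses map to, and two VMs restricted to disjoint color sets can never contend for a cache set.

```python
# Illustrative page-coloring arithmetic for a physically indexed cache.

PAGE_SIZE = 4096   # 4 KiB pages
LINE_SIZE = 64     # cache line size in bytes
NUM_SETS = 2048    # number of cache sets (e.g., a 2 MiB 16-way LLC)

# Consecutive physical pages cycle through this many distinct colors
# before mapping onto the same cache sets again:
NUM_COLORS = (NUM_SETS * LINE_SIZE) // PAGE_SIZE  # 32 with these parameters

def color_of(phys_addr):
    """Color of the page containing phys_addr."""
    return (phys_addr // PAGE_SIZE) % NUM_COLORS

def pages_with_colors(base, npages, allowed):
    """Select, from a physical region, only pages whose color is allowed
    for a given VM: the core of color-based cache partitioning."""
    return [base + i * PAGE_SIZE
            for i in range(npages)
            if color_of(base + i * PAGE_SIZE) in allowed]

# Two VMs with disjoint color sets share no page colors, hence no cache sets:
vm1 = pages_with_colors(0x8000_0000, 64, set(range(0, 16)))
vm2 = pages_with_colors(0x8000_0000, 64, set(range(16, 32)))
print(len(vm1), len(vm2), set(vm1) & set(vm2))
```

The isolation guarantee the paper verifies is essentially that the real implementation computes this assignment correctly under all configurations, which is where the bit-level operations and loop invariants mentioned above come in.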
Reasoning about Substitutability at the Level of JVM Bytecode
- Open Access
This chapter explores the critical concept of behavioral substitutability in the context of JVM bytecode, building upon the foundational work by Liskov and Wing. It underscores the importance of ensuring that types related by subtyping maintain behavioral consistency, which is essential for the correctness and flexibility of object-oriented programs. The chapter introduces a groundbreaking approach to model and verify behavioral substitutability constraints within a deductive verification framework, leveraging the ByteBack verifier. This method extends the verifier's capabilities to handle multiple JVM languages, including Java, Scala, and Kotlin, making it applicable to a wide range of programs. The chapter delves into the challenges posed by language features that can bypass static type checks, such as method overriding in Java, and presents innovative solutions to address these issues. It also discusses the extension of ByteBack with specification features like ghost predicates, class invariants, and specification inheritance, which are crucial for reasoning about substitutability. The chapter concludes with a series of experiments that demonstrate the verifier's ability to precisely analyze and verify substitutability properties in complex, multi-language programs, highlighting its practical applications and advantages over existing verification techniques.
This summary of the content was generated with the help of AI.
Abstract
Subtyping in object-oriented languages is widely based on Liskov’s substitution principle, which offers static correctness guarantees of type safety while abstracting implementation details. Unfortunately, the type systems of languages like Java cannot statically enforce full behavioral substitutability, and in fact there are numerous examples of libraries some of whose components are related by inheritance but not substitutable (for example, because they do not implement “optional” operations). In this paper, we present a novel approach to precisely specify and reason about substitutability in JVM languages. A distinctive feature of our approach is that it targets JVM bytecode, as opposed to a program’s source code, as it is based on the ByteBack deductive verifier. To support reasoning about substitutability, we extended ByteBack with ghost specifications, a (restricted) form of class invariants, and substitutability-preserving specification inheritance (precondition weakening and postcondition strengthening). Equipped with these features, ByteBack can now reason precisely about behavioral substitutability violations in a way that is applicable to realistic examples (such as with optional operations of Java’s List interface). Our experiments also demonstrate that ByteBack can analyze substitutability in programs written in a combination of JVM languages, including multi-language code where Scala or Kotlin code interacts with Java libraries.
-
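The "optional operations" violation the abstract cites can be rendered as a small runtime sketch (hypothetical classes and checker, not ByteBack's specification syntax, which works statically on bytecode): a subclass that refuses an inherited operation type-checks, yet breaks the base class's behavioral contract.

```python
# A subtype by inheritance that is not a behavioral subtype, in the style of
# java.util.List implementations whose add() is an "optional operation".

class MutableList:
    def __init__(self):
        self.items = []

    def add(self, x):
        """Contract: afterwards, x is contained and the size grew by one."""
        self.items.append(x)

    def contains(self, x):
        return x in self.items

    def size(self):
        return len(self.items)

class FrozenList(MutableList):
    """Inherits the interface but violates the inherited postcondition."""
    def add(self, x):
        raise NotImplementedError("optional operation")

def satisfies_add_contract(lst):
    """Check the base class's postcondition for add() on a given instance;
    a behavioral subtype must pass this for any inherited operation."""
    before = lst.size()
    try:
        lst.add("probe")
    except Exception:
        return False
    return lst.contains("probe") and lst.size() == before + 1

print(satisfies_add_contract(MutableList()), satisfies_add_contract(FrozenList()))
```

A runtime check like this only witnesses violations on particular inputs; the deductive approach described above proves (or refutes) precondition weakening and postcondition strengthening for all executions, directly at the bytecode level.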
- Advances in Automatic Software Testing: Test-Comp 2025
- Open Access
The 7th edition of the International Competition on Software Testing (Test-Comp) presents a detailed comparison of automatic test-suite generators for C programs, capturing the current state of the art in software testing. The competition relied on the BenchExec, BenchCloud, FM-Weck, and FM-Tools infrastructure to execute and validate test suites, ensuring a robust and reproducible benchmarking process. The report describes the competition's goals: establishing standards for software test generation, assembling a benchmark set for the community, and providing an overview of available tools. Results are presented through tables, plots, and an interactive web interface, giving a comprehensive view of the effectiveness and performance of the participating test generators. This edition also introduced new validation techniques, using different compiler backends and formatting choices to achieve the best possible coverage results. Twenty test-suite generators participated, and the report gives detailed insights into their features and technologies. It concludes with a discussion of the significance of the competition's findings and their impact on the field, making it an essential read for anyone interested in the latest advances in automatic software testing.
Abstract
The 7th edition of the Competition on Software Testing (Test-Comp 2025) provides an overview and comparative evaluation of automatic test-suite generators for C programs. The experimental evaluation was performed on a benchmark set of 11,226 test-generation tasks for C programs. Each test-generation task consisted of a program and a test specification. The test specifications included error coverage (generate a test suite that exhibits a bug) and branch coverage (generate a test suite that executes as many program branches as possible). Test-Comp 2025 evaluated 20 software systems for test generation that are all freely available. This included 13 test-suite generators that participated with active support from teams led by 12 different representatives from 8 countries (actively maintained software systems, participation in competition jury). Test-Comp 2025 had 1 new participant and 2 re-entries (ESBMC-incr, ESBMC-kind). The evaluation also included 7 test-generation tools from previous years.
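The two test specifications named in the abstract, error coverage and branch coverage, place different demands on a test suite. Test-Comp tasks are C programs; the sketch below merely illustrates the two criteria on an analogous Java function, with invented names (`classify`, `reachError`):

```java
public class CoverageDemo {
    public static boolean errorReached = false;

    // Stands in for the reachable bug of an error-coverage task.
    public static void reachError() { errorReached = true; }

    public static int classify(int x) {
        if (x < 0) return -1;   // branch 1
        if (x == 42) {          // branch 2: hides the bug
            reachError();       // error coverage needs the single input 42
            return 0;
        }
        return 1;               // branch 3
    }

    public static void main(String[] args) {
        // A branch-covering suite must exercise all three outcomes;
        // an error-covering suite only needs one input that triggers the bug.
        int[] suite = {-5, 42, 7};
        for (int x : suite) classify(x);
        System.out.println(errorReached); // true
    }
}
```

A generator scored on error coverage succeeds with the single test `42`, while one scored on branch coverage must also supply inputs like `-5` and `7` to execute the remaining branches.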
-
Fizzer with Local Space Fuzzing
(Competition Contribution)
- Open Access
This chapter presents a test-generation approach that leverages dynamic analysis to achieve high branch coverage. At its core is Fizzer, a gray-box fuzzer that instruments programs to gather detailed execution information, focusing on the evaluation of atomic Boolean expressions (abes). Unlike traditional gray-box fuzzers, Fizzer collects extensive data in order to craft more targeted inputs, aiming to evaluate each abe in various calling contexts. The original version of Fizzer, which won a bronze medal in the Cover-Branches category of Test-Comp 2024, has been extended with dynamic taint-flow analysis and a local-search analysis to address its limitations. These improvements mitigate issues such as slow sensitivity analysis and execution-path divergences, leading to better branch coverage. The chapter also describes Fizzer's software architecture, its implementation in C++, and its dependence on the LLVM infrastructure. It compares the coverage achieved by the original and new versions of Fizzer on the Cover-Branches benchmarks, highlighting significant improvements. The experimental results, presented as a scatter plot, show that the new version generally achieves better branch coverage, sometimes by an order of magnitude. The chapter additionally covers the tool's setup, configuration, and the contributions of the development team, making it a comprehensive resource on advanced fuzzing techniques and their practical applications.
Abstract
Fizzer is a gray-box fuzzer introduced at Test-Comp 2024. This paper summarizes the lessons learned with the original version and describes the major changes including new analyses implemented in the current version of Fizzer. In particular, Fizzer now uses dynamic taint-flow analysis and local space fuzzing. We also provide experimental results showing the progress between the two versions.
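The chapter's idea of steering inputs toward flipping an atomic Boolean expression can be pictured with distance-guided local search, a standard ingredient of search-based testing. The sketch below is a simplified invention in that spirit, not Fizzer's implementation; `distance` and `search` are illustrative names:

```java
public class AbeSearch {
    // Instrumented feedback for one abe, "x == 1000": how far the
    // comparison is from flipping to true (0 means it flips).
    public static long distance(int x) {
        return Math.abs((long) x - 1000);
    }

    // Naive local search: repeatedly move x in whichever direction
    // shrinks the distance, until the abe flips or the budget runs out.
    public static int search(int x0, int steps) {
        int x = x0;
        for (int i = 0; i < steps && distance(x) > 0; i++) {
            if (distance(x + 1) < distance(x)) x++;
            else x--;
        }
        return x;
    }

    public static void main(String[] args) {
        int found = search(0, 2000);
        System.out.println(found);           // 1000
        System.out.println(distance(found)); // 0: the abe now evaluates true
    }
}
```

A blind fuzzer would need luck to hit `x == 1000`; the instrumented distance turns the branch into a gradient the search can descend, which is the kind of targeted input construction the summary describes.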
- ESBMC v7.7: Automating Branch Coverage Analysis Using CFG-Based Instrumentation and SMT Solving
(Competition Contribution)
- Open Access
The chapter presents a comprehensive overview of ESBMC v7.7, a tool for verifying C programs, with a focus on its new branch coverage analysis capabilities. It begins by outlining ESBMC's general workflow, which converts input programs into a simplified control flow graph (CFG) for symbolic execution and verification. The core contribution is the extension of ESBMC with branch coverage analysis, achieved in two steps: branch-coverage property instrumentation in the GOTO program, and multi-property verification in the backend. Branch constraints are transformed into properties, represented as assertions, and instrumented into the program for verification. The chapter details the symbolic execution, program slicing, and property-splitting techniques used to reason about, and generate test suites for, multiple branch-coverage properties. It also discusses the strengths and weaknesses of ESBMC v7.7, highlighting its ability to automatically explore deep execution paths and to mitigate state-space explosion through repeated program slicing. The chapter concludes with a description of the tool's setup and configuration, providing practical insights into its implementation and usage. Readers will gain an understanding of the techniques employed by ESBMC v7.7 and their potential for improving software verification and test generation.
Abstract
ESBMC, a bounded model checking (BMC) verifier based on SMT solving, has demonstrated its effectiveness in bug detection in recent software verification competitions. We extend its capabilities to enable branch coverage analysis and test suite generation. Our contributions are twofold: (1) we define a branch coverage property and instrument the control flow graph (CFG) to compute branch coverage using SMT solving, and (2) we propose an incremental multi-property reasoning algorithm for efficient and sound test case generation. ESBMC is ranked 7th in the Cover-Branches category of Test-Comp 2025.
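The instrumentation idea behind the abstract's first contribution can be pictured as turning each branch of the program into a named reachability goal: a model checker that refutes "goal G is unreachable" produces exactly the input needed to cover G. The sketch below illustrates the idea with invented names (`cover`, `goals`) and records goals dynamically rather than via SMT solving, since ESBMC's actual instrumentation operates on the GOTO program:

```java
import java.util.ArrayList;
import java.util.List;

public class BranchGoals {
    public static List<String> goals = new ArrayList<>();

    // Stands in for an instrumented reachability goal at a branch.
    public static void cover(String goal) { goals.add(goal); }

    public static int abs(int x) {
        if (x < 0) { cover("then@abs"); return -x; }
        else       { cover("else@abs"); return x; }
    }

    public static void main(String[] args) {
        // One witness input per goal: together they form the test suite.
        abs(-3);
        abs(5);
        System.out.println(goals); // [then@abs, else@abs]
    }
}
```

In the assertion-based encoding, each goal becomes a property the backend checks in one multi-property run; every violated property yields a counterexample trace, and the inputs extracted from those traces form a branch-covering test suite.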
- Title
- Fundamental Approaches to Software Engineering
- Editors
- Artur Boronat
- Gordon Fraser
- Copyright Year
- 2025
- Publisher
- Springer Nature Switzerland
- Electronic ISBN
- 978-3-031-90900-9
- Print ISBN
- 978-3-031-90899-6
- DOI
- https://doi.org/10.1007/978-3-031-90900-9