2006 | Book

Rigorous Development of Complex Fault-Tolerant Systems

Edited by: Michael Butler, Cliff B. Jones, Alexander Romanovsky, Elena Troubitsyna

Publisher: Springer Berlin Heidelberg

Book series: Lecture Notes in Computer Science

About this book

Many software systems have reached a level of complication, mainly because of their size, heterogeneity and distribution, which results in faults appearing that cannot be traced back easily to the code. Some of these "faults" could also be unexpected program behavior that appears as a result of interactions between different parts of the program; this is commonly known as complexity. New methods, approaches, tools and techniques are needed to cope with the increasing complexity in software systems; amongst them, fault-tolerance techniques and formal methods, supported by the corresponding tools, are promising solutions. This book brings together papers focusing on the application of rigorous design techniques to the development of fault-tolerant, software-based systems.

This volume is an outcome of the REFT 2005 Workshop on Rigorous Engineering of Fault-Tolerant Systems, held in conjunction with the Formal Methods 2005 conference at Newcastle upon Tyne, UK, in July 2005. The authors of the best workshop papers were asked to enhance and expand their work, and a number of well-established researchers working in the area contributed invited chapters. Of the 19 refereed and revised papers presented, 12 are reworked versions of workshop papers and 7 are totally new. The book is rounded off by two provocatively different position papers on the role of programming languages.

Table of Contents

Frontmatter

Train Systems

Train Systems
Abstract
This chapter presents the modelling of a software controller in charge of managing the movements of trains on a track network. Some methodological aspects of this development are emphasized: the preliminary informal presentation of the requirements, the careful definition of a refinement strategy, the attention paid to the precise mathematical definition of the train network, and the modelling of a complete system including the external environment. Special attention is given to the prevention of errors and also (but to a lesser extent) to their tolerance. The modelling notation used in this presentation is Event-B.
Jean-Raymond Abrial
Formalising Reconciliation in Partitionable Networks with Distributed Services
Abstract
Modern command and control systems are characterised by computing services provided to several actors at different geographical locations. The actors operate on a common state that is modularly updated at distributed nodes using local data services and global integrity constraints for validity of data in the value and time domains. Dependability in such networked applications is measured through availability of the distributed services as well as the correctness of the state updates that should satisfy integrity constraints at all times. Providing support in middleware is seen as one way of achieving a high level of service availability and well-defined performance guarantees. However, most recent works [1,2] that address fault-aware middleware cover crash faults and provision of timely services, and assume network connectivity as a basic tenet.
Mikael Asplund, Simin Nadjm-Tehrani
The Fault-Tolerant Insulin Pump Therapy
Abstract
The “Fault-Tolerant Insulin Pump Therapy” is based on the Continuous Subcutaneous Insulin Injection technique which combines devices (a sensor and a pump) and software in order to make glucose sensing and insulin delivery automatic. These devices are not physically connected together and they come with the necessary features to detect malfunctions which they may have.
As the patient’s health is the most important concern, the therapy has to be able to work even though hardware and/or software faults have occurred or may occur.
This paper presents the development cycle for the Insulin Pump Therapy Control System case study, starting from requirements and reaching the implementation following a top-down approach. It shows how the Coordinated Atomic Actions (CAAs) structuring mechanism can be used for modelling Fault-Tolerant (FT) systems and how the CAA-DRIP development environment is used to implement it.
Alfredo Capozucca, Nicolas Guelfi, Patrizio Pelliccione
Reasoning About Exception Flow at the Architectural Level
Abstract
An important challenge faced by the developers of fault-tolerant systems is to build reliable fault tolerance mechanisms. To achieve the desired levels of reliability, mechanisms for detecting and handling errors should be designed from the early phases of software development, preferably using a rigorous or formal methodology. In recent years, many researchers have been advocating the idea that exception handling-related issues should be addressed at the architectural level, as a complement to implementation-level exception handling. However, few works in the literature have addressed the problem of describing how exceptions flow amongst architectural elements. This work proposes a solution to this problem to support the early detection of mismatches between architectural elements due to exceptions. Moreover, it makes it possible to validate whether the architecture satisfies some properties of interest regarding exception flow before the system is actually built. Our solution proposes a model for describing the architectural flow of exceptions which is precise and automatically analyzable by means of a tool.
Fernando Castor Filho, Patrick Henrique da S. Brito, Cecília Mary F. Rubira
Are Practitioners Writing Contracts?
Abstract
For decades now, modular design methodologies have helped software engineers cope with the size and complexity of modern-day industrial applications. To be truly effective though, it is essential that module interfaces be rigorously specified. Design by Contract (DBC) is an increasingly popular method of interface specification for object-oriented systems. Many researchers are actively adding support for DBC to various languages such as Ada, Java and C#. Are these research efforts justified? Does having support for DBC mean that developers will make use of it? We present the results of an empirical study measuring the proportion of assertion statements used in Eiffel contracts. The study results indicate that programmers using Eiffel (the only active language with built-in support for DBC) tend to write assertions in a proportion that is higher than for other languages.
Patrice Chalin
Determining the Specification of a Control System: An Illustrative Example
Abstract
Creating the specification of a system by focusing primarily on the detailed properties of the digital controller can lead to complex descriptions that are nearly incoherent. An argument given by Hayes, Jackson, and Jones provides reasons to focus first on the wider environment in which the system will reside. In their approach are two major ideas: pushing out the specification boundaries, and carefully distinguishing between the requirements of the system and the assumptions about the environment. Pushing out the boundaries of the system specification to include the pragmatic intent of the system being specified allows the specification to be understood relative to the environmental context, rather than remaining a mysterious black box in isolation. Clarifying the distinction between assumptions about the environment and requirements that the specification must meet increases the clarity of the specification, and has the potential to seriously reduce the complexity of the final specification. The example of a gas burner is explored in depth to illustrate this approach to system specification.
Joey W. Coleman
Achieving Fault Tolerance by a Formally Validated Interaction Policy
Abstract
This paper addresses the rigorous validation of an integrity policy by means of the application of formal methods and related support tools. We show how the policy, which provides a flexible fault tolerant schema, can be specified using a process algebra and verified using model checking techniques. In fact, we show how this approach allows both the generic validation of a middleware based on such an integrity policy, and the validation of an integrated application which internally uses this mechanism. In the first case, the fault tolerance of a system, possibly composed of Commercial Off The Shelf (COTS) components, is guaranteed by a validated resident interaction control middleware. The second case applies instead when the application is forced to use a given middleware, as is the case with Web Services.
Alessandro Fantechi, Stefania Gnesi, Laura Semini
F(I)MEA-Technique of Web Services Analysis and Dependability Ensuring
Abstract
Dependability analysis of Web Services (WSs) and the disclosure of possible failure modes and their effects are open problems. This paper gives the results of a Web Services dependability analysis using the standardized FMEA (Failure Modes and Effects Analysis) technique and its proposed modification, the IMEA (Intrusion Modes and Effects Analysis) technique. The results of applying the FMEA technique were used to determine the necessary means of error recovery, fault prevention, fault tolerance and fault removal. WS intrusions and means of intrusion tolerance were systematized and analysed using the IMEA technique. We also propose architectures for fault- and intrusion-tolerant Web Services based on component diversity and dynamic reconfiguration, and discuss principles and results of dependable and secure Web Services development and deployment using the F(I)MEA technique and a multiversion approach.
Anatoliy Gorbenko, Vyacheslav Kharchenko, Olga Tarasyuk, Alexey Furmanov
On Specification and Verification of Location-Based Fault Tolerant Mobile Systems
Abstract
In this paper, we investigate context aware location-based mobile systems. In particular, we are interested in how their behaviour, including fault tolerant aspects, could be captured using a formal semantics, which would then be suitable for analysis and verification. We propose a new formalism and middleware, called Cama, which provides a rich environment to test our approach. The approach itself aims at giving Cama a formal concurrency semantics in terms of a suitable process algebra, and then applying efficient model checking techniques to the resulting process expressions in a way which alleviates the state space explosion. The model checking technique adopted in our work is partial order model checking based on Petri net unfoldings, and we use a semantics preserving translation from the process terms used in the modelling of Cama to a suitable class of high-level Petri nets.
Alexei Iliasov, Victor Khomenko, Maciej Koutny, Alexander Romanovsky
Formal Development of Mechanisms for Tolerating Transient Faults
Abstract
Transient faults belong to a wide-spread class of faults typical for control systems. These are the faults that only appear for a short period of time and might reappear later. However, even by appearing for a short time, they might cause dangerous system errors. Hence, designing mechanisms for tolerating and recovering from the transient faults is an acute issue, especially in the development of the safety-critical control systems. In this paper we propose formal development of a software-based mechanism for tolerating transient faults in the B Method. The mechanism relies on a specific architecture of the error detection actions called the evaluating tests. These tests are executed (with different frequencies) on the predefined subsets of the analyzed data. Our formal model allows us to formally express and verify the interdependencies between the tests as well as to define the test scheduling. Application of the proposed approach ensures proper damage confinement caused by the transient faults. Our approach aims at the avionics domain by focusing on formal development of the engine Failure Management System. However, the proposed specification and refinement patterns can be applied in the development of control systems in other application domains as well.
Dubravka Ilić, Elena Troubitsyna, Linas Laibinis, Colin Snook
Separating Concerns in Requirements Analysis: An Example
Abstract
Often, a requirements document is structured as a long list of individual “requirements”, each describing an anticipated function or user interaction. An alternative approach is to identify a collection of subproblems, each representing an aspect of the larger problem, and to describe each subproblem in isolation, deferring their composition to a later stage. This paper illustrates the approach by applying it to the requirements of the positioning functions of a proton therapy installation. It explains how a flaw in the design of the system can be isolated to a single subproblem, which can be formalized and subjected to automatic analysis.
Daniel Jackson, Michael Jackson
Rigorous Fault Tolerance Using Aspects and Formal Methods
Abstract
This paper examines the hypothesis that rigorous fault tolerance can be achieved by using aspect oriented software development in conjunction with formal methods of verification and analysis. After brief summaries on fault tolerance, aspect-oriented programming, and formal methods, some examples of aspects for fault tolerance are outlined. Then some recent research on applying formal methods to aspects is described, with the potential implications for rigorous fault tolerance using aspects.
Shmuel Katz
Rigorous Development of Fault-Tolerant Agent Systems
Abstract
Agent systems are examples of complex distributed systems. Though agents operate in an unreliable communication environment, such systems often have high reliability requirements imposed on them. Therefore, we need methods which allow us not only to ensure system correctness but also to integrate the design of fault tolerance mechanisms into the development process. In this paper we present a formal approach for the development of fault tolerant location-based mobile agent systems. Our approach is based on stepwise refinement in the Event B framework. We start from an abstract system specification modelling agents together with their communication environment and gradually introduce implementation details in a number of correctness-preserving transformations. Such stepwise development allows us to specify complex system properties, such as fault tolerance, in a structured and rigorous way. Moreover, it enables a formal representation of essential abstractions used in the development of fault tolerant agent systems, including scopes, roles, locations, and agents. Application of the proposed approach results in designing fault tolerant agent systems in which inter-consistency and inter-operability of agents is ensured by construction.
Linas Laibinis, Elena Troubitsyna, Alexei Iliasov, Alexander Romanovsky
Formal Service-Oriented Development of Fault Tolerant Communicating Systems
Abstract
Telecommunication systems should have a high degree of availability, i.e., a high probability of correct and timely provision of requested services. To achieve this, correctness of software for such systems and system fault tolerance should be ensured. Application of formal methods helps us to gain confidence in building correct software. However, to be used in practice, formal methods should be well integrated into existing development processes. In this paper we propose a formal model-driven approach to the development of communicating systems. Essentially our approach formalizes and extends Lyra – a top-down service-oriented method for the development of communicating systems. Lyra is based on transformation and decomposition of models expressed in UML2. We formalize Lyra in the B Method by proposing a set of formal specification and refinement patterns reflecting the essential models and transformations of the Lyra service specification, decomposition and distribution phases. Moreover, we extend Lyra to integrate reasoning about fault tolerance in the entire development flow.
Linas Laibinis, Elena Troubitsyna, Sari Leppänen, Johan Lilius, Qaisar Ahmad Malik
Programming-Logic Analysis of Fault Tolerance: Expected Performance of Self-stabilisation
Abstract
Formal proofs of functional correctness and rigorous analyses of fault tolerance have, traditionally, been separate processes. In the former a programming logic (proof) or computational model (model checking) is used to establish that all the system’s behaviours satisfy some (specification) criteria. In the latter, techniques derived from engineering are used to determine quantitative properties such as probability of failure (given failure of some component) or expected performance (an average measure of execution time, for example).
To combine the formality and the rigour requires a quantitative approach within which functional correctness can be embedded. Programming logics for probability are capable in principle of doing so, and in this article we illustrate the use of the probabilistic guarded-command language (pGCL) and its logic for that purpose.
We take self-stabilisation as an example of fault tolerance, and present program-logical techniques for determining, on the one hand, that termination occurs with probability one and, on the other, that the expected time to termination is bounded above by some value. An interesting technical novelty required for this is the recognition of both “angelic” and “demonic” refinement, reflecting our simultaneous interest in both upper and lower bounds.
C. C. Morgan, A. K. McIver
Formal Analysis of the Operational Concept for the Small Aircraft Transportation System
Abstract
The Small Aircraft Transportation System (SATS) is a NASA project aimed at increasing access to small non-towered non-radar airports in the US. SATS is a radical new approach to air traffic management where pilots flying instrument flight rules are responsible for separation without air traffic control services. In this paper, the SATS project serves as a case study of an operational air traffic concept that has been designed and analyzed primarily using formal techniques. The SATS concept of operations is modeled using non-deterministic, asynchronous transition systems, which are then formally analyzed using state exploration techniques. The objective of the analysis is to show, in a mathematical framework, that the concept of operation complies with a set of safety requirements such as absence of dead-locks, maintaining aircraft separation, and robustness with respect to the occurrence of off-nominal events. The models also serve as design tools. Indeed, they were used to configure the nominal flight procedures and the geometry of the SATS airspace.
César Muñoz, Víctor Carreño, Gilles Dowek
Towards a Method for Rigorous Development of Generic Requirements Patterns
Abstract
We present work in progress on a method for the engineering, validation and verification of generic requirements using domain engineering and formal methods. The need to develop a generic requirement set for subsequent system instantiation is complicated by the addition of the high levels of verification demanded by safety-critical domains such as avionics. Our chosen application domain is the failure detection and management function for engine control systems: here generic requirements drive a software product line of target systems.
A pilot formal specification and design exercise is undertaken on a small (two-sensor) system element. This exercise has a number of aims: to support the domain analysis, to gain a view of appropriate design abstractions, for a B novice to gain experience in the B method and tools, and to evaluate the usability and utility of that method. We also present a prototype method for the production and verification of a generic requirement set in our UML-based formal notation, UML-B, and tooling developed in support. The formal verification both of the structural generic requirement set, and of a particular application, is achieved via translation to the formal specification language, B, using our U2B and ProB tools.
Colin Snook, Michael Poppleton, Ian Johnson
Rigorous Design of Fault-Tolerant Transactions for Replicated Database Systems Using Event B
Abstract
System availability is improved by the replication of data objects in a distributed database system. However, during updates, the complexity of keeping replicas identical arises due to failures of sites and race conditions among conflicting transactions. Fault tolerance and reliability are key issues to be addressed in the design and architecture of these systems. Event B is a formal technique which provides a framework for developing mathematical models of distributed systems by rigorous description of the problem, gradually introducing solutions in refinement steps, and verification of solutions by discharge of proof obligations. In this paper, we present a formal development of a distributed system using Event B that ensures atomic commitment of distributed transactions consisting of communicating transaction components at participating sites. This formal approach carries the development of the system from an initial abstract specification of transactional updates on a one copy database to a detailed design containing replicated databases in refinement. Through refinement we verify that the design of the replicated database conforms to the one copy database abstraction.
Divakar Yadav, Michael Butler
Engineering Reconfigurable Distributed Software Systems: Issues Arising for Pervasive Computing
Abstract
This chapter establishes a common base for discussing reconfigurability in distributed software systems in general and in pervasive systems in particular, by introducing a generic reconfiguration cycle. Following this cycle, we discuss in detail three former efforts on reconfigurable pervasive systems, and draw conclusions about the capacity of existing approaches to deal with open, dynamic, ad hoc environments. We then outline our approach towards uncontrolled reconfiguration, targeting environments in which no centralized coordination or prior awareness between services being composed is assumed. Our solution supports awareness of service semantics and related service discovery, configuration change detection and state transfer, interface-aware dynamic adaptation of service orchestrations, and conversation-aware checkpointing and recovery.
Apostolos Zarras, Manel Fredj, Nikolaos Georgantas, Valerie Issarny

Position Papers

Tools for Developing Large Systems (A Proposal)
Abstract
It is claimed, as a provocative thesis, that high level programming languages and corresponding compilers might not be the right tools to be used to construct large reliable software systems. An alternative is proposed which is based on the concept of a System Development Database.
Jean-Raymond Abrial
Why Programming Languages Still Matter
Abstract
This paper examines some aspects of the aims and goals of the RODIN project and asks whether a successful outcome of the project will remove the need for us to worry about programming languages and the meaning of program source code. In common with some other currently ascendent approaches to software engineering, such as model-based development, RODIN is leading towards the construction of software models (in RODIN’s case precise software models) from which we may hope to generate source or even object code. So, does this remove the need for us to be concerned with the form these automatically-generated, intermediate representations take? Perhaps rather surprisingly, I conclude that the need to show an unbroken chain of confidence from requirements to object code means that programming languages, and their analysis, remain an extremely important topic. I hope to show that the ability to produce better specifications and designs, as promised by approaches exemplified by RODIN, is a necessary precondition for effective high-integrity software development rather than a substitute for approaches currently in use.
Peter Amey
Backmatter
Metadata
Title
Rigorous Development of Complex Fault-Tolerant Systems
Edited by
Michael Butler
Cliff B. Jones
Alexander Romanovsky
Elena Troubitsyna
Copyright year
2006
Publisher
Springer Berlin Heidelberg
Electronic ISBN
978-3-540-48267-3
Print ISBN
978-3-540-48265-9
DOI
https://doi.org/10.1007/11916246