
About this Book

As software systems become increasingly ubiquitous, issues of dependability become ever more crucial. Given that solutions to these issues must be considered from the very beginning of the design process, it is clear that dependability and security have to be addressed at the architectural level. This book, as well as its six predecessors, was born of an effort to bring together the research communities of software architectures, dependability, and security. This state-of-the-art survey contains expanded, peer-reviewed papers based on selected contributions from the Workshop on Architecting Dependable Systems (WADS 2009), held at the International Conference on Dependable Systems and Networks (DSN 2009), as well as a number of invited papers written by renowned experts in the area. The 13 papers are organized in topical sections on: mobile and ubiquitous systems, architecting systems, fault management, and experience and vision.



Part 1. Mobile and Ubiquitous Systems

Self-healing for Pervasive Computing Systems

The development of small wireless sensors and smart-phones has facilitated new pervasive applications. These pervasive systems are expected to perform in a broad set of environments with different capabilities and resources. Application requirements may change dynamically, requiring flexible adaptation. Sensing faults appear during the system's lifetime, and as users cannot be expected to have technical skills, the system needs to be self-managing. We discuss the Self-Managed Cell as an architectural paradigm and describe some fundamental components that address distributed management of sensing faults as well as adaptation for wireless sensor nodes.
Themistoklis Bourdenas, Morris Sloman, Emil C. Lupu

Self Organization and Self Maintenance of Mobile Ad Hoc Networks through Dynamic Topology Control

One way in which wireless nodes can organize themselves into an ad hoc network is to execute a topology control protocol, which is designed to build a network satisfying specific properties. A number of basic topology control protocols exist and have been extensively analyzed. Unfortunately, most of these protocols are designed primarily for static networks, and the protocol designers simply advise that the protocols should be repeated periodically to deal with failures, mobility, and other sources of dynamism. However, continuously maintaining a network topology with basic connectivity properties is a fundamental requirement for overall network dependability. Current approaches consider failures only as an afterthought or take a static fault tolerance approach, which results in extremely high energy usage and low throughput. In addition, most existing topology control protocols assume that transmission power is a continuous variable and that nodes can therefore choose an arbitrary power value between some minimum and maximum. However, wireless network interfaces with dynamic transmission power control permit the power to be set only to one of a discrete number of possible values. This simple restriction complicates the design of topology control protocols substantially. In this paper, we present a set of topology control protocols that work with discrete power levels, including a version that deals specifically with dynamic networks experiencing failures, mobility, and other dynamic conditions. Our protocols are also novel in that they are the first to consider explicit coordination between neighboring nodes, which results in more efficient power settings. We present the design of these topology control protocols and report on extensive simulations that evaluate them and compare their performance against existing protocols. The results demonstrate that our protocols produce topologies very similar to those of the best protocols that assume power is a continuous variable, while incurring very low communication cost and seamlessly handling failures and mobility.
Douglas M. Blough, Giovanni Resta, Paolo Santi, Mauro Leoncini
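The discrete-power restriction discussed in this abstract can be illustrated with a small sketch: with continuous power control a node can transmit at exactly the power a neighbor requires, while with discrete levels it must round up to the next available setting. The level set and function names below are invented for illustration and are not taken from the chapter.

```python
# Illustrative sketch of the discrete power-level restriction in
# topology control. LEVELS and the required-power values are made up.

LEVELS = [1, 2, 5, 10, 20]  # available discrete transmit power settings (mW)

def min_level_for(required_power, levels=LEVELS):
    """Smallest discrete level meeting a neighbor's required power.

    A continuous-power protocol could transmit at exactly
    `required_power`; a discrete-power node must round up, which is
    the restriction that complicates protocol design.
    """
    for level in sorted(levels):
        if level >= required_power:
            return level
    return None  # neighbor unreachable even at maximum power

def power_assignment(neighbor_requirements, levels=LEVELS):
    """Pick a single level for a node: the smallest discrete level
    that covers all neighbors it must reach (e.g., for connectivity)."""
    needed = [min_level_for(p, levels) for p in neighbor_requirements]
    if any(level is None for level in needed):
        return None
    return max(needed)
```

Note how a neighbor requiring 3 mW forces the 5 mW setting: the rounding-up waste is exactly what coordinated protocols for discrete levels try to minimize.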

Data Backup for Mobile Nodes: A Cooperative Middleware and an Experimentation Platform

In this paper, we present a middleware for dependable mobile systems and an experimentation platform for its evaluation. Our middleware is based on three original building blocks: a Proximity Map, a Trust and Cooperation Oracle, and a Cooperative Data Backup service. A Distributed Black-box application is used as an illustrative application of our architecture, and is evaluated on top of our mobile experimental platform.
Marc-Olivier Killijian, Matthieu Roy

Part 2. Architecting Systems

Identification of Security Requirements in Systems of Systems by Functional Security Analysis

Cooperating systems typically base decisions on information from their own components as well as on input from other systems. Safety-critical decisions based on cooperative reasoning, however, raise severe security concerns. Here, we address the security requirements elicitation step in the security engineering process for such systems of systems. The method traces functional dependencies across system component boundaries down to the origin of information, represented as a functional flow graph. Based on this graph, we systematically deduce comprehensive sets of formally defined authenticity requirements for the given security and dependability objectives. The proposed method thereby avoids premature assumptions on the security architecture's structure as well as the means by which it is realised. Furthermore, a tool-assisted approach that follows the presented methodology is described.
Andreas Fuchs, Roland Rieke
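The idea of tracing functional dependencies back to the origin of information can be sketched as a graph traversal: follow dependency edges from a safety-critical decision until nodes with no inputs are reached, then state an authenticity requirement per origin. The graph, node names, and requirement format below are hypothetical examples, not the authors' notation.

```python
# Hypothetical sketch: deducing authenticity requirements from a
# functional flow graph. The example system is invented.

def trace_origins(flow_graph, sink):
    """Return every information source a decision (sink) depends on,
    following dependency edges transitively across system boundaries."""
    origins, stack, seen = set(), [sink], set()
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        deps = flow_graph.get(node, [])
        if not deps:              # no inputs: an origin of information
            origins.add(node)
        stack.extend(deps)
    return origins

# Example system of systems: a braking decision based on fused data,
# part of which comes from another system (a peer vehicle's report).
graph = {
    "brake_decision": ["fused_position"],
    "fused_position": ["own_gps", "peer_report"],
    "peer_report": [],
    "own_gps": [],
}

# One authenticity requirement per origin-to-decision dependency.
requirements = {f"auth({src} -> brake_decision)"
                for src in trace_origins(graph, "brake_decision")}
```

The point of deriving requirements this way is that nothing yet commits to *how* authenticity is achieved (signatures, secure channels, trusted hardware), matching the abstract's goal of avoiding premature architectural assumptions.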

Implementing Reliability: The Interaction of Requirements, Tactics and Architecture Patterns

An important way to enhance the reliability of a software system is to implement specific run-time measures called runtime tactics. Because reliability is a system-wide property, tactic implementations affect software structure and behavior at the system, or architectural, level. Different tactics may be a better or worse fit for a given architecture, depending on the requirements and on how the architecture patterns used must change to accommodate the tactic. We found three important factors that influence the implementation of reliability tactics. The first is the nature of the tactic, which determines whether the tactic influences all components of the architecture or just a subset of them. The second is the interaction between architecture patterns and tactics: specific tactics and patterns are inherently compatible or incompatible. The third is the reliability requirements, which influence which tactics to use and where they should be implemented. Together, these factors affect how and where reliability tactics are implemented, and how difficult the implementation is. This information can help architects and developers decide which patterns and tactics to use, and can also assist them in learning what modifications and additions to the patterns are needed.
Neil B. Harrison, Paris Avgeriou

A Framework for Flexible and Dependable Service-Oriented Embedded Systems

The continued development and deployment of distributed, real-time embedded systems technologies in recent years has resulted in a multitude of ecosystems in which service-oriented embedded systems can now be realised. Such ecosystems are often exposed to dynamic changes in user requirements, environmental conditions and network topologies that require service-oriented embedded systems to evolve at runtime. This paper presents a framework for service-oriented embedded systems that can dynamically adapt to changing conditions at runtime. Supported by model-driven development techniques, the framework facilitates lightweight dynamic service composition in embedded systems while predicting the temporal behaviour of unforeseen service assemblies and coping with adverse feature interactions following dynamic service composition. This minimises the complexity of evolving software where services are deployed dynamically and, ultimately, enables flexible and dependable service-oriented embedded systems.
Shane Brennan, Serena Fritsch, Yu Liu, Ashley Sterritt, Jorge Fox, Éamonn Linehan, Cormac Driver, René Meier, Vinny Cahill, William Harrison, Siobhán Clarke

Architecting Robustness and Timeliness in a New Generation of Aerospace Systems

Aerospace systems have strict dependability and real-time requirements, as well as a need for flexible resource reallocation and reduced size, weight and power consumption. To cope with these issues, while still maintaining safety and fault containment properties, temporal and spatial partitioning (TSP) principles are employed. In a TSP system, the various onboard functions (avionics, payload) are integrated in a shared computing platform, while being logically separated into partitions. Robust temporal and spatial partitioning means that partitions do not mutually interfere with the fulfilment of real-time and address space encapsulation requirements. This chapter describes in detail the foundations of an architecture for robust TSP aimed at a new generation of spaceborne systems, including advanced dependability and timeliness adaptation control mechanisms. A formal system model which allows verification of integrator-defined system parameters is defined, and a prototype implementation demonstrating the current state of the art is presented.
José Rufino, João Craveiro, Paulo Verissimo
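One property an integrator might verify in a temporally partitioned schedule is that partition execution windows within a major time frame never overlap, so partitions cannot interfere temporally. The sketch below illustrates that single check under an invented schedule format; it is not the formal model defined in the chapter.

```python
# Hedged sketch of one temporal-partitioning check: execution windows
# inside one major time frame must be pairwise disjoint. The schedule
# tuples (partition, start_ms, duration_ms) are an invented format.

def windows_disjoint(schedule):
    """Return True if no two partition windows overlap in time."""
    spans = sorted((start, start + dur) for _, start, dur in schedule)
    # After sorting by start time, it suffices to compare neighbours:
    # each window must end no later than the next one begins.
    return all(prev_end <= start
               for (_, prev_end), (start, _) in zip(spans, spans[1:]))

# Avionics and payload functions sharing one platform, separated
# into non-interfering windows of a major time frame.
sched = [("avionics", 0, 20), ("payload", 20, 30), ("spare", 50, 10)]
```

A real TSP verification would also cover address-space encapsulation and per-partition real-time guarantees; this check only captures the "no temporal interference" part of the robustness definition.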

Part 3. Fault Management

Architecting Dependable Systems with Proactive Fault Management

Managing the ever-growing complexity of computing systems is an everlasting challenge for computer system engineers. We argue that we need to resort to predictive technologies in order to harness the system's complexity and transform the vision of proactive system and failure management into reality. We describe proactive fault management, provide an overview and taxonomy of online failure prediction methods, and present a classification of failure prediction-triggered methods. We present a model to assess the effects of proactive fault management on system reliability and show that overall dependability can be significantly enhanced. Having shown the methods and potential of proactive fault management, we describe a blueprint for how it can be incorporated into a dependable system's architecture.
Felix Salfner, Miroslaw Malek
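The core intuition behind assessing reliability effects of prediction-triggered actions can be shown with a deliberately simple toy model: a predictor that catches a fraction of failures, combined with a mitigation that averts a fraction of the predicted ones, reduces the residual failure rate. The formula and parameter names are simplifications for illustration, not the model developed in the chapter (which, among other things, must also account for false alarms).

```python
# Toy model, for intuition only: how failure prediction plus
# prediction-triggered actions lower the residual failure rate.

def residual_failure_rate(base_rate, recall, action_success):
    """Failures per hour remaining when a predictor detects a fraction
    `recall` of imminent failures and the triggered countermeasure
    averts a fraction `action_success` of the detected ones."""
    return base_rate * (1.0 - recall * action_success)

# A predictor with 80% recall and actions averting 90% of detected
# failures removes 72% of failures in this simplified view.
rate = residual_failure_rate(base_rate=1e-3, recall=0.8, action_success=0.9)
```

Even this toy model makes the architectural point of the chapter concrete: the benefit of proactive fault management is gated by both prediction quality and the effectiveness of the actions it triggers.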

ASDF: An Automated, Online Framework for Diagnosing Performance Problems

Performance problems account for a significant percentage of documented failures in large-scale distributed systems, such as Hadoop. Localizing the source of these performance problems can be frustrating due to the overwhelming amount of monitoring information available. We automate problem localization using ASDF, an online diagnostic framework that transparently monitors and analyzes different time-varying data sources (e.g., OS performance counters, Hadoop logs) and narrows down performance problems to a specific node or a set of nodes. ASDF’s flexible architecture allows system administrators to easily customize data sources and analysis modules for their unique operating environments. We demonstrate the effectiveness of ASDF’s diagnostics on documented performance problems in Hadoop; our results indicate that ASDF incurs an average monitoring overhead of 0.38% of CPU time and achieves a balanced accuracy of 80% at localizing problems to the culprit node.
Keith Bare, Soila P. Kavulya, Jiaqi Tan, Xinghao Pan, Eugene Marinelli, Michael Kasick, Rajeev Gandhi, Priya Narasimhan
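The balanced accuracy quoted above is the mean of the true-positive and true-negative rates over nodes, a metric that avoids rewarding a diagnoser that simply never indicts anything. A minimal sketch of its computation follows; the node sets are made-up data, and this is not ASDF's implementation.

```python
# Balanced accuracy over a cluster's nodes: mean of the true-positive
# rate (culprits correctly indicted) and the true-negative rate
# (healthy nodes correctly cleared). Example data is invented.

def balanced_accuracy(nodes, truth, indicted):
    """nodes: all node ids; truth: actual culprits; indicted: flagged."""
    tp = len(truth & indicted)
    fn = len(truth - indicted)
    fp = len((nodes - truth) & indicted)
    tn = len((nodes - truth) - indicted)
    tpr = tp / (tp + fn) if tp + fn else 1.0
    tnr = tn / (tn + fp) if tn + fp else 1.0
    return (tpr + tnr) / 2

nodes = {f"node{i}" for i in range(10)}
# The real culprit is caught, but one healthy node is wrongly indicted.
score = balanced_accuracy(nodes, truth={"node3"}, indicted={"node3", "node7"})
```

With one culprit in ten nodes, always answering "no culprit" would score 90% plain accuracy but only 50% balanced accuracy, which is why the metric suits diagnosis tasks with rare positives.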

Part 4. Experience and Vision

Is Collaborative QoS the Solution to the SOA Dependability Dilemma?

Service-oriented architectures (SOAs) are an approach to structuring software in which distributed applications are constructed as collections of interacting services. While they promise many benefits including significant cost savings through service reuse and faster application design and implementation, many of the very aspects that make SOAs attractive amplify the dependability challenges faced by distributed applications. This dependability dilemma becomes especially pronounced when the services making up an application are owned and managed by different organizations or are executed on resources owned and operated by third parties, such as cloud computing or utility computing providers. This paper reviews the vision of SOAs, and discusses the characteristics that make them particularly challenging for dependability. It then discusses techniques that have been proposed for building dependable SOAs and why a comprehensive solution remains elusive despite these efforts. Finally, we argue that—despite the fact that service independence is often cited as one of the main attractions of SOAs—any successful solution requires collaborative quality of service (QoS) in which services, service providers, and resource providers cooperate to implement dependability. The primary goals of this paper are to highlight the dependability implications of architectures based on decoupled and independent services such as SOAs, and to suggest possible approaches to enhancing dependability by weakening these characteristics in a controlled way.
Matti A. Hiltunen, Richard D. Schlichting

Software Assumptions Failure Tolerance: Role, Strategies, and Visions

At our behest or otherwise, while our software is being executed, a huge variety of design assumptions is continuously matched with the truth of the current condition. While standards and tools exist to express and verify some of these assumptions, in practice most of them end up either sifted off or hidden between the lines of our code. Across the system layers, a complex and at times obscure web of assumptions determines how well our software matches its deployment platforms and run-time environments. Our position is that it is increasingly important to design software systems with architectural and structuring techniques that allow software to be decomposed to reduce its complexity, but without hiding vital hypotheses and assumptions in the process. In this paper we discuss this problem, introduce three potentially dangerous consequences of ignoring it, and propose three strategies to facilitate their treatment. Finally, we present our vision of a new holistic approach to software development that overcomes the shortcomings of fragmented views of the problem of assumption failures.
Vincenzo De Florio

Architecting Dependable Systems Using Reflective Computing: Lessons Learnt and Some Challenges

The use of the reflection paradigm was motivated by the need for separation of concerns in dependable systems. Separating the application from its fault tolerance mechanisms, for instance, is a good way to make the system adaptive and to make both the application and the mechanisms reusable. One may ask, however, to what extent this separation of concerns is of interest for practical dependable systems. This depends very much on the mechanisms considered and on the system designer's target objectives in terms of system properties. The present paper attempts to shed some light on these factors by drawing lessons from several research projects carried out with colleagues in the dependability community and beyond. We also claim that some novel technologies are of high interest and that their use should build on the experience gained in the field of reflective computing. Finally, we express some of the challenges we consider important for the development of dependable systems in general, and of adaptive fault-tolerant systems in particular.
Jean-Charles Fabre

Architecting and Validating Dependable Systems: Experiences and Visions

The world of computer systems today is composed of very different kinds of critical architectures: from embedded safety-critical sensors and safety equipment (e.g., train on-board equipment), to large, highly dependable multi-computers (e.g., plant control systems), to smart resilient components for ubiquitous networks (e.g., biometric monitoring applications). The common trend for all of them is to become open and part of an integrated cyber world; still, each brings specific challenges that need to be addressed in its design and validation, possibly leading to different architectural and validation solutions. This paper discusses the experience the authors gained in architecting and validating dependable systems during recently completed European FP6 projects, which concerned traditional embedded systems (in the railway domain – the SAFEDMI project), large-scale critical infrastructures (in the electric domain – the CRUTIAL project), and distributed mobile systems (in the automotive domain – the HIDENETS project). Finally, a vision of upcoming challenges and trends is provided, considering pervasive/ubiquitous systems in the context of the recently started FP7 ALARP project and Future Internet scenarios.
Andrea Bondavalli, Andrea Ceccarelli, Paolo Lollini

