ABSTRACT
Developing dependable software systems is costly. Fault tolerance techniques and self-repair capabilities usually add system complexity, which can even undermine the intended improvement in dependability. We therefore present a pattern-based approach to the design of service-based systems that enables self-managing capabilities by reusing proven fault tolerance techniques in the form of Fault Tolerance Patterns. A pattern specification consists of a service-based architectural design and, for the different architectural services, deployment restrictions expressed as UML deployment diagrams. The architectural design is reused when designing the system architecture. The deployment restrictions are used to determine valid deployment scenarios for an application. At run time, the same restrictions serve two purposes: first, they are used to automatically map additional services onto suitable nodes; second, when node crashes are detected, they guide the self-repair of the system so that only suitable repair decisions are made.
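The role of the deployment restrictions can be illustrated with a minimal sketch. It is not the authors' implementation: the paper expresses restrictions as UML deployment diagrams, whereas this sketch assumes they can be reduced to simple predicates over a service-to-node mapping. All names (`on_different_nodes`, `valid_deployments`, `repair`, the `primary`/`backup` services and the nodes `n1` to `n3`) are hypothetical.

```python
from itertools import product
from typing import Callable, Dict, List, Optional

# Assumption: a deployment maps each service name to a node name.
Deployment = Dict[str, str]
# Assumption: a restriction is a predicate that accepts or rejects a deployment.
Restriction = Callable[[Deployment], bool]


def on_different_nodes(a: str, b: str) -> Restriction:
    """Hypothetical restriction: services a and b must not share a node."""
    def check(d: Deployment) -> bool:
        return a not in d or b not in d or d[a] != d[b]
    return check


def is_valid(d: Deployment, restrictions: List[Restriction]) -> bool:
    """A deployment is valid if it satisfies every restriction."""
    return all(r(d) for r in restrictions)


def valid_deployments(services: List[str], nodes: List[str],
                      restrictions: List[Restriction]) -> List[Deployment]:
    """Enumerate all service-to-node mappings that satisfy the restrictions."""
    result = []
    for assignment in product(nodes, repeat=len(services)):
        d = dict(zip(services, assignment))
        if is_valid(d, restrictions):
            result.append(d)
    return result


def repair(d: Deployment, crashed: str, nodes: List[str],
           restrictions: List[Restriction]) -> Optional[Deployment]:
    """Re-map services of a crashed node onto surviving nodes.

    Returns the first valid repaired deployment, or None if the
    restrictions rule out every candidate.
    """
    surviving = [n for n in nodes if n != crashed]
    kept = {s: n for s, n in d.items() if n != crashed}
    lost = [s for s, n in d.items() if n == crashed]
    for assignment in product(surviving, repeat=len(lost)):
        candidate = {**kept, **dict(zip(lost, assignment))}
        if is_valid(candidate, restrictions):
            return candidate
    return None


if __name__ == "__main__":
    rules = [on_different_nodes("primary", "backup")]
    nodes = ["n1", "n2", "n3"]
    print(valid_deployments(["primary", "backup"], nodes, rules))
    print(repair({"primary": "n1", "backup": "n2"}, "n2", nodes, rules))
```

The same predicate list drives both the initial enumeration and the repair step, which mirrors the abstract's point that one set of restrictions is reused at design time and at run time. The exhaustive search is chosen for clarity only and would not scale to large systems.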
- Design of self-managing dependable systems with UML and fault tolerance patterns