ABSTRACT
Dependability and fault tolerance, which are key requirements for business- or safety-critical applications, require explicit knowledge of the faults that may occur within a system. In contrast to other major research directions, the emerging field of distributed event-based systems still lacks a common understanding of faults. In this paper we take a step forward and study the potential origins and effects of faults in such systems. Our work on a unified fault taxonomy follows a rigorous methodology. We first identify five core sub-areas in the broader field of event-based systems and discuss the commonalities and differences among them. We then derive from the existing literature a coherent domain model that captures the specifics of the different areas. The domain model provides a holistic view and covers both structural and procedural aspects of event-based systems. Based on this model, we elaborate a detailed taxonomy of faults, in line with well-established fault dimensions from dependable and secure computing. The fault taxonomy forms the basis for a comprehensive discussion of fault instances across the five sub-areas of event processing.
Index Terms
- Deriving a unified fault taxonomy for event-based systems