skip to main content
research-article

Fault-tolerance in the borealis distributed stream processing system

Published:21 March 2008Publication History
Skip Abstract Section

Abstract

Over the past few years, Stream Processing Engines (SPEs) have emerged as a new class of software systems, enabling low latency processing of streams of data arriving at high rates. As SPEs mature and get used in monitoring applications that must continuously run (e.g., in network security monitoring), a significant challenge arises: SPEs must be able to handle various software and hardware faults that occur, masking them to provide high availability (HA). In this article, we develop, implement, and evaluate DPC (Delay, Process, and Correct), a protocol to handle crash failures of processing nodes and network failures in a distributed SPE.

Like previous approaches to HA, DPC uses replication and masks many types of node and network failures. In the presence of network partitions, the designer of any replication system faces a choice between providing availability or data consistency across the replicas. In DPC, this choice is made explicit: the user specifies an availability bound (no result should be delayed by more than a specified delay threshold even under failure if the corresponding input is available), and DPC attempts to minimize the resulting inconsistency between replicas (not all of which might have seen the input data) while meeting the given delay threshold. Although conceptually simple, the DPC protocol tolerates the occurrence of multiple simultaneous failures as well as any further failures that occur during recovery.

This article describes DPC and its implementation in the Borealis SPE. We show that DPC enables a distributed SPE to maintain low-latency processing at all times, while also achieving eventual consistency, where applications eventually receive the complete and correct output streams. Furthermore, we show that, independent of system size and failure location, it is possible to handle failures almost up-to the user-specified bound in a manner that meets the required availability without introducing any inconsistency.

References

  1. Abadi, D. J., Ahmad, Y., Balazinska, M., Çetintemel, U., Cherniack, M., Hwang, J.-H., Lindner, W., Maskey, A. S., Rasin, A., Ryvkina, E., Tatbul, N., Xing, Y., and Zdonik, S. 2005. The design of the Borealis stream processing engine. In Proceedings of the 2nd Biennial Conference on Innovative Data Systems Research (CIDR).Google ScholarGoogle Scholar
  2. Abadi, D. J., Carney, D., Çetintemel, U., Cherniack, M., Convey, C., Lee, S., Stonebraker, M., Tatbul, N., and Zdonik, S. 2003. Aurora: A new model and architecture for data stream management. VLDB J. 12, 2. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. Aleri. http://www.aleri.com/index.html.Google ScholarGoogle Scholar
  4. Alonso, G., Günthör, R., Kamath, M., Agrawal, D., El Abbadi, A., and Mohan, C. 1995. Exotica/FMDC: Handling disconnected clients in a workflow management system. In Proceedings of the 3rd International Conference on Cooperative Information Systems.Google ScholarGoogle Scholar
  5. Alonso, G. and Mohan, C. 1997. WFMS: The next generation of distributed processing tools. S. Jajodia and L. Kerschberg, Eds. In Advanced Transaction Models and Architectures, Kluwer.Google ScholarGoogle Scholar
  6. Alonso, G., Mohan, C., Günthör, R., Agrawal, D., El Abbadi, A., and Kamath, M. 1995. Exotica/FMQM: A persistent message-based architecture for distributed workflow management. In Proceedings of IFIP WG8.1 Working Conference on Information Systems for Decentralized Organizations.Google ScholarGoogle Scholar
  7. Alonso, G., Reinwald, B., and Mohan, C. 1997. Distributed data management in workflow environments. In Proceedings of the 7th ACM RIDE. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. Balakrishnan, H., Balazinska, M., Carney, D., Çetintemel, U., Cherniack, M., Convey, C., Galvez, E., Salz, J., Stonebraker, M., Tatbul, N., Tibbets, R., and Zdonik, S. 2004. Retrospective on Aurora. VLDB J. 13, 4. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. Balazinska, M. 2006. Fault-tolerance and load management in a distributed stream processing system. Ph.D. thesis, Massachusetts Institute of Technology. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. Balazinska, M., Balakrishnan, H., Madden, S., and Stonebraker, M. 2005. Fault-tolerance in the Borealis distributed stream processing system. In Proceedings of the ACM SIGMOD International Conference on Management of Data. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. Bernstein, P. A., Hsu, M., and Mann, B. 1990. Implementing recoverable requests using queues. In Proceedings of the ACM SIGMOD International Conference on Management of Data. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. Brewer, E. A. 2001. Lessons from giant-scale services. IEEE Internet Comput. 5, 4, 46--55. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. Chandrasekaran, S., Cooper, O., Deshpande, A., Franklin, M. J., Hellerstein, J. M., Hong, W., Krishnamurthy, S., Madden, S., Raman, V., Reiss, F., and Shah, M. 2003. TelegraphCQ: Continuous dataflow processing for an uncertain world. In Proceedings of the 1st Biennial Conference on Innovative Data Systems Research (CIDR).Google ScholarGoogle Scholar
  14. Cherniack, M., Balakrishnan, H., Balazinska, M., Carney, D., Çetintemel, U., Xing, Y., and Zdonik, S. 2003. Scalable distributed stream processing. In Proceedings of the 1st Biennial Conference on Innovative Data Systems Research (CIDR).Google ScholarGoogle Scholar
  15. Coral8. http://coral8.com/.Google ScholarGoogle Scholar
  16. Cranor, C., Johnson, T., Shkapenyuk, V., and Spatscheck, O. 2003. Gigascope: A stream database for network applications. In Proceedings of the ACM SIGMOD International Conference on Management of Data. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. Elnozahy, E. N. M., Alvisi, L., Wang, Y.-M., and Johnson, D. B. 2002. A survey of rollback-recovery protocols in message-passing systems. ACM Comput. Surv. 34, 3, 375--408. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. Fekete, A., Gupta, D., Luchangco, V., Lynch, N., and Shvartsman, A. 1996. Eventually-serializable data services. In Proceedings of the 15th ACM Symposium on Principles of Distributed Computing (PODC'96). Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. Garcia-Molina, H. and Barbara, D. 1985. How to assign votes in a distributed system. J. ACM 32, 4, 841--860. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. Gifford, D. K. 1979. Weighted voting for replicated data. In Proceedings of the 7th ACM Symposium on Operating Systems Principles (SOSP). Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. Gilbert, S. and Lynch, N. 2002. Brewer's conjecture and the feasibility of consistent, available, partition-tolerant Web services. ACM SIGACT News 33, 2. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. Gray, J., Helland, P., O'Neil, P., and Shasha, D. 1996. The dangers of replication and a solution. In Proceedings of the ACM SIGMOD International Conference on Management of Data. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. Gray, J. and Reuters, A. 1993. Transaction Processing: Concepts and Techniques. Morgan Kaufmann. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. Hsu, M. 1995. Special issue on workflow systems. IEEE Data Eng. Bull. 18, 1.Google ScholarGoogle Scholar
  25. Hwang, J.-H., Balazinska, M., Rasin, A., Çetintemel, U., Stonebraker, M., and Zdonik, S. 2005. High-availability algorithms for distributed stream processing. In Proceedings of the 21st International Conference on Data Engineering (ICDE). Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. Kamath, M., Alonso, G., Guenthor, R., and Mohan, C. 1996. Providing high availability in very large workflow management systems. In Proceedings of the 5th International Conference on Extending Database Technology. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. Kao, B. and Garcia-Molina, H. 1995. An overview of real-time database systems. In Advances in Real-Time Systems. Prentice-Hall. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. Kawell, L., Beckhardt, S., Halvorsen, T., Ozzie, R., and Greif, I. 1988. Replicated document management in a group communication system. In Proceedings of the ACM Conference on Computer-Supported Cooperative Work (CSCW). Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. Lam, K.-W., Son, S. H., Hung, S.-L., and Wang, Z. 2000. Scheduling transactions with stringent real-time constraints. Inform. Syst. 25, 6, 431--452. Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. Lomet, D. and Tuttle, M. 2003. A theory of redo recovery. In Proceedings of the ACM SIGMOD International Conference on Management of Data. Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. Melliar-Smith, P. M. and Moser, L. E. 2004. Progress in real-time fault tolerance. In Proceedings of the 23rd IEEE International Symposium on Reliable Distributed Systems (SRDS'04). Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. Mossé, D., Melhem, R., and Ghosh, S. 2003. A nonpreemptive real-time scheduler with recovery from transient faults and its implementation. IEEE Trans. Softw. Engin. 29, 8. Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. Motwani, R., Widom, J., Arasu, A., Babcock, B., Babu, S., Datar, M., Manku, G., Olston, C., Rosenstein, J., and Varma, R. 2003. Query processing, approximation, and resource management in a data stream management system. In Proceedings of the 1st Biennial Conference on Innovative Data Systems Research (CIDR).Google ScholarGoogle Scholar
  34. Naughton et al. 2001. The Niagara Internet query system. IEEE Data Engin. Bull. 24, 2.Google ScholarGoogle Scholar
  35. Olston, C. 2003. Approximate replication. Ph.D. thesis, Stanford University. Google ScholarGoogle ScholarDigital LibraryDigital Library
  36. Olston, C., Jiang, J., and Widom, J. 2003. Adaptive filters for continuous queries over distributed data streams. In Proceedings of the ACM SIGMOD International Conference on Management of Data. Google ScholarGoogle ScholarDigital LibraryDigital Library
  37. Radoslavov, P., Govindan, R., and Estrin, D. 2001. Topology-informed Internet replica placement. In Proceedings of the 6th International Workshop on Web Caching and Content Distribution (WCW'01).Google ScholarGoogle Scholar
  38. Raman, V. and Hellerstein, J. M. 2002. Partial results for online query processing. In Proceedings of the ACM SIGMOD International Conference on Management of Data. Google ScholarGoogle ScholarDigital LibraryDigital Library
  39. Saito, Y. and Shapiro, M. 2005. Optimistic replication. ACM Comput. Surv. 37, 1, 42--81. Google ScholarGoogle ScholarDigital LibraryDigital Library
  40. Shah, M., Hellerstein, J., and Brewer, E. 2004. Highly-available, fault-tolerant, parallel dataflows. In Proceedings of the ACM SIGMOD International Conference on Management of Data. Google ScholarGoogle ScholarDigital LibraryDigital Library
  41. Srivastava, U. and Widom, J. 2004. Flexible time management in data stream systems. In Proceedings of the 23rd ACM Symposium on Principles of Database Systems (PODS). Google ScholarGoogle ScholarDigital LibraryDigital Library
  42. StreamBase. http://www.streambase.com/.Google ScholarGoogle Scholar
  43. Strom, R. E. 2004. Fault-tolerance in the SMILE stateful publish-subscribe system. In Proceedings of the 3rd International Workshop on Distributed Event-Based Systems (DEBS'04).Google ScholarGoogle ScholarCross RefCross Ref
  44. Tang, X., Chi, H., and Chanson, S. T. 2007. Optimal replica placement under TTL-based consistency. IEEE Trans. Parall. Distrib. Syst. 18, 3, 351--363. Google ScholarGoogle ScholarDigital LibraryDigital Library
  45. Terry, D. B., Theimer, M., Petersen, K., Demers, A. J., Spreitzer, M., and Hauser, C. 1995. Managing update conflicts in Bayou, a weakly connected replicated storage system. In Proceedings of the 15th ACM Symposium on Operating Systems Principles (SOSP). Google ScholarGoogle ScholarDigital LibraryDigital Library
  46. The NTP Project. NTP: The Network Time Protocol. http://www.ntp.org/.Google ScholarGoogle Scholar
  47. Tucker, P. A. and Maier, D. 2003. Dealing with disorder. In Proceedings of the Workshop on Management and Processing of Data Streams (MPDS).Google ScholarGoogle Scholar
  48. Urbano, R. 2003. Oracle Streams Replication Administrator's Guide, 10g Release 1 (10.1). Oracle Corporation.Google ScholarGoogle Scholar

Index Terms

  1. Fault-tolerance in the borealis distributed stream processing system

      Recommendations

      Comments

      Login options

      Check if you have access through your login credentials or your institution to get full access on this article.

      Sign in

      Full Access

      • Published in

        cover image ACM Transactions on Database Systems
        ACM Transactions on Database Systems  Volume 33, Issue 1
        March 2008
        211 pages
        ISSN:0362-5915
        EISSN:1557-4644
        DOI:10.1145/1331904
        Issue’s Table of Contents

        Copyright © 2008 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 21 March 2008
        • Accepted: 1 August 2007
        • Revised: 1 May 2007
        • Received: 1 September 2006
        Published in tods Volume 33, Issue 1

        Permissions

        Request permissions about this article.

        Request Permissions

        Check for updates

        Qualifiers

        • research-article
        • Research
        • Refereed

      PDF Format

      View or Download as a PDF file.

      PDF

      eReader

      View online with eReader.

      eReader