skip to main content
research-article

Out-of-order processing: a new architecture for high-performance stream systems

Published:01 August 2008Publication History
Skip Abstract Section

Abstract

Many stream-processing systems enforce an order on data streams during query evaluation to help unblock blocking operators and purge state from stateful operators. Such in-order processing (IOP) systems not only must enforce order on input streams, but also require that query operators preserve order. This order-preserving requirement constrains the implementation of stream systems and incurs significant performance penalties, particularly for memory consumption. Especially for high-performance, potentially distributed stream systems, the cost of enforcing order can be prohibitive. We introduce a new architecture for stream systems, out-of-order processing (OOP), that avoids ordering constraints. The OOP architecture frees stream systems from the burden of order maintenance by using explicit stream progress indicators, such as punctuation or heartbeats, to unblock and purge operators. We describe the implementation of OOP stream systems and discuss the benefits of this architecture in depth. For example, the OOP approach has proven useful for smoothing workload bursts caused by expensive end-of-window operations, which can overwhelm internal communication paths in IOP approaches. We have implemented OOP in two stream systems, Gigascope and NiagaraST. Our experimental study shows that the OOP approach can significantly outperform IOP in a number of aspects, including memory, throughput and latency.

References

  1. Abadi, D., et al. Aurora: A New Model and Architecture for Data Stream Management. VLDB Journal 12(2), August 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. Arasu, A., Babu, S., Widom, J. The CQL Continuous Query Language: Semantic Foundations and Query Execution. VLDB Journal 14(1), March 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. Avnur, R., Hellerstein, J. M. Eddies: Continuously Adaptive Query Processing. SIGMOD 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. Arasu, A., Widom, J. Resource Sharing in Continuous Sliding-window Aggregates. VLDB 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. Balazinska, M, Balakrishnan, H., Madden, S., Stonebraker M. Fault-Tolerance in the Borealis Distributed Stream Processing System. SIGMOD 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. Cormode, G., et al. Holistic UDAFs at Streaming Speeds. SIGMOD 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. Cranor, C., Johnson, T., Spatashek, O. Gigascope: A Stream Database for Network Applications. SIGMOD 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. Ding, L., et al. Joining Punctuated Streams. EDBT 2004.Google ScholarGoogle Scholar
  9. Ding, L., Rundensteiner, E. A. Evaluating Window Joins over Punctuated Streams. CIKM 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. Golab, L., Ozsu, M. T. Processing Sliding Window multi-joins in Continuous queries over Data Streams. VLDB 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. Hwang, J-H, Cetintemel, U., Zdonik, S. Fast and Highly-Available Stream Processing over Wide Area Networks. ICDE 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. Hammad, M. et al. Optimizing In-Order Execution of Continuous Queries over Streamed Sensor Data. SSDBM 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. Hammad, M., et al. Scheduling for Shared Window Joins over Data Streams. VLDB 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. Li, Hua-Gang, et al. Safety Guarantee of Continuous Join Queries over Punctuated Data Streams. VLDB 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. Jin Li, et al. No Pane, No Gain: Efficient Evaluation of Sliding-Window Aggregates over Data Streams. SIGMOD Record 34(1), March 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. Jin Li, et al. Semantics and Evaluation Techniques for Window Aggregates in Data Streams. SIGMOD 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. Theodore Johnson, et al. A Heartbeat Mechanism and Its Application in Gigascope. VLDB 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. Kang, J., Naughton, J. F., Viglas, S. Evaluating Window Joins over Unbounded Streams. ICDE 2003.Google ScholarGoogle ScholarCross RefCross Ref
  19. Ken Keys, et al. A Robust System for Accurate Real-time Summaries of Internet Traffic. ACM SIGMETRICS Performance Evaluation Review 33(1), June 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. Krishnamurthy, S., Wu, C., Franklin, M. J. On-the-Fly Sharing for Streamed Aggregation. SIGMOD 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. Naughton, J. et al. The Niagara Internet Query System. IEEE Data Eng. Bulletin 24(2), June 2001.Google ScholarGoogle Scholar
  22. Passive Measurement and Analysis Project. San Diego Supercomputer Center. http://pma.nlanr.net/PMA.Google ScholarGoogle Scholar
  23. Raman, V., Raman, B., Hellerstein, J. M. Online Dynamic Reordering for Interactive Data Processing. VLDB 1999. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. Rundensteiner, E. A., et al. CAPE: Continuous Query Engine with Heterogeneous-Grained Adaptivity. VLDB 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. Srivastava, U, Widom, J. Flexible Time Management in Data Stream Systems. PODS 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. Srivastava, U, Widom, J. Memory-Limited Execution of Windowed Stream Joins. VLDB 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. StreamSQL. http://www.streamsql.org.Google ScholarGoogle Scholar
  28. Tucker, P. Punctuated Data Streams. Doctoral Dissertation. Oregon Health & Science University, Portland, OR, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. Tucker, P., et al. Exploiting Punctuation Semantics in Continuous Data Streams. IEEE Trans. on Knowledge and Data Engineering, 15(3), May 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. Urhan, T., Franklin, M. J. XJoin: A Reactively-Scheduled Pipelined Join Operator. IEEE Data Eng. Bull. 23(2), 2000.Google ScholarGoogle Scholar
  31. Urban, T. and Franklin, M. J. Dynamic Pipeline Scheduling for Improving Interactive Query Performance. VLDB 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. Viglas, S., Naughton, J. F., Burger, J. Maximizing the Output Rate of Multi-Way Join Queries over Streaming Information Sources. VLDB 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Out-of-order processing: a new architecture for high-performance stream systems

        Recommendations

        Comments

        Login options

        Check if you have access through your login credentials or your institution to get full access on this article.

        Sign in

        Full Access

        PDF Format

        View or Download as a PDF file.

        PDF

        eReader

        View online with eReader.

        eReader