skip to main content
10.1145/3225058.3225120acmotherconferencesArticle/Chapter ViewAbstractPublication PagesicppConference Proceedingsconference-collections
research-article

Dual-Paradigm Stream Processing

Authors Info & Claims
Published:13 August 2018Publication History

ABSTRACT

Existing stream processing frameworks operate either under data stream paradigm processing data record by record to favor low latency, or under operation stream paradigm processing data in micro-batches to desire high throughput. For complex and mutable data processing requirements, this dilemma brings the selection and deployment of stream processing frameworks into an embarrassing situation. Moreover, current data stream or operation stream paradigms cannot handle data burst efficiently, which probably results in noticeable performance degradation. This paper introduces a dual-paradigm stream processing, called DO (Data and Operation) that can adapt to stream data volatility. It enables data to be processed in micro-batches (i.e., operation stream) when data burst occurs to achieve high throughput, while data is processed record by record (i.e., data stream) in the remaining time to sustain low latency. DO embraces a method to detect data bursts, identify the main operations affected by the data burst and switch paradigms accordingly. Our insight behind DO's design is that the trade-off between latency and throughput of stream processing frameworks can be dynamically achieved according to data communication among operations in a fine-grained manner (i.e., operation level) instead of framework level. We implement a prototype stream processing framework that adopts DO. Our experimental results show that our framework with DO can achieve 5x speedup over operation stream under low data stream sizes, and outperforms data stream on throughput by 2.1x to 3.2x under data burst.

References

  1. Tyler Akidau, Alex Balikov, Kaya Bekiroğlu, Slava Chernyak, Josh Haberman, Reuven Lax, Sam McVeety, Daniel Mills, Paul Nordstrom, and Sam Whittle. 2013. MillWheel: Fault-tolerant Stream Processing at Internet Scale. Proceedings of the VLDB Endowment. 6, 11 (Aug. 2013), 1033--1044. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. Leonardo Aniello, Roberto Baldoni, and Leonardo Querzoni. 2013. Adaptive Online Scheduling in Storm. In Proceedings of the 7th ACM International Conference on Distributed Event-based Systems (DEBS '13). ACM, 207--218. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. Brian Babcock, Shivnath Babu, Rajeev Motwani, and Mayur Datar. 2003. Chain: Operator Scheduling for Memory Minimization in Data Stream Systems. In Proceedings of the 2003 ACM SIGMOD International Conference on Management of Data (SIGMOD '03). ACM, 253--264. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. Brian Babcock, Mayur Datar, and Rajeev Motwani. 2003. Load shedding techniques for data stream systems. In Proceedings of the 2003 Management and Processing of Data Streams Workshop (MPDS '03), Vol. 577. ACM.Google ScholarGoogle Scholar
  5. Yahoo! Streaming Benchmark. 2015. https://yahooeng.tumblr.com/post/135321837876/Google ScholarGoogle Scholar
  6. Pramod Bhatotia, Alexander Wieder, Rodrigo Rodrigues, Umut A. Acar, and Rafael Pasquin. 2011. Incoop: MapReduce for Incremental Computations. In Proceedings of the 2011 ACM Symposium on Cloud Computing (SOCC '11). ACM, Article 7, 14 pages. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. Tyson Condie, Neil Conway, Peter Alvaro, Joseph M. Hellerstein, Khaled Elmeleegy, and Russell Sears. 2010. MapReduce Online. In Proceedings of the 7th USENIX Symposium on Networked Systems Design and Implementation (NSDI '10). USENIX, 313--328. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. Tathagata Das, Yuan Zhong, Ion Stoica, and Scott Shenker. 2014. Adaptive Stream Processing Using Dynamic Batch Sizing. In Proceedings of the 2014 ACM Symposium on Cloud Computing (SOCC '14). ACM, Article 16, 13 pages. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. Structured Streaming Programming Guide. 2018. http://spark.apache.org/docs/latest/structured-streaming-programming-guide.htmlGoogle ScholarGoogle Scholar
  10. Martin Hirzel, Robert Soulé, Scott Schneider, Buğra Gedik, and Robert Grimm. 2014. A Catalog of Stream Processing Optimizations. ACM Comput. Surv. 46, 4, Article 46 (March2014), 34 pages. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. JStorm. 2015. http://jstorm.io/Google ScholarGoogle Scholar
  12. Asterios Katsifodimos and Sebastian Schelter. 2016. Apache Flink: Stream Analytics at Scale. In Proceedings of the 2016 IEEE International Conference on Cloud Engineering Workshop (IC2EW '16). IEEE, 193--193.Google ScholarGoogle ScholarCross RefCross Ref
  13. Wilhelm Kleiminger, Evangelia Kalyvianaki, and Peter Pietzuch. 2011. Balancing load in stream processing with the cloud. In Proceedings of the 27th IEEE International Conference on Data Engineering Workshops (ICDEW '11). IEEE, 16--21. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. Sanjeev Kulkarni, Nikunj Bhagat, Maosong Fu, Vikas Kedigehalli, Christopher Kellogg, Sailesh Mittal, Jignesh M. Patel, Karthik Ramasamy, and Siddarth Taneja. 2015. Twitter Heron: Stream Processing at Scale. In Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data (SIGMOD '15). ACM, 239--250. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. Julie Letchner, Christopher Ré, Magdalena Balazinska, and Matthai Philipose. 2010. Approximation trade-offs in Markovian stream processing: An empirical study. In Proceedings of the 26th IEEE International Conference on Data Engineering (ICDE '10). IEEE, 936--939.Google ScholarGoogle ScholarCross RefCross Ref
  16. Derek G. Murray, Frank McSherry, Rebecca Isaacs, Michael Isard, Paul Barham, and Martín Abadi. 2013. Naiad: A Timely Dataflow System. In Proceedings of the 24th ACM Symposium on Operating Systems Principles (SOSP '13). ACM, 439--455. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. Leonardo Neumeyer, Bruce Robbins, Anish Nair, and Anand Kesari. 2010. S4: Distributed Stream Computing Platform. In Proceedings of the 2010 IEEE International Conference on Data Mining Workshops (ICDMW '10). IEEE, 170--177. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. Christopher Olston, Greg Chiou, Laukik Chitnis, Francis Liu, Yiping Han, Mattias Larsson, Andreas Neumann, Vellanki B.N. Rao, Vijayanand Sankarasubramanian, Siddharth Seth, Chao Tian, Topher ZiCornell, and Xiaodan Wang. 2011. Nova: Continuous Pig/Hadoop Workflows. In Proceedings of the 2011 ACM SIGMOD International Conference on Management of Data (SIGMOD '11). ACM, 1081--1090. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. Samza. 2015. http://samza.apache.org/Google ScholarGoogle Scholar
  20. Michael Stonebraker, Uğur Çetintemel, and Stan Zdonik. 2005. The 8 Requirements of Real-time Stream Processing. SIGMOD Rec. 34, 4 (Dec. 2005), 42--47. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. Jaspar Subhlok and Gary Vondran. 1996. Optimal Latency-throughput Tradeoffs for Data Parallel Pipelines. In Proceedings of the 8th Annual ACM Symposium on Parallel Algorithms and Architectures (SPAA '96). ACM, 62--71. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. Ankit Toshniwal, Siddarth Taneja, Amit Shukla, Karthik Ramasamy, Jignesh M. Patel, Sanjeev Kulkarni, Jason Jackson, Krishna Gade, Maosong Fu, Jake Donham, Nikunj Bhagat, Sailesh Mittal, and Dmitriy Ryaboy. 2014. Storm@twitter. In Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data (SIGMOD '14). ACM, 147--156. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. Trident. 2015. http://storm.apache.org/releases/1.1.0/Trident-tutorial.htmlGoogle ScholarGoogle Scholar
  24. Shivaram Venkataraman, Aurojit Panda, Kay Ousterhout, Michael Armbrust, Ali Ghodsi, Michael J. Franklin, Benjamin Recht, and Ion Stoica. 2017. Drizzle: Fast and Adaptable Stream Processing at Scale. In Proceedings of the 26th ACM Symposium on Operating Systems Principles (SOSP '17). ACM, 374--389. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. Naga Vydyanathan, Umit Catalyurek, Tahsin Kurc, Ponnuswamy Sadayappan, and Joel Saltz. 2011. Optimizing latency and throughput of application workflows on clusters. Parallel Comput. 37, 10 (2011), 694--712. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. Walter Willinger, Murad S. Taqqu, and Ashok Erramilli. 1996. A Bibliographical Guide to Self-Similar Traffic and Performance Modeling for Modern High-Speed Networks. Stochastic Networks Theory and Applications (1996), 339--366.Google ScholarGoogle Scholar
  27. Jielong Xu, Zhenhua Chen, Jian Tang, and Sen Su. 2014. T-Storm: Traffic-Aware Online Scheduling in Storm. In Proceedings of the 34th IEEE International Conference on Distributed Computing Systems (ICDCS '14). IEEE, 535--544. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. Matei Zaharia, Tathagata Das, Haoyuan Li, Timothy Hunter, Scott Shenker, and Ion Stoica. 2013. Discretized Streams: Fault-tolerant Streaming Computation at Scale. In Proceedings of the 24th ACM Symposium on Operating Systems Principles (SOSP '13). ACM, 423--438. Google ScholarGoogle ScholarDigital LibraryDigital Library

Recommendations

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Sign in
  • Published in

    cover image ACM Other conferences
    ICPP '18: Proceedings of the 47th International Conference on Parallel Processing
    August 2018
    945 pages
    ISBN:9781450365109
    DOI:10.1145/3225058

    Copyright © 2018 ACM

    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    • Published: 13 August 2018

    Permissions

    Request permissions about this article.

    Request Permissions

    Check for updates

    Qualifiers

    • research-article
    • Research
    • Refereed limited

    Acceptance Rates

    ICPP '18 Paper Acceptance Rate91of313submissions,29%Overall Acceptance Rate91of313submissions,29%

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader