skip to main content
10.1145/2517349.2522737acmconferencesArticle/Chapter ViewAbstractPublication PagessospConference Proceedingsconference-collections
research-article
Open Access

Discretized streams: fault-tolerant streaming computation at scale

Published:03 November 2013Publication History

ABSTRACT

Many "big data" applications must act on data in real time. Running these applications at ever-larger scales requires parallel platforms that automatically handle faults and stragglers. Unfortunately, current distributed stream processing models provide fault recovery in an expensive manner, requiring hot replication or long recovery times, and do not handle stragglers. We propose a new processing model, discretized streams (D-Streams), that overcomes these challenges. D-Streams enable a parallel recovery mechanism that improves efficiency over traditional replication and backup schemes, and tolerates stragglers. We show that they support a rich set of operators while attaining high per-node throughput similar to single-node systems, linear scaling to 100 nodes, sub-second latency, and sub-second fault recovery. Finally, D-Streams can easily be composed with batch and interactive query models like MapReduce, enabling rich applications that combine these modes. We implement D-Streams in a system called Spark Streaming.

Skip Supplemental Material Section

Supplemental Material

d3-04-tathagata-das.mp4

mp4

1.3 GB

References

  1. T. Akidau, A. Balikov, K. Bekiroglu, S. Chernyak, J. Haberman, R. Lax, S. McVeety, D. Mills, P. Nordstrom, and S. Whittle. MillWheel: Fault-tolerant stream processing at internet scale. In VLDB, 2013.Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. M. H. Ali, C. Gerea, B. S. Raman, B. Sezgin, T. Tarnavski, T. Verona, P. Wang, P. Zabback, A. Ananthanarayan, A. Kirilov, M. Lu, A. Raizman, R. Krishnan, R. Schindlauer, T. Grabs, S. Bjeletich, B. Chandramouli, J. Goldstein, S. Bhat, Y. Li, V. Di Nicola, X. Wang, D. Maier, S. Grell, O. Nano, and I. Santos. Microsoft CEP server and online behavioral targeting. Proc. VLDB Endow., 2(2):1558, Aug. 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. Apache Flume. http://incubator.apache.org/flume/.Google ScholarGoogle Scholar
  4. A. Arasu, B. Babcock, S. Babu, M. Datar, K. Ito, I. Nishizawa, J. Rosenstein, and J. Widom. STREAM: The Stanford stream data management system. SIGMOD 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. M. Balazinska, H. Balakrishnan, S. R. Madden, and M. Stonebraker. Fault-tolerance in the Borealis distributed stream processing system. ACM Trans. Database Syst., 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. P. Bhatotia, A. Wieder, R. Rodrigues, U. A. Acar, and R. Pasquin. Incoop: MapReduce for incremental computations. In SOCC '11, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. D. Carney, U. Çetintemel, M. Cherniack, C. Convey, S. Lee, G. Seidman, M. Stonebraker, N. Tatbul, and S. Zdonik. Monitoring streams: a new class of data management applications. In VLDB '02, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. S. Chandrasekaran, O. Cooper, A. Deshpande, M. J. Franklin, J. M. Hellerstein, W. Hong, S. Krishnamurthy, S. Madden, V. Raman, F. Reiss, and M. Shah. TelegraphCQ: Continuous dataflow processing for an uncertain world. In CIDR, 2003.Google ScholarGoogle Scholar
  9. M. Cherniack, H. Balakrishnan, M. Balazinska, D. Carney, U. Cetintemel, Y. Xing, and S. B. Zdonik. Scalable distributed stream processing. In CIDR, 2003.Google ScholarGoogle Scholar
  10. T. Condie, N. Conway, P. Alvaro, and J. M. Hellerstein. MapReduce online. NSDI, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. J. Dean and S. Ghemawat. MapReduce: Simplified data processing on large clusters. In OSDI, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. EsperTech. Performance-related information. http://esper.codehaus.org/esper/performance/performance.html, Retrieved March 2013.Google ScholarGoogle Scholar
  13. EsperTech. Tutorial. http://esper.codehaus.org/tutorials/tutorial/tutorial.html, Retrieved March 2013.Google ScholarGoogle Scholar
  14. M. Franklin, S. Krishnamurthy, N. Conway, A. Li, A. Russakovsky, and N. Thombre. Continuous analytics: Rethinking query processing in a network-effect world. CIDR, 2009.Google ScholarGoogle Scholar
  15. S. Ghemawat, H. Gobioff, and S.-T. Leung. The Google File System. In Proceedings of SOSP '03, 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. J. Hammerbacher. Who is using flume in production? http://www.quora.com/Flume/Who-is-using-Flume-in-production/answer/Jeff-Hammerbacher.Google ScholarGoogle Scholar
  17. B. He, M. Yang, Z. Guo, R. Chen, B. Su, W. Lin, and L. Zhou. Comet: batched stream processing for data intensive distributed computing. In SoCC, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. T. Hunter, T. Moldovan, M. Zaharia, S. Merzgui, J. Ma, M. J. Franklin, P. Abbeel, and A. M. Bayen. Scaling the Mobile Millennium system in the cloud. In SOCC '11, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. J.-H. Hwang, M. Balazinska, A. Rasin, U. Cetintemel, M. Stonebraker, and S. Zdonik. High-availability algorithms for distributed stream processing. In ICDE, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. J. hyon Hwang, Y. Xing, and S. Zdonik. A cooperative, self-configuring high-availability solution for stream processing. In ICDE, 2007.Google ScholarGoogle ScholarCross RefCross Ref
  21. M. Isard, M. Budiu, Y. Yu, A. Birrell, and D. Fetterly. Dryad: distributed data-parallel programs from sequential building blocks. In EuroSys 07, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. S. Krishnamurthy, M. Franklin, J. Davis, D. Farina, P. Golovko, A. Li, and N. Thombre. Continuous analytics over discontinuous streams. In SIGMOD, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. D. Logothetis, C. Olston, B. Reed, K. C. Webb, and K. Yocum. Stateful bulk processing for incremental analytics. SoCC, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. D. Logothetis, C. Trezzo, K. C. Webb, and K. Yocum. In-situ MapReduce for log processing. In USENIX ATC, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. N. Marz. Trident: a high-level abstraction for realtime computation. http://engineering.twitter.com/2012/08/trident-high-level-abstraction-for.html.Google ScholarGoogle Scholar
  26. F. McSherry, D. G. Murray, R. Isaacs, and M. Isard. Differential dataflow. In Conference on Innovative Data Systems Research (CIDR), 2013.Google ScholarGoogle Scholar
  27. D. Murray, F. McSherry, R. Isaacs, M. Isard, P. Barham, and M. Abadi. Naiad: A timely dataflow system. In SOSP '13, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. L. Neumeyer, B. Robbins, A. Nair, and A. Kesari. S4: Distributed stream computing platform. In Intl. Workshop on Knowledge Discovery Using Cloud and Distributed Computing Platforms (KDCloud), 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. D. Ongaro, S. M. Rumble, R. Stutsman, J. K. Ousterhout, and M. Rosenblum. Fast crash recovery in RAMCloud. In SOSP, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. Oracle. Oracle complex event processing performance. http://www.oracle.com/technetwork/middleware/complex-event-processing/overview/cepperformancewhitepaper-128060.pdf, 2008.Google ScholarGoogle Scholar
  31. D. Peng and F. Dabek. Large-scale incremental processing using distributed transactions and notifications. In OSDI 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. Z. Qian, Y. He, C. Su, Z. Wu, H. Zhu, T. Zhang, L. Zhou, Y. Yu, and Z. Zhang. Timestream: Reliable stream computation in the cloud. In EuroSys '13, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. M. Shah, J. Hellerstein, and E. Brewer. Highly available, fault-tolerant, parallel dataflows. SIGMOD, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  34. Z. Shao. Real-time analytics at Facebook. XLDB 2011, http://www-conf.slac.stanford.edu/xldb2011/talks/xldb2011_tue_0940_facebookrealtimeanalytics.pdf.Google ScholarGoogle Scholar
  35. U. Srivastava and J. Widom. Flexible time management in data stream systems. In PODS, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  36. Storm. https://github.com/nathanmarz/storm/wiki.Google ScholarGoogle Scholar
  37. Guaranteed message processing (Storm wiki). https://github.com/nathanmarz/storm/wiki/Guaranteeing-message-processing.Google ScholarGoogle Scholar
  38. K. Thomas, C. Grier, J. Ma, V. Paxson, and D. Song. Design and evaluation of a real-time URL spam filtering service. In IEEE Symposium on Security and Privacy, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  39. R. Tibbetts. Streambase performance & scalability characterization. http://www.streambase.com/wp-content/uploads/downloads/StreamBase_White_Paper_Performance_and_Scalability_Characterization.pdf, 2009.Google ScholarGoogle Scholar
  40. H. Wang, L.-S. Peh, E. Koukoumidis, S. Tao, and M. C. Chan. Meteor shower: A reliable stream processing system for commodity data centers. In IPDPS '12, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  41. Y. Yu, M. Isard, D. Fetterly, M. Budiu, Ú. Erlingsson, P. K. Gunda, and J. Currey. DryadLINQ: A system for general-purpose distributed data-parallel computing using a high-level language. In OSDI '08, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  42. M. Zaharia, M. Chowdhury, T. Das, A. Dave, J. Ma, M. McCauley, M. Franklin, S. Shenker, and I. Stoica. Resilient distributed datasets: A fault-tolerant abstraction for in-memory cluster computing. In NSDI, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library

Recommendations

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Sign in
  • Published in

    cover image ACM Conferences
    SOSP '13: Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles
    November 2013
    498 pages
    ISBN:9781450323888
    DOI:10.1145/2517349

    Copyright © 2013 Owner/Author

    Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    • Published: 3 November 2013

    Check for updates

    Qualifiers

    • research-article

    Acceptance Rates

    Overall Acceptance Rate131of716submissions,18%

    Upcoming Conference

    SOSP '24

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader