
MillWheel: fault-tolerant stream processing at internet scale

Published: 01 August 2013

Abstract

MillWheel is a framework for building low-latency data-processing applications that is widely used at Google. Users specify a directed computation graph and application code for individual nodes, and the system manages persistent state and the continuous flow of records, all within the envelope of the framework's fault-tolerance guarantees.

This paper describes MillWheel's programming model as well as its implementation. The case study of a continuous anomaly detector in use at Google serves to motivate how many of MillWheel's features are used. MillWheel's programming model provides a notion of logical time, making it simple to write time-based aggregations. MillWheel was designed from the outset with fault tolerance and scalability in mind. In practice, we find that MillWheel's unique combination of scalability, fault tolerance, and a versatile programming model lends itself to a wide variety of problems at Google.
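To make the programming model concrete: a user writes per-node code that reacts to incoming records and to timers that fire as logical time advances. MillWheel itself exposes a C++ interface; the Python sketch below is only an illustration of that style of node, and every name in it (Record, WindowedCounter, process_record, process_timer) is hypothetical rather than part of the framework's actual API.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class Record:
    key: str          # e.g. a search query
    timestamp: int    # logical (event) time, in seconds
    value: int = 1

class WindowedCounter:
    """Illustrative node: counts records per key in one-minute windows.

    The framework (not shown) would call process_record for each incoming
    record and process_timer once the low watermark passes a window's end,
    guaranteeing that all earlier-timestamped records have been seen.
    """

    def __init__(self, emit):
        self.counts = defaultdict(int)   # stands in for persistent state
        self.emit = emit                 # downstream production callback

    def window_of(self, timestamp: int) -> int:
        return timestamp - (timestamp % 60)

    def process_record(self, record: Record) -> None:
        self.counts[(record.key, self.window_of(record.timestamp))] += record.value

    def process_timer(self, window_end: int) -> None:
        # Fired when logical time (the low watermark) passes window_end:
        # the window is complete and can safely be emitted downstream.
        for (key, window), count in list(self.counts.items()):
            if window + 60 <= window_end:
                self.emit(key, window, count)
                del self.counts[(key, window)]

# Tiny driver standing in for the framework's record delivery and timers.
if __name__ == "__main__":
    node = WindowedCounter(emit=lambda k, w, c: print(f"{k} @ {w}: {c}"))
    for ts in (3, 10, 59, 61):
        node.process_record(Record(key="flu symptoms", timestamp=ts))
    node.process_timer(window_end=60)   # emits the [0, 60) window
```

In the real system the counter's state would live in the framework-managed persistent store and the timer would be registered through the framework, so a restarted node could resume from its last checkpoint.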



        Reviews

        Peng Li

In this paper, the authors introduce MillWheel, a framework for processing real-time streaming data that is used at Google. It follows the streaming-computation paradigm: users specify a computation graph and provide code for the individual nodes. Once the computation starts, the application receives continuous data and processes it as it arrives in order to keep latency low. MillWheel targets latency-sensitive applications, such as detecting spikes or dips in search queries and delivering ads.

The MillWheel model features persistent storage, low watermarks, and duplicate prevention. Persistent storage holds the state of each node, so that if a node fails the system can resume the computation, without losing data, by reading the stored state. The low-watermark subsystem tracks the timestamps of the data processed at each node, providing a basis for fault tolerance and duplicate prevention. With these features, MillWheel processes continuous streams of records carrying keys and timestamps. A user who wants to trigger work at a future time can register it with the optional timer mechanism.

Fault tolerance is a key feature of MillWheel. Because the framework may run continuously on thousands of nodes, the chance of node failure is high. When a node fails, results it has already produced must be preserved; if the node keeps state, that state must be restored so that later computation remains correct; and once the node restarts, duplicated results must be prevented. To achieve these goals, MillWheel provides exactly-once delivery with the help of persistent storage and record acknowledgments. For computations whose repeated execution on the same input may yield different results, MillWheel offers "strong productions," which checkpoint produced records before they are sent downstream. This checkpointing, however, can be unnecessary overhead for idempotent computations (those in which the same input always yields the same output); for these, MillWheel lets users turn off strong productions and rely on weak productions to improve performance.

Computation state is stored both on disk and in memory: disk provides room for large amounts of state, while memory provides speed. To ensure consistency, all state writes are wrapped in per-key atomic operations. Work migrations and node restarts, however, can leave behind zombie writers and network remnants that issue stale writes; to guard against these, every write carries a unique sequencer token, and writes bearing a stale token are rejected (a sketch of this check appears below). Users can also tune the granularity of state updates for performance, according to the expected likelihood of failures.
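The sequencer mechanism just described can be illustrated with a minimal sketch. Nothing below is MillWheel's actual code or API; the store, the acquire/write methods, and the token scheme are hypothetical stand-ins for the per-key ownership and stale-write rejection the review describes.

```python
import itertools

class SequencedStore:
    """Toy per-key store that accepts a write only from the current owner.

    A real system would persist state durably (e.g. in a Bigtable-like store)
    and bundle the token check with the write in one atomic operation; this
    sketch only shows the invalidation logic.
    """

    def __init__(self):
        self._state = {}       # key -> value
        self._owner_seq = {}   # key -> sequencer token of the current owner
        self._next_seq = itertools.count(1)

    def acquire(self, key: str) -> int:
        """Called when a worker takes ownership of a key (e.g. after migration).
        Issuing a new token implicitly invalidates the previous owner's token."""
        token = next(self._next_seq)
        self._owner_seq[key] = token
        return token

    def write(self, key: str, value, token: int) -> bool:
        """Apply the write only if the caller's token is still current."""
        if self._owner_seq.get(key) != token:
            return False       # stale (zombie) writer: reject
        self._state[key] = value
        return True

store = SequencedStore()
old = store.acquire("query:flu")      # original worker owns the key
new = store.acquire("query:flu")      # key migrates to a new worker
assert store.write("query:flu", {"count": 7}, new) is True
assert store.write("query:flu", {"count": 3}, old) is False  # zombie write rejected
print(store._state)                    # {'query:flu': {'count': 7}}
```

The essential property is that ownership changes mint a fresh token, so any write still in flight from a superseded worker fails the token check instead of silently corrupting state.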
MillWheel has been implemented on Google's clusters. Streams are delivered via remote procedure calls, load distribution is handled by a replicated master, and persistent state is kept in a database such as Bigtable or Spanner. Low watermarks are computed conservatively by a central authority, and each interested consumer derives its own low watermark from its own records and the values reported by the senders it subscribes to (a small sketch of this aggregation follows the review). Experiments show that with weak productions the median record delay at a node is a few milliseconds; with strong productions and exactly-once delivery enabled, the median delay rises to tens of milliseconds, which is still within human reaction time. A framework-level cache is also used to reduce traffic to the storage layer.

MillWheel is used in various Google-internal systems, such as ad delivery and image processing for Google Street View. The authors point out, however, that some applications are not a good fit: if an application cannot be parallelized well across keys, bottleneck stages can slow down the whole computation.

Online Computing Reviews Service
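As a complement to the review's note on low-watermark computation, here is a minimal sketch of the consumer-side folding step. The idea (a computation's watermark is the minimum of its own oldest pending work and its senders' watermarks) follows the paper's description, but the function and its arguments are illustrative only, not MillWheel code.

```python
def low_watermark(oldest_pending_work, sender_watermarks):
    """Conservative low watermark for one computation.

    oldest_pending_work: timestamp of the oldest record not yet fully
        processed by this computation, or None if it has no pending work.
    sender_watermarks: low-watermark values reported (via the central
        authority in the real system) by every upstream computation.

    Taking the minimum over both ensures the watermark never runs ahead
    of data that is still in flight upstream.
    """
    candidates = list(sender_watermarks)
    if oldest_pending_work is not None:
        candidates.append(oldest_pending_work)
    if not candidates:
        raise ValueError("a source computation must supply its own watermark")
    return min(candidates)

# Example: upstream stages report watermarks 1050 and 1072, and this stage
# still holds an unprocessed record with timestamp 1060.
print(low_watermark(1060, [1050, 1072]))   # -> 1050
```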


• Published in
  Proceedings of the VLDB Endowment, Volume 6, Issue 11 (August 2013), 237 pages
  Publisher: VLDB Endowment
• Published: 1 August 2013
