ABSTRACT
In recent years, new highly scalable storage systems have significantly contributed to the success of Cloud Computing. Systems like Dynamo or Bigtable have underpinned their ability to handle tremendous amounts of data and scale to a very large number of nodes. Although these systems are designed the store data, the fundamental architectural properties and the techniques used (e.g., request routing, replication and load balancing) can also be applied to data streaming systems. In this paper, we present Stormy, a distributed stream processing service for continuous data processing. Stormy is based on proven techniques from existing Cloud storage systems that are adapted to efficiently execute streaming workloads. The primary design focus lies in providing a scalable, elastic, and fault-tolerant framework for continuous data processing, while at the same time optimizing resource utilization and increasing cost efficiency. Stormy is able to process any kind of streaming workloads, thus, covering a wide range of use cases ranging from realtime data analytics to long-term data aggregation jobs.
- D. J. Abadi, Y. Ahmad, M. Balazinska, M. Cherniack, J.-H. Hwang, W. Lindner, A. S. Maskey, E. Rasin, E. Ryvkina, N. Tatbul, Y. Xing, and S. Zdonik. The design of the Borealis stream processing engine. In CIDR '05, pages 277--289, 2005.Google Scholar
- Apache Zookeeper. http://zookeeper.apache.org/.Google Scholar
- A. Arasu, M. Cherniack, E. Galvez, D. Maier, A. S. Maskey, E. Ryvkina, M. Stonebraker, and R. Tibbetts. Linear Road: A stream data management benchmark. In VLDB '04, pages 480--491, 2004. Google ScholarDigital Library
- H. Balakrishnan, M. F. Kaashoek, D. Karger, R. Morris, and I. Stoica. Looking up data in P2P systems. CACM, 46:43--48, February 2003. Google ScholarDigital Library
- M. Balazinska, H. Balakrishnan, S. R. Madden, and M. Stonebraker. Fault-tolerance in the Borealis distributed stream processing system. ACM Transactions on Database Systems, 33:1--44, March 2008. Google ScholarDigital Library
- M. Burrows. The Chubby lock service for loosely-coupled distributed systems. In OSDI '06, pages 335--350, 2006. Google ScholarDigital Library
- S. Chandrasekaran, O. Cooper, A. Deshpande, M. J. Franklin, J. M. Hellerstein, W. Hong, S. Krishnamurthy, S. R. Madden, F. Reiss, and M. A. Shah. TelegraphCQ: Continuous dataflow processing. In SIGMOD '03, page 668, 2003. Google ScholarDigital Library
- M. Cherniack, H. Balakrishnan, M. Balazinska, D. Carney, U. Çetintemel, Y. Xing, and S. Zdonik. Scalable distributed stream processing. In CIDR '03, pages 257--268, 2003.Google Scholar
- J. Dean and S. Ghemawat. MapReduce: Simplified data processing on large clusters. In OSDI '04, pages 137--150, 2004. Google ScholarDigital Library
- G. DeCandia, D. Hastorun, M. Jampani, G. Kakulapati, A. Lakshman, A. Pilchin, S. Sivasubramanian, P. Vosshall, and W. Vogels. Dynamo: Amazon's highly available key-value store. In SOSP '07, pages 205--220, 2007. Google ScholarDigital Library
- M. Grinev, M. Grineva, M. Hentschel, and D. Kossmann. Analytics for the real-time web. PVLDB, 4(12):1391--1394, September 2011.Google Scholar
- V. Gulisano, R. Jimenez-Peris, M. Patino-Martinez, and P. Valduriez. StreamCloud: A large scale data streaming system. In ICDCS '10, pages 126--137, 2010. Google ScholarDigital Library
- M. Isard, M. Budiu, Y. Yu, A. Birrell, and D. Fetterly. Dryad: Distributed data-parallel programs from sequential building blocks. In EuroSys '07, pages 59--72, 2007. Google ScholarDigital Library
- D. Karger, E. Lehman, T. Leighton, R. Panigrahy, M. Levine, and D. Lewin. Consistent hashing and random trees: Distributed caching protocols for relieving hot spots on the World Wide Web. In STOC '97, pages 654--663, 1997. Google ScholarDigital Library
- D. Kossmann, T. Kraska, S. Loesing, S. Merkli, R. Mittal, and F. Pfaffhauser. Cloudy: A modular cloud storage system. PVLDB, 3:1533--1536, September 2010. Google ScholarDigital Library
- A. Lakshman and P. Malik. Cassandra: A decentralized structured storage system. SIGOPS Operating Systems Review, 44:35--40, 2010. Google ScholarDigital Library
- L. Lamport. The part-time parliament. ACM Transactions on Computer Systems, 16:133--169, May 1998. Google ScholarDigital Library
- N. Marz. A Storm is coming. http://engineering.twitter.com/2011/08/storm-is-coming-more-details-and-plans.html, August 2011.Google Scholar
- F. McSherry, R. Isaacs, M. Isard, and D. G. Murray. Naiad: The animating spirit of rivers and streams. In SOSP '11, 2011.Google Scholar
- Microsoft SQL Server StreamInsight. http://www.microsoft.com/sqlserver/en/us/solutions-technologies/business-intelligence/complex-event-processing.aspx.Google Scholar
- MXQuery: A lightweight, full-featured XQuery engine. http://mxquery.org/.Google Scholar
- D. Peng and F. Dabek. Large-scale incremental processing using distributed transactions and notifications. In OSDI '10, pages 251--264, 2010. Google ScholarDigital Library
- M. Shah, J. Hellerstein, S. Chandrasekaran, and M. Franklin. Flux: An adaptive partitioning operator for continuous query systems. In ICDE '03, pages 25--36, 2003.Google ScholarCross Ref
- I. Stoica, R. Morris, D. Karger, M. F. Kaashoek, and H. Balakrishnan. Chord: A scalable peer-to-peer lookup service for internet applications. In SIGCOMM '01, pages 149--160, 2001. Google ScholarDigital Library
- StreamBase Systems, Inc. http://www.streambase.com/.Google Scholar
- N. Tatbul, U. Çetintemel, and S. Zdonik. Staying FIT: efficient load shedding techniques for distributed stream processing. In VLDB '07, pages 159--170, 2007. Google ScholarDigital Library
- Y. Xing, S. Zdonik, and J.-H. Hwang. Dynamic load distribution in the Borealis stream processor. In ICDE '05, pages 791--802, 2005. Google ScholarDigital Library
- Stormy: an elastic and highly available streaming service in the cloud
Recommendations
A dark and stormy night: Reallocation storms in edge computing
AbstractEfficient resource usage in edge computing requires clever allocation of the workload of application components. In this paper, we show that under certain circumstances, the number of superfluous workload reallocations from one edge server to ...
Study On Purchase Intention In Different Live Streaming Scenarios Based On Experimental Approach
ICEBI '22: Proceedings of the 2022 6th International Conference on E-Business and InternetLive streaming e-commerce has exploded recently. While the live streaming traffic is dominated by the top live streamers, merchants and ordinary live streamers attempt to establish self-operating live streaming, but the number of fans and sales ...
MedSMan: a live multimedia stream querying system
Querying live media streams is a challenging problem that is becoming an essential requirement in a growing number of applications. Research in multimedia information systems has addressed and made good progress in dealing with archived data. Meanwhile, ...
Comments