Abstract
Stream Processing (SP) has evolved as the leading paradigm to process and gain value from the high volume of streaming data produced, e.g., in the domain of the Internet of Things. An SP system is a middleware that deploys a network of operators between data sources, such as sensors, and the consuming applications. SP systems typically face intense and highly dynamic data streams. Parallelization and elasticity enable SP systems to process these streams with continuous high quality of service. The current research landscape provides a broad spectrum of methods for parallelization and elasticity in SP. Each method makes specific assumptions and focuses on particular aspects. However, the literature lacks a comprehensive overview and categorization of the state of the art in SP parallelization and elasticity, which is necessary to consolidate the state of the research and to plan future research directions on this basis. Therefore, in this survey, we study the literature and develop a classification of current methods for both parallelization and elasticity in SP systems.
- Daniel J. Abadi, Don Carney, Ugur Çetintemel, Mitch Cherniack, Christian Convey, Sangdon Lee, Michael Stonebraker, Nesime Tatbul, and Stan Zdonik. 2003. Aurora: A new model and architecture for data stream management. VLDB J. 12, 2 (Aug. 2003), 120--139. Google ScholarDigital Library
- Asaf Adi and Opher Etzion. 2004. Amit—The situation manager. VLDB J. 13, 2 (May 2004), 177--203. Google ScholarDigital Library
- Tyler Akidau, Alex Balikov, Kaya Bekiroğlu, Slava Chernyak, Josh Haberman, Reuven Lax, Sam McVeety, Daniel Mills, Paul Nordstrom, and Sam Whittle. 2013. MillWheel: Fault-tolerant stream processing at internet scale. Proc. VLDB Endow. 6, 11 (Aug. 2013), 1033--1044. Google ScholarDigital Library
- Elias Alevizos, Alexander Artikis, and George Paliouras. 2017. Event forecasting with pattern Markov chains. In Proceedings of the 11th ACM International Conference on Distributed and Event-based Systems (DEBS’17). ACM, New York, NY, 146--157. Google ScholarDigital Library
- L. Amini, N. Jain, A. Sehgal, J. Silber, and O. Verscheure. 2006. Adaptive control of extreme-scale stream-processing systems. In Proceedings of the 26th IEEE International Conference on Distributed Computing Systems (ICDCS’06). 71--71. Google ScholarDigital Library
- H. Andrade, B. Gedik, K.-L. Wu, and P.S. Yu. 2011. Processing high data rate streams in System S. J. Parallel Distrib. Comput. 71, 2 (Feb. 2011), 145--156. Google ScholarDigital Library
- Leonardo Aniello, Roberto Baldoni, and Leonardo Querzoni. 2013. Adaptive online scheduling in storm. In Proceedings of the 7th ACM International Conference on Distributed Event-based Systems (DEBS’13). ACM, New York, NY, 207--218. Google ScholarDigital Library
- Arvind Arasu, Shivnath Babu, and Jennifer Widom. 2006. The CQL continuous query language: Semantic foundations and query execution. VLDB J. 15, 2 (June 2006), 121--142. Google ScholarDigital Library
- Michael Armbrust, Armando Fox, Rean Griffith, Anthony D. Joseph, Randy Katz, Andy Konwinski, Gunho Lee, David Patterson, Ariel Rabkin, Ion Stoica, and Matei Zaharia. 2010. A view of cloud computing. Commun. ACM 53, 4 (Apr. 2010), 50--58. Google ScholarDigital Library
- Yossi Azar, Andrei Z. Broder, Anna R. Karlin, and Eli Upfal. 1999. Balanced allocations. SIAM J. Comput. 29, 1 (Sept. 1999), 180--200. Google ScholarDigital Library
- Nathan Backman, Rodrigo Fonseca, and Uǧur Çetintemel. 2012. Managing parallelism for stream processing in the cloud. In Proceedings of the 1st International Workshop on Hot Topics in Cloud Data Processing (HotCDP’12). ACM, New York, NY, 1:1--1:5. Google ScholarDigital Library
- Cagri Balkesen, Nihal Dindar, Matthias Wetter, and Nesime Tatbul. 2013. Rip: Run-based intra-query parallelism for scalable complex event processing. In Proceedings of the 7th ACM International Conference on Distributed Event-based Systems. ACM, 3--14. http://dl.acm.org/citation.cfm?id=2488257 Google ScholarDigital Library
- Cagri Balkesen and Nesime Tatbul. 2011. Scalable data partitioning techniques for parallel sliding window processing over data streams. In Proceedings of the International Workshop on Data Management for Sensor Networks (DMSN’11). Retrieved from http://www.softnet.tuc.gr/dmsn11/papers/paper03.pdf.Google Scholar
- Cagri Balkesen, Nesime Tatbul, and M. Tamer Özsu. 2013. Adaptive input admission and management for parallel stream processing. In Proceedings of the 7th ACM International Conference on Distributed Event-based Systems. ACM, 15--26. Google ScholarDigital Library
- P. Basanta-Val, N. Fernández-García, L. Sánchez-Fernández, and J. Arias-Fisteus. 2017. Patterns for distributed real-time stream processing. IEEE Trans. Parallel Distrib. Syst. 28, 11 (Nov. 2017), 3243--3257.Google ScholarDigital Library
- Flavio Bonomi, Rodolfo Milito, Jiang Zhu, and Sateesh Addepalli. 2012. Fog computing and its role in the internet of things. In Proceedings of the 1st Edition of the MCC Workshop on Mobile Cloud Computing (MCC’12). ACM, New York, NY, 13--16. Google ScholarDigital Library
- Michael Borkowski, Christoph Hochreiner, and Stefan Schulte. 2018. Moderated resource elasticity for stream processing applications. In Proceedings of the Parallel Processing Workshops. Lecture Notes in Computer Science (Euro-Par’17), Dora B. Heras et al. (Ed.), Vol. 10569. Springer, Cham, 5--16.Google Scholar
- Lars Brenna, Johannes Gehrke, Mingsheng Hong, and Dag Johansen. 2009. Distributed event stream processing with non-deterministic finite automata. In Proceedings of the 3rd ACM International Conference on Distributed Event-Based Systems (DEBS’09). ACM, New York, NY, 3:1--3:12. Google ScholarDigital Library
- A. Brito, A. Martin, T. Knauth, S. Creutz, D. Becker, S. Weigert, and C. Fetzer. 2011. Scalable and low-latency data processing with stream MapReduce. In Proceedings of the IEEE 3rd International Conference on Cloud Computing Technology and Science. 48--58. Google ScholarDigital Library
- Paris Carbone, Asterios Katsifodimos, Stephan Ewen, Volker Markl, Seif Haridi, and Kostas Tzoumas. 2015. Apache flink: Stream and batch processing in a single engine. IEEE Data Eng. Bull. 38 (2015), 28--38.Google Scholar
- Valeria Cardellini, Vincenzo Grassi, Francesco Lo Presti, and Matteo Nardelli. 2016. Optimal operator placement for distributed stream processing applications. In Proceedings of the 10th ACM International Conference on Distributed and Event-based Systems (DEBS’16). ACM, New York, NY, 69--80. Google ScholarDigital Library
- Valeria Cardellini, Vincenzo Grassi, Francesco Lo Presti, and Matteo Nardelli. 2017. Optimal operator replication and placement for distributed stream processing systems. SIGMETRICS Perform. Eval. Rev. 44, 4 (May 2017), 11--22. Google ScholarDigital Library
- Valeria Cardellini, Francesco Lo Presti, Matteo Nardelli, and Gabriele Russo Russo. 2018. Decentralized self-adaptation for elastic data stream processing. Future Generation Computer Systems 87 (Oct. 2018), 171--185.Google Scholar
- Sharma Chakravarthy and Deepak Mishra. 1994. Snoop: An expressive event specification language for active databases. Data Knowl. Eng. 14, 1 (1994), 1--26. Retrieved from http://www.sciencedirect.com/science/article/pii/0169023X9490006X. Google ScholarDigital Library
- Sirish Chandrasekaran, Owen Cooper, Amol Deshpande, Michael J. Franklin, Joseph M. Hellerstein, Wei Hong, Sailesh Krishnamurthy, Samuel R. Madden, Fred Reiss, and Mehul A. Shah. 2003. TelegraphCQ: Continuous dataflow processing. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD’03). ACM, New York, NY, 668--668. Google ScholarDigital Library
- Mitch Cherniack, Hari Balakrishnan, Magdalena Balazinska, Donald Carney, Ugur Cetintemel, Ying Xing, and Stanley B. Zdonik. 2003. Scalable distributed stream processing. In Proceedings of the Conference on Innovative Data Systems Research (CIDR’03), Vol. 3. 257--268. Retrieved from http://nms.csail.mit.edu/papers/CIDR_CRC.pdf.Google Scholar
- Tyson Condie, Neil Conway, Peter Alvaro, Joseph M. Hellerstein, Khaled Elmeleegy, and Russell Sears. 2010. MapReduce online. In Proceedings of the USENIX Conference on Networked Systems Design and Implementation (NSDI’10), Vol. 10. 20. Retrieved from http://static.usenix.org/events/nsdi10/tech/full_papers/condie.pdf. Google ScholarDigital Library
- Tyson Condie, Neil Conway, Peter Alvaro, Joseph M. Hellerstein, John Gerth, Justin Talbot, Khaled Elmeleegy, and Russell Sears. 2010. Online aggregation and continuous query support in MapReduce. In Proceedings of the 2010 ACM SIGMOD International Conference on Management of Data (SIGMOD’10). ACM, New York, NY, 1115--1118. Google ScholarDigital Library
- Gianpaolo Cugola and Alessandro Margara. 2010. TESLA: A formally defined event specification language. In Proceedings of the Fourth ACM International Conference on Distributed Event-Based Systems. ACM, 50--61. Retrieved from http://dl.acm.org/citation.cfm?id=1827427. Google ScholarDigital Library
- Gianpaolo Cugola and Alessandro Margara. 2012. Low latency complex event processing on parallel hardware. J. Parallel Distrib. Comput. 72, 2 (Feb. 2012), 205--218. Google ScholarDigital Library
- Gianpaolo Cugola and Alessandro Margara. 2012. Processing flows of information: From data stream to complex event processing. Comput. Surveys 44, 3 (June 2012), 1--62. Google ScholarDigital Library
- Marco Danelutto, Peter Kilpatrick, Gabriele Mencagli, and Massimo Torquati. 0. State access patterns in stream parallel computations. Int. J. High Performance Comput. Appl. 0, 0 (0), 1--12. arXiv:https://doi.org/10.1177/1094342017694134 Google ScholarDigital Library
- Marcos Dias de Assuncao, Alexandre da Silva Veith, and Rajkumar Buyya. 2017. Resource elasticity for distributed data stream processing: A survey and future directions. arXiv preprint arXiv:1709.01363 (2017).Google Scholar
- Tiziano De Matteis and Gabriele Mencagli. 2016. Keep calm and react with foresight: Strategies for low-latency and energy-efficient elastic data stream processing. In Proceedings of the 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP’16). ACM, New York, NY. Google ScholarDigital Library
- Tiziano De Matteis and Gabriele Mencagli. 2017. Elastic scaling for distributed latency-sensitive data stream operators. In Proceedings of the 25th Euromicro International Conference on Parallel, Distributed and Network-based Processing (PDP’17). 61--68.Google ScholarCross Ref
- Tiziano De Matteis and Gabriele Mencagli. 2017. Parallel patterns for window-based stateful operators on data streams: An algorithmic skeleton approach. Int. J. Parallel Program. 45, 2 (Apr. 2017), 382--401. Google ScholarDigital Library
- Tiziano De Matteis and Gabriele Mencagli. 2017. Proactive elasticity and energy awareness in data stream processing. Journal of Systems and Software 127 (2017), 302--319. Google ScholarDigital Library
- Jeffrey Dean and Sanjay Ghemawat. 2004. MapReduce: Simplified data processing on large clusters. In Proceedings of the 6th Conference on Operating Systems Design 8 Implementation—Volume 6 (OSDI’04). USENIX Association, Berkeley, CA, 10--10. http://dl.acm.org/citation.cfm?id=1251254.1251264 Google ScholarDigital Library
- Alan J. Demers, Johannes Gehrke, Biswanath Panda, Mirek Riedewald, Varun Sharma, Walker M. White, et al. 2007. Cayuga: A general purpose event monitoring system. In Proceedings of the Conference on Innovative Data Systems Research (CIDR’07), Vol. 7. 412--422. Retrieved from http://www.ccis.northeastern.edu/home/mirek/papers/2007-CIDR-CayugaImp.pdf.Google Scholar
- Y. Drougas and V. Kalogeraki. 2009. Accommodating bursts in distributed stream processing systems. In Proceedings of the IEEE International Symposium on Parallel Distributed Processing. 1--11. Google ScholarDigital Library
- Esper. 2019. Esper. Retrieved from http://www.espertech.com/.Google Scholar
- Patrick Th. Eugster, Pascal A. Felber, Rachid Guerraoui, and Anne-Marie Kermarrec. 2003. The many faces of publish/subscribe. ACM Comput. Surv. 35, 2 (June 2003), 114--131. Google ScholarDigital Library
- R. C. Fernandez, P. Garefalakis, and P. Pietzuch. 2016. Java2SDG: Stateful big data processing for the masses. In Proceedings of the IEEE 32nd International Conference on Data Engineering (ICDE’16). 1390--1393.Google Scholar
- Raul Castro Fernandez, Matteo Migliavacca, Evangelia Kalyvianaki, and Peter Pietzuch. 2013. Integrating scale out and fault tolerance in stream processing using operator state management. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD’13). ACM, New York, NY, 725--736. Google ScholarDigital Library
- Raul Castro Fernandez, Matteo Migliavacca, Evangelia Kalyvianaki, and Peter Pietzuch. 2014. Making state explicit for imperative big data processing. In Proceedings of the USENIX Annual Technical Conference (USENIX-ATC’14). USENIX Association, Berkeley, CA, 49--60. http://dl.acm.org/citation.cfm?id=2643634.2643640. Google ScholarDigital Library
- L. Fischer and A. Bernstein. 2015. Workload scheduling in distributed stream processors using graph partitioning. In Proceedings of the IEEE International Conference on Big Data (BigData’15). 124--133. Google ScholarDigital Library
- Ioannis Flouris, Nikos Giatrakos, Antonios Deligiannakis, Minos Garofalakis, Michael Kamp, and Michael Mock. 2017. Issues in complex event processing: Status and prospects in the Big Data era. J. Syst. Softw. 127, Supplement C (2017), 217--236. Google ScholarDigital Library
- Apache Software Foundation. 2015. Apache Storm Project Website. Retrieved from http://storm.apache.org.Google Scholar
- The Apache Software Foundation. 2019. Apache Flink Project Website. Retrieved from http://flink.apache.org/.Google Scholar
- Buğra Gedik. 2014. Generic windowing support for extensible stream processing systems. Software: Pract. Exper. 44, 9 (2014), 1105--1128. Google ScholarDigital Library
- Buğra Gedik. 2014. Partitioning functions for stateful data parallelism in stream processing. VLDB J. 23, 4 (Aug. 2014), 517--539. Google ScholarDigital Library
- B. Gedik, S. Schneider, M. Hirzel, and K. L. Wu. 2014. Elastic scaling for data stream processing. IEEE Trans. Parallel Distrib. Syst. 25, 6 (June 2014), 1447--1463. Google ScholarDigital Library
- B. Gedik, H. G. Özsema, and Ö. Öztürk. 2016. Pipelined fission for stream programs with dynamic selectivity and partitioned state. J. Parallel Distrib. Comput. 96 (2016), 106--120. Google ScholarDigital Library
- Michael I. Gordon, William Thies, and Saman Amarasinghe. 2006. Exploiting coarse-grained task, data, and pipeline parallelism in stream programs. In Proceedings of the 12th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS XII). ACM, New York, NY, 151--162. Google ScholarDigital Library
- Michael I. Gordon, William Thies, Michal Karczmarek, Jasper Lin, Ali S. Meli, Andrew A. Lamb, Chris Leger, Jeremy Wong, Henry Hoffmann, David Maze, and Saman Amarasinghe. 2002. A stream compiler for communication-exposed architectures. In Proceedings of the 10th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS X). ACM, New York, NY, 291--303. Google ScholarDigital Library
- Michael Grossniklaus, David Maier, James Miller, Sharmadha Moorthy, and Kristin Tufte. 2016. Frames: Data-driven Windows. In Proceedings of the 10th ACM International Conference on Distributed and Event-based Systems (DEBS’16). ACM, New York, NY, 13--24. Google ScholarDigital Library
- V. Gulisano, R. Jiménez-Peris, M. Patiño-Martínez, C. Soriente, and P. Valduriez. 2012. StreamCloud: An elastic and scalable data streaming system. IEEE Trans. Parallel Distrib. Syst. 23, 12 (Dec. 2012), 2351--2365. Google ScholarDigital Library
- Thomas Heinze, Leonardo Aniello, Leonardo Querzoni, and Zbigniew Jerzak. 2014. Cloud-based data stream processing. In Proceedings of the 8th ACM International Conference on Distributed Event-Based Systems (DEBS’14). ACM, New York, NY, 238--245. Google ScholarDigital Library
- Thomas Heinze, Zbigniew Jerzak, Gregor Hackenbroich, and Christof Fetzer. 2014. Latency-aware elastic scaling for distributed data stream processing systems. In Proceedings of the 8th ACM International Conference on Distributed Event-Based Systems. ACM, 13--22. Google ScholarDigital Library
- Thomas Heinze, Yuanzhen Ji, Yinying Pan, Franz Josef Grueneberger, Zbigniew Jerzak, and Christof Fetzer. 2013. Elastic complex event processing under varying query load. In Proceedings of the 1st International Workshop on Big Dynamic Distributed Data (BD3’13). 25.Google Scholar
- Thomas Heinze, Valerio Pappalardo, Zbigniew Jerzak, and Christof Fetzer. 2014. Auto-scaling techniques for elastic data stream processing. In Proceedings of the IEEE 30th International Conference on Data Engineering Workshops (ICDEW’14). IEEE, 296--302.Google Scholar
- Thomas Heinze, Lars Roediger, Andreas Meister, Yuanzhen Ji, Zbigniew Jerzak, and Christof Fetzer. 2015. Online parameter optimization for elastic data stream processing. In Proceedings of the 6th ACM Symposium on Cloud Computing (SoCC’15). ACM, New York, NY, 276--287. Google ScholarDigital Library
- Nicolas Hidalgo, Daniel Wladdimiro, and Erika Rosas. 2017. Self-adaptive processing graph with operator fission for elastic stream processing. J. Syst. Softw. 127 (2017), 205--216. Google ScholarDigital Library
- Martin Hirzel. 2012. Partition and compose: Parallel complex event processing. In Proceedings of the 6th ACM International Conference on Distributed Event-Based Systems (DEBS’12). ACM, New York, NY, 191--200. Google ScholarDigital Library
- Martin Hirzel, Scott Schneider, and Bugra Gedik. 2014. SPL: An Extensible Language for Distributed Stream Processing. Technical Report. Research Report RC25486, IBM. Retrieved from http://hirzels.com/martin/papers/tr14-rc25486-spl.pdf.Google Scholar
- Martin Hirzel, Scott Schneider, and Kanat Tangwongsan. 2017. Sliding-window aggregation algorithms: Tutorial. In Proceedings of the 11th ACM International Conference on Distributed and Event-based Systems (DEBS’17). ACM, New York, NY, 11--14. Google ScholarDigital Library
- Martin Hirzel, Robert Soulé, Scott Schneider, Buğra Gedik, and Robert Grimm. 2014. A catalog of stream processing optimizations. ACM Comput. Surv. 46, 4 (Mar. 2014), 46:1--46:34. Google ScholarDigital Library
- C. Hochreiner, M. Vogler, P. Waibel, and S. Dustdar. 2016. VISP: An ecosystem for elastic data stream processing for the internet of things. In Proceedings of the IEEE 20th International Enterprise Distributed Object Computing Conference.1--11.Google Scholar
- C. Hochreiner, M. Vögler, S. Schulte, and S. Dustdar. 2016. Elastic stream processing for the internet of things. In Proceedings of the IEEE 9th International Conference on Cloud Computing (CLOUD’16). 100--107.Google Scholar
- Waldemar Hummer, Benjamin Satzger, and Schahram Dustdar. 2013. Elastic stream processing in the cloud. Wiley Interdisc. Rev.: Data Min. Knowl. Discov. 3, 5 (Sept. 2013), 333--345. Google ScholarDigital Library
- E. Kalyvianaki, W. Wiesemann, Q. H. Vu, D. Kuhn, and P. Pietzuch. 2011. SQPR: Stream query planning with reuse. In Proceedings of the IEEE 27th International Conference on Data Engineering. 840--851. Google ScholarDigital Library
- David Karger, Eric Lehman, Tom Leighton, Rina Panigrahy, Matthew Levine, and Daniel Lewin. 1997. Consistent hashing and random trees: Distributed caching protocols for relieving hot spots on the world wide web. In Proceedings of the 29th Annual ACM Symposium on Theory of Computing (STOC’97). ACM, New York, NY, 654--663. Google ScholarDigital Library
- Nikos R. Katsipoulakis, Alexandros Labrinidis, and Panos K. Chrysanthis. 2017. A holistic view of stream partitioning costs. Proc. VLDB Endow. 10, 11 (Aug. 2017), 1286--1297. Google ScholarDigital Library
- J. O. Kephart and D. M. Chess. 2003. The vision of autonomic computing. Computer 36, 1 (Jan. 2003), 41--50. Google ScholarDigital Library
- Rohit Khandekar, Kirsten Hildrum, Sujay Parekh, Deepak Rajan, Joel Wolf, Kun-Lung Wu, Henrique Andrade, and Bugra Gedik. 2009. COLA: Optimizing stream processing applications via graph partitioning. In Proceedings of the 10th ACM/IFIP/USENIX International Conference on Middleware (Middleware’09). Springer-Verlag New York, Inc., New York, NY. http://dl.acm.org/citation.cfm?id=1656980.1657002 Google ScholarDigital Library
- J. F. C. Kingman. 1961. The single server queue in heavy traffic. In Mathematical Proceedings of the Cambridge Philosophical Society, Vol. 57. Cambridge University Press, 902--904.Google ScholarCross Ref
- Thomas Kohler, Ruben Mayer, Frank Dürr, Marius Maaß, Sukanya Bhowmik, and Kurt Rothermel. 2018. P4CEP: Towards in-network complex event processing. In Proceedings of the 2018 Morning Workshop on In-Network Computing (NetCompute’18). ACM, New York, NY, 33--38. Google ScholarDigital Library
- Alexandros Koliousis, Matthias Weidlich, Raul Castro Fernandez, Alexander L. Wolf, Paolo Costa, and Peter Pietzuch. 2016. SABER: Window-based hybrid stream processing for heterogeneous architectures. In Proceedings of the 2016 International Conference on Management of Data (SIGMOD’16). ACM, New York, NY, 555--569. Google ScholarDigital Library
- Roland Kotto Kombi, Nicolas Lumineau, and Philippe Lamarre. 2017. A preventive auto-parallelization approach for elastic stream processing. In Proceedings of the IEEE 37th International Conference on Distributed Computing Systems (ICDCS’17). IEEE, 1532--1542.Google ScholarCross Ref
- Sanjeev Kulkarni, Nikunj Bhagat, Maosong Fu, Vikas Kedigehalli, Christopher Kellogg, Sailesh Mittal, Jignesh M. Patel, Karthik Ramasamy, and Siddarth Taneja. 2015. Twitter Heron: Stream processing at scale. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD’15). ACM, New York, NY, 239--250. Google ScholarDigital Library
- A. Kumbhare et al. 2015. Fault-tolerant and elastic streaming MapReduce with decentralized coordination. In Proceedings of the IEEE 35th International Conference on Distributed Computing Systems. 328--338.Google ScholarCross Ref
- A. G. Kumbhare, Y. Simmhan, and V. K. Prasanna. 2014. PLAStiCC: Predictive look-ahead scheduling for continuous dataflows on clouds. In Proceedings of the 14th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing. 344--353.Google Scholar
- A. G. Kumbhare, Y. Simmhan, and V. K. Prasanna. 2014. PLAStiCC: Predictive look-ahead scheduling for continuous dataflows on clouds. In Proceedings of the 14th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid’14), Vol. 00. 344--353.Google Scholar
- Wang Lam, Lu Liu, Sts Prasad, Anand Rajaraman, Zoheb Vacheri, and AnHai Doan. 2012. Muppet: MapReduce-style processing of fast data. Proc. VLDB Endow. 5, 12 (Aug. 2012), 1814--1825. Google ScholarDigital Library
- Danh Le-Phuoc, Hoan Nguyen Mau Quoc, Chan Le Van, and Manfred Hauswirth. 2013. Elastic and scalable processing of linked stream data in the cloud. In Proceedings of the 12th International Semantic Web Conference—Part I (ISWC’13). Springer-Verlag, New York, NY, 280--297. Google ScholarDigital Library
- Jin Li, David Maier, Kristin Tufte, Vassilis Papadimos, and Peter A. Tucker. 2005. No pane, no gain: Efficient evaluation of sliding-window aggregates over data streams. ACM SIGMOD Rec. 34, 1 (2005), 39--44. http://dl.acm.org/citation.cfm?id=1058158 Google ScholarDigital Library
- Jin Li, David Maier, Kristin Tufte, Vassilis Papadimos, and Peter A. Tucker. 2005. Semantics and evaluation techniques for window aggregates in data streams. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD’05). ACM, New York, NY, 311--322. Google ScholarDigital Library
- Xunyun Liu and Rajkumar Buyya. 2017. D-storm: Dynamic resource-efficient scheduling of stream processing applications. In Procedings of the IEEE 23rd International Conference on Parallel and Distributed Systems (ICPADS’17). IEEE, 485--492.Google ScholarCross Ref
- Xunyun Liu and Rajkumar Buyya. 2017. Performance-oriented deployment of streaming applications on cloud. IEEE Trans. Big Data 4, 1 (2017), 46--51.Google Scholar
- B. Lohrmann, P. Janacik, and O. Kao. 2015. Elastic stream processing with latency guarantees. In Proceedings of the IEEE 35th International Conference on Distributed Computing Systems. 399--410.Google Scholar
- Björn Lohrmann, Daniel Warneke, and Odej Kao. 2014. Nephele streaming: Stream processing under QoS constraints at scale. Cluster Comput. 17, 1 (Mar. 2014), 61--78. Google ScholarDigital Library
- Federico Lombardi, Leonardo Aniello, Silvia Bonomi, and Leonardo Querzoni. 2017. Elastic symbiotic scaling of operators and resources in stream processing systems. IEEE Trans. Parallel Distrib. Syst. 29, 3 (2017), 572--583.Google ScholarCross Ref
- Kasper Grud Skat Madsen and Yongluan Zhou. 2015. Dynamic resource management in a massively parallel stream processing engine. In Proceedings of the 24th ACM International on Conference on Information and Knowledge Management (CIKM’15). ACM, New York, NY, 13--22. Google ScholarDigital Library
- K. G. S. Madsen, Y. Zhou, and J. Cao. 2017. Integrative dynamic reconfiguration in a parallel stream processing engine. In Proceedings of the IEEE 33rd International Conference on Data Engineering (ICDE’17). IEEE, 227--230.Google Scholar
- C. Mayer, M. A. Tariq, R. Mayer, and K. Rothermel. 2018. GrapH: Traffic-aware graph processing. IEEE Trans. Parallel Distrib. Syst. 99 (2018), 1--1.Google Scholar
- Ruben Mayer, Boris Koldehofe, and Kurt Rothermel. 2015. Predictable low-latency event detection with parallel complex event processing. IEEE Internet Things J. 2, 4 (Aug. 2015), 274--286.Google ScholarCross Ref
- Ruben Mayer, Christian Mayer, Muhammad Adnan Tariq, and Kurt Rothermel. 2016. GraphCEP: Real-time data analytics using parallel complex event and graph processing. In Proceedings of the 10th ACM International Conference on Distributed and Event-based Systems (DEBS’16). ACM, New York, NY, 309--316. Google ScholarDigital Library
- Ruben Mayer, Ahmad Slo, Muhammad Adnan Tariq, Kurt Rothermel, Manuel Gräber, and Umakishore Ramachandran. 2017. SPECTRE: Supporting consumption policies in window-based parallel complex event processing. In Proceedings of the 18th ACM/IFIP/USENIX Middleware Conference (Middleware’17). ACM, New York, NY, 161--173. Google ScholarDigital Library
- Ruben Mayer, Muhammad Adnan Tariq, and Kurt Rothermel. 2017. Minimizing communication overhead in window-based parallel complex event processing. In Proceedings of the 11th ACM International Conference on Distributed and Event-based Systems (DEBS’17). ACM, New York, NY, 54--65. Google ScholarDigital Library
- Yuan Mei and Samuel Madden. 2009. ZStream: A cost-based query processor for adaptively detecting composite events. In Proceedings of the 2009 ACM SIGMOD International Conference on Management of Data (SIGMOD’09). ACM, New York, NY, 193--206. Google ScholarDigital Library
- Gabriele Mencagli. 2016. Adaptive model predictive control of autonomic distributed parallel computations with variable horizons and switching costs. Concurr. Comput.: Pract. Exper. 28, 7 (2016), 2187--2212. Google ScholarDigital Library
- Gabriele Mencagli. 2016. A game-theoretic approach for elastic distributed data stream processing. ACM Trans. Auton. Adapt. Syst. 11, 2, Article 13 (June 2016). Google ScholarDigital Library
- Gabriele Mencagli, Massimo Torquati, and Marco Danelutto. 2018. Elastic-PPQ: A two-level autonomic system for spatial preference query processing over dynamic data streams. Future Gen. Comput. Syst. 79 (2018), 862--877.Google ScholarCross Ref
- G. Mencagli, M. Torquati, M. Danelutto, and T. De Matteis. 2017. Parallel continuous preference queries over out-of-order and bursty data streams. IEEE Trans. Parallel Distrib. Syst. 28, 9 (Sept. 2017), 2608--2624.Google ScholarDigital Library
- Gabriele Mencagli, Massimo Torquati, Fabio Lucattini, Salvatore Cuomo, and Marco Aldinucci. 2018. Harnessing sliding-window execution semantics for parallel stream processing. J. Parallel Distrib. Comput. 116 (2018), 74--88.Google ScholarCross Ref
- Gabriele Mencagli, Marco Vanneschi, and Emanuele Vespa. 2014. A cooperative predictive control approach to improve the reconfiguration stability of adaptive distributed parallel applications. ACM Trans. Auton. Adapt. Syst. 9, 1, Article 2 (Mar. 2014). Google ScholarDigital Library
- Y. Nakamura, H. Suwa, Y. Arakawa, H. Yamaguchi, and K. Yasumoto. 2016. Design and implementation of middleware for IoT devices toward real-time flow processing. In Proceedings of the IEEE 36th International Conference on Distributed Computing Systems Workshops (ICDCSW’16). 162--167.Google Scholar
- M. A. U. Nasir, G. De Francisci Morales, D. García-Soriano, N. Kourtellis, and M. Serafini. 2015. The power of both choices: Practical load balancing for distributed stream processing engines. In Proceedings of the IEEE 31st International Conference on Data Engineering. 137--148.Google Scholar
- M. A. U. Nasir, G. D. F. Morales, N. Kourtellis, and M. Serafini. 2016. When two choices are not enough: Balancing at scale in Distributed Stream Processing. In Proceedings of the IEEE 32nd International Conference on Data Engineering (ICDE’16). 589--600.Google Scholar
- L. Neumeyer, B. Robbins, A. Nair, and A. Kesari. 2010. S4: Distributed stream computing platform. In Proceedings of the IEEE International Conference on Data Mining Workshops. 170--177. Google ScholarDigital Library
- Beate Ottenwälder, Boris Koldehofe, Kurt Rothermel, Kirak Hong, and Umakishore Ramachandran. 2014. RECEP: Selection-based reuse for distributed complex event processing. In Proceedings of the 8th ACM International Conference on Distributed Event-Based Systems (DEBS’14). ACM, New York, NY, 59--70. Google ScholarDigital Library
- Beate Ottenwälder, Boris Koldehofe, Kurt Rothermel, and Umakishore Ramachandran. 2013. MigCEP: Operator migration for mobility driven distributed complex event processing. In Proceedings of the 7th ACM International Conference on Distributed Event-based Systems (DEBS’13). ACM, New York, NY, 183--194. Google ScholarDigital Library
- P. Pietzuch, J. Ledlie, J. Shneidman, M. Roussopoulos, M. Welsh, and M. Seltzer. 2006. Network-aware operator placement for stream-processing systems. In Proceedings of the 22nd International Conference on Data Engineering (ICDE’06). 49--49. Google ScholarDigital Library
- Olga Poppe, Chuan Lei, Salah Ahmed, and Elke A. Rundensteiner. 2017. Complete event trend detection in high-rate event streams. In Proceedings of the ACM International Conference on Management of Data (SIGMOD’17). ACM, New York, NY, 109--124. Google ScholarDigital Library
- Do Le Quoc, Ruichuan Chen, Pramod Bhatotia, Christof Fetzer, Volker Hilt, and Thorsten Strufe. 2017. StreamApprox: Approximate computing for stream analytics. In Proceedings of the 18th ACM/IFIP/USENIX Middleware Conference (Middleware’17). ACM, New York, NY, 185--197. Google ScholarDigital Library
- Medhabi Ray, Chuan Lei, and Elke A. Rundensteiner. 2016. Scalable pattern sharing on event streams. In Proceedings of the 2016 International Conference on Management of Data (SIGMOD’16). ACM, New York, NY, 495--510. Google ScholarDigital Library
- Nicoló Rivetti, Emmanuelle Anceaume, Yann Busnel, Leonardo Querzoni, and Bruno Sericola. 2016. Online scheduling for shuffle grouping in distributed stream processing systems. In Proceedings of the 17th International Middleware Conference (Middleware’16). ACM, New York, NY. Google ScholarDigital Library
- Nicoló Rivetti, Leonardo Querzoni, Emmanuelle Anceaume, Yann Busnel, and Bruno Sericola. 2015. Efficient key grouping for near-optimal load balancing in stream processing systems. In Proceedings of the 9th ACM International Conference on Distributed Event-Based Systems (DEBS’15). ACM, New York, NY, 80--91. Google ScholarDigital Library
- S. Rizou, F. Dürr, and K. Rothermel. 2010. Solving the multi-operator placement problem in large-scale operator networks. In 2010 Proceedings of the 19th International Conference on Computer Communications and Networks. 1--6.Google Scholar
- Omran Saleh, Heiko Betz, and Kai-Uwe Sattler. 2015. Partitioning for scalable complex event processing on data streams. In New Trends in Database and Information Systems II. Springer, Cham, 185--197. https://link.springer.com/chapter/10.1007/978-3-319-10518-5_15Google Scholar
- Omran Saleh and Kai-Uwe Sattler. 2015. The pipeflow approach: Write once, run in different stream-processing engines. In Proceedings of the 9th ACM International Conference on Distributed Event-Based Systems (DEBS’15). ACM, New York, NY, 368--371. Google ScholarDigital Library
- B. Satzger, W. Hummer, P. Leitner, and S. Dustdar. 2011. Esc: Towards an elastic stream computing platform for the cloud. In Proceedings of the IEEE 4th Conf.International Conference on Cloud Computing. 348--355. Google ScholarDigital Library
- S. Schneider, H. Andrade, B. Gedik, A. Biem, and K. Wu. 2009. Elastic scaling of data parallel operators in stream processing. In Proceedings of the IEEE International Symposium on Parallel Distributed Processing. 1--12. Google ScholarDigital Library
- Scott Schneider, Martin Hirzel, Bugra Gedik, and Kun-Lung Wu. 2012. Auto-parallelizing stateful distributed streaming applications. In Proceedings of the 21st International Conference on Parallel Architectures and Compilation Techniques (PACT’12). ACM, New York, NY, 53--64. Google ScholarDigital Library
- S. Schneider, M. Hirzel, B. Gedik, and K. L. Wu. 2015. Safe data parallelism for general streaming. IEEE Trans. Comput. 64, 2 (Feb. 2015), 504--517.Google ScholarDigital Library
- Scott Schneider, Joel Wolf, Kirsten Hildrum, Rohit Khandekar, and Kun-Lung Wu. 2016. Dynamic load balancing for ordered data-parallel regions in distributed streaming systems. In Proceedings of the 17th International Middleware Conference (Middleware’16). ACM, New York, NY, 21:1--21:14. Google ScholarDigital Library
- M. A. Shah, J. M. Hellerstein, Sirish Chandrasekaran, and M. J. Franklin. 2003. Flux: An adaptive partitioning operator for continuous query systems. In Proceedings of the 19th International Conference on Data Engineering (Cat. No. 03CH37405). 25--36.Google Scholar
- Anatoli Shein, Panos Chrysanthis, and Alexandros Labrinidis. 2018. SlickDeque: High Throughput and Low Latency Incremental Sliding-Window Aggregation. In Proceedings of the int. conf. on Extending Data Base Technology (EDBT), Vienna, Austria, March 26--29 (2018), 397--408, openprocessdings.org.Google Scholar
- S. Shevtsov, M. Berekmeri, D. Weyns, and M. Maggio. 2018. Control-theoretical software adaptation: A systematic literature review. IEEE Trans. Softw. Eng. 44, 8 (Aug. 2018), 784--810. Google ScholarDigital Library
- A. Shukla and Y. Simmhan. 2018. Toward reliable and rapid elasticity for streaming dataflows on clouds. In Proceedings of the IEEE 38th International Conference on Distributed Computing Systems (ICDCS’18). 1096--1106.Google Scholar
- Dawei Sun, Guangyan Zhang, Songlin Yang, Weimin Zheng, Samee U. Khan, and Keqin Li. 2015. Re-stream: Real-time and energy-efficient resource scheduling in big data stream computing environments. Info. Sci. 319 (2015), 92--112. Google ScholarDigital Library
- SystemS 2019. IBM System S Project website. Retrieved from http://researcher.watson.ibm.com/researcher/view_group_subpage.php?id=2534/.Google Scholar
- Yuzhe Tang and Bugra Gedik. 2013. Autopipelining for data stream processing. IEEE Trans. Parallel Distrib. Syst. 24, 12 (2013), 2344--2354. Google ScholarDigital Library
- Kanat Tangwongsan, Martin Hirzel, and Scott Schneider. 2017. Low-latency sliding-window aggregation in worst-case constant time. In Proceedings of the 11th ACM International Conference on Distributed and Event-based Systems (DEBS’17). ACM, New York, NY, 66--77. Google ScholarDigital Library
- Kanat Tangwongsan, Martin Hirzel, Scott Schneider, and Kun-Lung Wu. 2015. General incremental sliding-window aggregation. Proc. VLDB Endow. 8, 7 (Feb. 2015), 702--713. Google ScholarDigital Library
- William Thies, Michal Karczmarek, and Saman Amarasinghe. 2002. StreamIt: A language for streaming applications. In Compiler Construction. Springer, Berlin, 179--196. Google ScholarDigital Library
- J. Urbani, A. Margara, C. Jacobs, S. Voulgaris, and H. Bal. 2014. AJIRA: A lightweight distributed middleware for mapreduce and stream processing. In Proceedings of the IEEE 34th International Conference on Distributed Computing Systems. 545--554. Google ScholarDigital Library
- J. S. v. d. Veen, B. v. d. Waaij, E. Lazovik, W. Wijbrandi, and R. J. Meijer. 2015. Dynamically scaling apache storm for the analysis of streaming data. In Proceedings of the IEEE First International Conference on Big Data Computing Service and Applications. 154--161. Google ScholarDigital Library
- Uri Verner, Assaf Schuster, and Mark Silberstein. 2011. Processing data streams with hard real-time constraints on heterogeneous systems. In Proceedings of the International Conference on Supercomputing (ICS’11). ACM, New York, NY, 120--129. Google ScholarDigital Library
- Y. H. Wang, K. Cao, and X. M. Zhang. 2013. Complex event processing over distributed probabilistic event streams. Comput. Math. Appl. 66, 10 (Dec. 2013), 1808--1821. Google ScholarDigital Library
- D. Warneke and O. Kao. 2011. Exploiting dynamic resource allocation for efficient parallel data processing in the cloud. IEEE Trans. Parallel Distrib. Syst. 22, 6 (June 2011), 985--997. Google ScholarDigital Library
- Thomas Weigold, Marco Aldinucci, Marco Danelutto, and Vladimir Getov. 2012. Process-driven biometric identification by means of autonomic grid components. Int. J. Auton. Adapt. Commun. Syst. 5, 3 (July 2012), 274--291. Google ScholarDigital Library
- Joel Wolf, Nikhil Bansal, Kirsten Hildrum, Sujay Parekh, Deepak Rajan, Rohit Wagle, Kun-Lung Wu, and Lisa Fleischer. 2008. SODA: An optimizing scheduler for large-scale stream-based distributed computer systems. In Proceedings of the 9th ACM/IFIP/USENIX International Conference on Middleware (Middleware’08). Springer-Verlag New York, Inc., New York, NY, 306--325. http://dl.acm.org/citation.cfm?id=1496950.1496970 Google ScholarDigital Library
- Louis Woods, Jens Teubner, and Gustavo Alonso. 2010. Complex event detection at wire speed with FPGAs. Proc. VLDB Endow. 3, 1-2 (Sept. 2010), 660--669. Google ScholarDigital Library
- Eugene Wu, Yanlei Diao, and Shariq Rizvi. 2006. High-performance complex event processing over streams. In Proceedings of the 2006 ACM SIGMOD International Conference on Management of Data (SIGMOD’06). ACM, New York, NY, 407--418. Google ScholarDigital Library
- Sai Wu, Vibhore Kumar, Kun-Lung Wu, and Beng Chin Ooi. 2012. Parallelizing stateful operators in a distributed stream processing system: How, should you and how much? In Proceedings of the 6th ACM International Conference on Distributed Event-Based Systems. ACM, 278--289. http://dl.acm.org/citation.cfm?id=2335515. Google ScholarDigital Library
- Y. Wu and K. L. Tan. 2015. ChronoStream: Elastic stateful stream computation in the cloud. In Proceedings of the IEEE 31st International Conference on Data Engineering. 723--734.Google Scholar
- Ying Xing, Jeong-Hyon Hwang, Uǧur Çetintemel, and Stan Zdonik. 2006. Providing resiliency to load variations in distributed stream processing. In Proceedings of the 32nd International Conference on Very Large Data Bases (VLDB’06). VLDB Endowment, 775--786. http://dl.acm.org/citation.cfm?id=1182635.1164194 Google ScholarDigital Library
- Jielong Xu, Zhenhua Chen, Jian Tang, and Sen Su. 2014. T-storm: Traffic-aware online scheduling in storm. In Proceedings of the IEEE 34th International Conference on Distributed Computing Systems. 535--544. Google ScholarDigital Library
- L. Xu, B. Peng, and I. Gupta. 2016. Stela: Enabling stream processing systems to scale-in and scale-out on-demand. In Proceedings of the IEEE International Conference on Cloud Engineering (IC2E’16). 22--31.Google Scholar
- Keiichi Yasumoto, Hirozumi Yamaguchi, and Hiroshi Shigeno. 2016. Survey of real-time processing technologies of iot data streams. J. Info. Process. 24, 2 (2016), 195--202.Google ScholarCross Ref
- Nikos Zacheilas, Vana Kalogeraki, Nikolas Zygouras, Nikolaos Panagiotou, and Dimitrios Gunopulos. 2015. Elastic complex event processing exploiting prediction. In Proceedings of the IEEE International Conference on Big Data (BigData’15). IEEE, 213--222. Google ScholarDigital Library
- Nikos Zacheilas, Nikolas Zygouras, Nikolaos Panagiotou, Vana Kalogeraki, and Dimitrios Gunopulos. 2016. Dynamic load balancing techniques for distributed complex event processing systems. In Distributed Applications and Interoperable Systems. Springer, Cham, 174--188. Retrieved from https://link.springer.com/chapter/10.1007/978-3-319-39577-7_14. Google ScholarDigital Library
- Matei Zaharia, Tathagata Das, Haoyuan Li, Timothy Hunter, Scott Shenker, and Ion Stoica. 2013. Discretized streams: Fault-tolerant streaming computation at scale. In Proceedings of the 24th ACM Symposium on Operating Systems Principles (SOSP’13). ACM, New York, NY, 423--438. Google ScholarDigital Library
- E. Zeitler and Tore Risch. 2011. Massive scale-out of expensive continuous queries. In Proceedings of the VLDB Endowment, Vol. 4, No. 11.Google ScholarDigital Library
- Liang Zheng, Carlee Joe-Wong, Chee Wei Tan, Mung Chiang, and Xinyu Wang. 2015. How to bid the cloud. In Proceedings of the ACM Conference on Special Interest Group on Data Communication (SIGCOMM’15). ACM, New York, NY, 71--84. Google ScholarDigital Library
- D. Zimmer and R. Unland. 1999. On the semantics of complex events in active database management systems. In Proceedings of the 15th International Conference on Data Engineering. 392--399. Google ScholarDigital Library
- Nikolas Zygouras, Nikos Zacheilas, Vana Kalogeraki, Dermot Kinane, and Dimitrios Gunopulos. 2015. In Proceedings of the International Conference on Extending Database Technology (EDBT’15). Retrieved from OpenProceedings.org.Google Scholar
Index Terms
- A Comprehensive Survey on Parallelization and Elasticity in Stream Processing
Recommendations
Dual-Paradigm Stream Processing
ICPP '18: Proceedings of the 47th International Conference on Parallel ProcessingExisting stream processing frameworks operate either under data stream paradigm processing data record by record to favor low latency, or under operation stream paradigm processing data in micro-batches to desire high throughput. For complex and mutable ...
Taming velocity and variety simultaneously in big data with stream reasoning: tutorial
DEBS '16: Proceedings of the 10th ACM International Conference on Distributed and Event-based SystemsMany "big data" applications must tame velocity (processing data in-motion) and variety (processing many different types of data) simultaneously.
The research on knowledge representation and reasoning has focused on the variety of data, devising data ...
Distributed data stream processing and edge computing
Under several emerging application scenarios, such as in smart cities, operational monitoring of large infrastructure, wearable assistance, and Internet of Things, continuous data streams must be processed under very short delays. Several solutions, ...
Comments