Abstract
Data stream processing (DSP) has emerged over the years as the reference paradigm for analyzing continuous, fast information flows, which often must be processed under low-latency requirements to extract insights and knowledge from raw data. Because they deal with unbounded dataflows, DSP applications are typically long running and are thus likely to experience varying workloads and working conditions over time. To maintain a consistent service level in the face of such variability, considerable effort has gone into studying strategies for runtime adaptation of DSP systems and applications. In this survey, we review the most relevant approaches from the literature and present a taxonomy that characterizes the state of the art along several key dimensions. Our analysis allows us to identify current research trends as well as open challenges that will motivate further investigation in this field.
Index Terms
- Runtime Adaptation of Data Stream Processing Systems: The State of the Art