Published in: The Journal of Supercomputing 11/2022

16.03.2022

Toward optimal operator parallelism for stream processing topology with limited buffers

Authors: Wenhao Li, Zhan Zhang, Yanjun Shu, Hongwei Liu, Tianming Liu


Abstract

Stream processing is an emerging in-memory computing paradigm for handling massive amounts of real-time data. To handle streaming data efficiently, it is vital to have a mechanism that proposes a suitable degree of parallelism for each operator. Previous research has mostly focused on parallelism optimization with infinite buffers; however, the topology’s quality of service is severely affected by network buffers. Thus, in this paper, we introduce an extended queueing network to model the relationship between the parallelism and the tuple’s average sojourn time with limited buffers. Based on this model, we also propose greedy algorithms to calculate the optimal parallelism for both minimum latency and maximum throughput under resource constraints. To fairly evaluate the performance of different models, a random parameter generator for the streaming topology is presented. Experiments show that the extended queueing model can forecast performance accurately. Compared to the state-of-the-art method, the proposed algorithms reduce the median total sojourn time by a factor of 3.74 and increase the average maximum sustainable throughput by a factor of 1.69.
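To make the idea of a greedy parallelism search under limited buffers concrete, the following is a minimal sketch and not the authors' extended queueing network. It assumes each operator behaves like a textbook M/M/c/K queue in which tuples arriving at a full buffer are dropped, and that operators can be evaluated independently. The function names `mmck_sojourn_time` and `greedy_parallelism`, the `buffer_slots` parameter, and the example rates are all hypothetical.

```python
from math import factorial


def mmck_sojourn_time(lam, mu, c, K):
    """Mean sojourn time of accepted tuples in an M/M/c/K queue.

    lam: arrival rate, mu: service rate per instance,
    c: number of parallel instances, K: total capacity (K >= c).
    Assumption: arrivals that find the system full are lost.
    """
    a = lam / mu
    # Unnormalized steady-state weights of having n tuples in the system.
    weights = []
    for n in range(K + 1):
        if n <= c:
            weights.append(a ** n / factorial(n))
        else:
            weights.append(a ** n / (factorial(c) * c ** (n - c)))
    total = sum(weights)
    probs = [w / total for w in weights]
    mean_in_system = sum(n * p for n, p in enumerate(probs))
    accepted_rate = lam * (1.0 - probs[K])
    # Little's law applied to the accepted stream.
    return mean_in_system / accepted_rate


def greedy_parallelism(operators, buffer_slots, budget):
    """Greedily distribute `budget` instances over the operators.

    operators: list of (arrival_rate, service_rate) pairs.
    buffer_slots: waiting slots per operator (assumed equal for all).
    Each step gives one more instance to the operator whose mean
    sojourn time drops the most.
    """
    if budget < len(operators):
        raise ValueError("budget must allow one instance per operator")

    def cost(i, c):
        lam, mu = operators[i]
        return mmck_sojourn_time(lam, mu, c, c + buffer_slots)

    alloc = [1] * len(operators)
    for _ in range(budget - len(operators)):
        gains = [cost(i, alloc[i]) - cost(i, alloc[i] + 1)
                 for i in range(len(operators))]
        best = max(range(len(operators)), key=gains.__getitem__)
        alloc[best] += 1
    return alloc


if __name__ == "__main__":
    # Hypothetical two-operator topology: the first operator is the bottleneck.
    print(greedy_parallelism([(10.0, 4.0), (10.0, 20.0)], buffer_slots=5, budget=6))
```

The sketch treats a full buffer as tuple loss, whereas real stream processors typically apply backpressure to upstream operators, so it illustrates only the shape of a greedy search and not the paper's model or its reported results.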

Metadata
Title
Toward optimal operator parallelism for stream processing topology with limited buffers
Authors
Wenhao Li
Zhan Zhang
Yanjun Shu
Hongwei Liu
Tianming Liu
Publication date
16.03.2022
Publisher
Springer US
Published in
The Journal of Supercomputing / Issue 11/2022
Print ISSN: 0920-8542
Electronic ISSN: 1573-0484
DOI
https://doi.org/10.1007/s11227-022-04376-9
