Published in: The Journal of Supercomputing 12/2020

02-03-2020

Job scheduler for streaming applications in heterogeneous distributed processing systems

Authors: Ali Al-Sinayyid, Michelle Zhu

Abstract

In this study, we investigated the problem of scheduling streaming applications on a heterogeneous cluster and, building on our previous work, developed the maximum throughput scheduler algorithm (MT-Scheduler) for streaming applications. The algorithm uses dynamic programming to map the application topology onto the heterogeneous distributed system according to its computing and data transfer requirements, while accounting for the capacity of the underlying cluster resources. It maximizes system throughput by identifying the computing or transfer bottleneck and minimizing the time incurred there. The MT-Scheduler supports applications structured as a directed acyclic graph. We conducted experiments with three Storm microbenchmark topologies in both simulated and real Apache Storm environments, and compared the MT-Scheduler with a simulated round-robin scheduler and the default Storm scheduler. The results indicated that the MT-Scheduler outperforms the default round-robin approach in both average system latency and throughput.
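
To make the bottleneck idea in the abstract concrete, the following is a minimal sketch of a bottleneck-minimizing dynamic program. It is not the authors' MT-Scheduler. It assumes a simplified setting that the abstract does not specify: a linear pipeline rather than a general directed acyclic graph, no contention when several stages share a node, and zero transfer cost between stages on the same node. The function name `map_pipeline` and the parameters `work`, `out_size`, `speed`, and `bandwidth` are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple


def map_pipeline(
    work: List[float],             # work[i]: compute units per tuple at stage i (assumed model)
    out_size: List[float],         # out_size[i]: data units per tuple sent from stage i to i+1
    speed: List[float],            # speed[j]: compute units per second on node j
    bandwidth: List[List[float]],  # bandwidth[k][j]: data units per second from node k to node j
) -> Tuple[float, List[int]]:
    """Return (bottleneck_time, placement) for a linear pipeline.

    bottleneck_time is the largest per-tuple compute or transfer time along
    the pipeline; the sustainable throughput is roughly 1 / bottleneck_time.
    """
    n, m = len(work), len(speed)
    # best[i][j]: smallest bottleneck for stages 0..i when stage i runs on node j.
    best = [[float("inf")] * m for _ in range(n)]
    # prev[i][j]: node chosen for stage i-1 in that best partial mapping.
    prev = [[-1] * m for _ in range(n)]

    for j in range(m):
        best[0][j] = work[0] / speed[j]

    for i in range(1, n):
        for j in range(m):
            compute = work[i] / speed[j]
            for k in range(m):
                # Assumption: co-located stages exchange data at no cost.
                transfer = 0.0 if k == j else out_size[i - 1] / bandwidth[k][j]
                candidate = max(best[i - 1][k], transfer, compute)
                if candidate < best[i][j]:
                    best[i][j] = candidate
                    prev[i][j] = k

    # Pick the best node for the last stage, then walk back through prev.
    last = min(range(m), key=lambda j: best[n - 1][j])
    placement = [last]
    for i in range(n - 1, 0, -1):
        placement.append(prev[i][placement[-1]])
    placement.reverse()
    return best[n - 1][last], placement


if __name__ == "__main__":
    # Three stages, two heterogeneous nodes (made-up numbers).
    bottleneck, placement = map_pipeline(
        work=[4.0, 8.0, 2.0],
        out_size=[10.0, 10.0],
        speed=[2.0, 4.0],
        bandwidth=[[0.0, 5.0], [5.0, 0.0]],
    )
    print("bottleneck time per tuple:", bottleneck)
    print("stage -> node placement:", placement)
    print("estimated throughput (tuples/s):", 1.0 / bottleneck)
```

The table is filled stage by stage, so the running time of this sketch is O(n·m²) for n stages and m nodes. A scheduler for real Storm topologies would additionally need to handle branching graphs and limited node capacity, which this sketch deliberately leaves out.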

Metadata
Title
Job scheduler for streaming applications in heterogeneous distributed processing systems
Authors
Ali Al-Sinayyid
Michelle Zhu
Publication date
02-03-2020
Publisher
Springer US
Published in
The Journal of Supercomputing / Issue 12/2020
Print ISSN: 0920-8542
Electronic ISSN: 1573-0484
DOI
https://doi.org/10.1007/s11227-020-03223-z
