Published in: Cluster Computing 2/2024

31-05-2023

Two-stage scheduling for a fluctuant big data stream on heterogeneous servers with multicores in a data center

Authors: Shun Wang, Guo-sun Zeng


Abstract

Big data stream applications require rapid processing with low latency and high throughput. However, interference among stream processing tasks in a data center lowers the utilization of computational resources and prolongs task latency. We therefore study an optimal scheduling method for processing a big data stream on heterogeneous multicore servers in a data center. We model big data stream processing and the scheduling problem in terms of four objects (or factors): streaming data items, processing tasks, computational nodes, and the cores inside each computational node. We present an interference model based on regression analysis and a prediction model based on the Autoregressive Integrated Moving Average (ARIMA). We then propose a two-stage scheduling method that combines fine-grained core scheduling with coarse-grained node scheduling. For the core scheduling stage, we design a core scheduling algorithm named CS_TDF. For the node scheduling stage, we design a node scheduling algorithm named NS_ITF for a single time window and a continuous scheduling algorithm named PS_UIM for the entire data stream across all time windows. The experimental results show that our scheduling method achieves low interference and high computational resource utilization.
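To make the ideas in the abstract concrete, the following is a minimal Python sketch, not the authors' method. It does not reproduce CS_TDF, NS_ITF, or PS_UIM, whose details the abstract does not give. Everything in it is an assumption made for illustration: the names `forecast_next`, `Node`, `pick_node`, `pick_core`, and `schedule`, the linear slowdown with a fixed `slope` standing in for a fitted regression-based interference model, the least-loaded-core rule, and the choice to pick the node before the core. The forecast is the simplest ARIMA variant, ARIMA(1,1,0) without a constant, fitted by least squares.

```python
from dataclasses import dataclass


def forecast_next(series):
    """One-step forecast with ARIMA(1,1,0) and no constant term.

    The series is differenced once, an AR(1) coefficient is fitted to the
    differences by least squares, and the predicted difference is added
    back to the last observation.
    """
    if len(series) < 3:
        return float(series[-1])  # too short to fit: repeat the last value
    diffs = [b - a for a, b in zip(series, series[1:])]
    num = sum(x * y for x, y in zip(diffs, diffs[1:]))
    den = sum(x * x for x in diffs[:-1])
    phi = num / den if den else 0.0
    return series[-1] + phi * diffs[-1]


@dataclass
class Node:
    name: str
    speed: float      # relative processing speed (servers are heterogeneous)
    core_loads: list  # load currently assigned to each core


def pick_node(nodes, task_load, slope=0.1):
    """Coarse-grained step: pick the node with the smallest estimated time.

    Interference is assumed to be a linear slowdown in the load already on
    the node; `slope` is a hypothetical regression coefficient.
    """
    def estimated_time(node):
        slowdown = 1.0 + slope * sum(node.core_loads)
        return task_load * slowdown / node.speed
    return min(nodes, key=estimated_time)


def pick_core(node):
    """Fine-grained step: pick the least loaded core (ties go to the lowest index)."""
    return min(range(len(node.core_loads)), key=node.core_loads.__getitem__)


def schedule(nodes, task_loads):
    """Place each task on a (node, core) pair and update the core loads."""
    placement = []
    for load in task_loads:
        node = pick_node(nodes, load)
        core = pick_core(node)
        node.core_loads[core] += load
        placement.append((node.name, core))
    return placement


if __name__ == "__main__":
    # Predict the load of the next time window, then place four equal tasks.
    print(forecast_next([10, 12, 14, 16]))
    cluster = [Node("fast", 2.0, [0.0, 0.0]), Node("slow", 1.0, [0.0, 0.0])]
    print(schedule(cluster, [4.0, 4.0, 4.0, 4.0]))
```

In this toy setup the faster node takes tasks until its assumed interference penalty outweighs its speed advantage, at which point work spills over to the slower node. A real system would fit the interference coefficients from measurements and would likely use a full ARIMA implementation rather than this hand-rolled special case.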


Metadata
Title: Two-stage scheduling for a fluctuant big data stream on heterogeneous servers with multicores in a data center
Authors: Shun Wang, Guo-sun Zeng
Publication date: 31-05-2023
Publisher: Springer US
Published in: Cluster Computing / Issue 2/2024
Print ISSN: 1386-7857
Electronic ISSN: 1573-7543
DOI: https://doi.org/10.1007/s10586-023-04044-4
