Published in: The Journal of Supercomputing 10/2021

26-03-2021

A frequency-aware and energy-saving strategy based on DVFS for Spark

Authors: Hongjian Li, Yaojun Wei, Yu Xiong, Enjie Ma, Wenhong Tian


Abstract

The rapid growth of big data applications has sharply increased the energy consumed by big data processing in Cloud data centers. In this study, a frequency-aware and energy-saving strategy based on dynamic voltage and frequency scaling (abbreviated as FAESS-DVFS) is proposed to reduce the energy consumption of big data processing in Spark on YARN. The proposed method designs and implements energy saving in two layers, the YARN layer and the Spark layer. First, in the YARN layer, an optimal CPU frequency is determined based on the minimum energy efficiency ratio (EER), which is obtained from a status monitoring module. Then, in the Spark layer, a task scheduling method optimizes energy consumption by dynamically adjusting the CPU frequency of nodes over the life cycle of different stages. Tested on HiBench, the proposed method achieves energy savings of up to 29.5% for big data processing compared with the default algorithm in Spark on YARN while satisfying SLA constraints.
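
The frequency-selection idea in the abstract (choose the CPU frequency with the minimum EER, subject to the SLA) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not define EER, so the sketch assumes it means energy consumed per unit of completed work, and it models the SLA as a simple runtime deadline. The `FrequencySample` record, its field names, and the sample numbers are hypothetical stand-ins for whatever the status monitoring module reports.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FrequencySample:
    """One monitoring record for a node at a fixed CPU frequency (hypothetical schema)."""
    frequency_mhz: int
    avg_power_watts: float   # mean node power while running the workload
    runtime_seconds: float   # time to finish the workload at this frequency
    work_units: float        # amount of work completed, e.g. MB processed


def energy_efficiency_ratio(sample):
    """Energy per unit of work in joules per work unit; lower is better.

    Assumed definition for illustration only; the paper may define EER differently.
    """
    if sample.work_units <= 0:
        raise ValueError("work_units must be positive")
    energy_joules = sample.avg_power_watts * sample.runtime_seconds
    return energy_joules / sample.work_units


def pick_frequency(samples, deadline_seconds):
    """Return the frequency with the lowest EER among samples that meet the deadline."""
    feasible = [s for s in samples if s.runtime_seconds <= deadline_seconds]
    if not feasible:
        raise ValueError("no frequency satisfies the deadline")
    return min(feasible, key=energy_efficiency_ratio).frequency_mhz


# Made-up measurements: the middle frequency uses the least energy per work unit.
samples = [
    FrequencySample(1200, 60.0, 200.0, 1000.0),
    FrequencySample(1800, 75.0, 140.0, 1000.0),
    FrequencySample(2400, 110.0, 110.0, 1000.0),
]
print(pick_frequency(samples, deadline_seconds=180.0))
```

In this toy data the lowest frequency misses the 180-second deadline, so the choice falls to the remaining frequency with the smallest energy per work unit. Tightening the deadline forces a higher frequency, which is the trade-off between energy saving and SLA compliance that the abstract describes.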


Metadata
Title
A frequency-aware and energy-saving strategy based on DVFS for Spark
Authors
Hongjian Li
Yaojun Wei
Yu Xiong
Enjie Ma
Wenhong Tian
Publication date
26-03-2021
Publisher
Springer US
Published in
The Journal of Supercomputing / Issue 10/2021
Print ISSN: 0920-8542
Electronic ISSN: 1573-0484
DOI
https://doi.org/10.1007/s11227-021-03740-5
