Published in: The Journal of Supercomputing 11/2019

1 August 2019

Dynamic ranking-based MapReduce job scheduler to exploit heterogeneous performance in a virtualized environment

Authors: J. Rathinaraja, V. S. Ananthanarayana, Anand Paul

Abstract

“More data, more information.” Big data helps businesses and research communities gain insights and increase productivity. Many public cloud service providers offer Hadoop MapReduce as a pay-per-use service, delivered through infrastructure as a service on clusters of virtual machines that promise on-demand horizontal scaling. These virtual machines are launched on various physical machines across racks in cloud data centers. Such multi-tenancy introduces performance heterogeneity among Hadoop virtual machines, caused by hardware heterogeneity and by interference from co-located virtual machines. Performance heterogeneity strongly affects MapReduce job latency and the resource utilization of rented Hadoop virtual clusters. Default MapReduce schedulers assign map and reduce tasks on the assumption that the hardware is homogeneous. Interference-aware schedulers act only on the interference pattern generated by co-located virtual machines. These schedulers do not consider the heterogeneous performance of virtual machines. Therefore, we propose a dynamic ranking-based MapReduce job scheduler that places map and reduce tasks according to each virtual machine's performance rank, in order to minimize job latency and improve resource utilization. Our approach calculates a performance score for each virtual machine based on hardware heterogeneity and co-located virtual machine interference. It then ranks the virtual machines separately by map performance and by reduce performance, and uses these rankings to place map and reduce tasks. To demonstrate our ideas, we set up a test bed of 29 virtual machines on eight physical machines with different configurations and capacities. We modify the default fair scheduler in Hadoop 2.x to incorporate our ideas and evaluate them with different workloads on the PUMA dataset. The proposed method is then compared, in terms of job latency and resource utilization, against the default fair scheduler (resource-aware) and an interference-aware scheduler. Finally, we argue in favor of our approach, as it improves resource utilization by 30–65% and overall job latency by up to 30%.
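
The abstract does not give the scoring formula or the placement rule, so the following is only a minimal sketch of the general idea of ranking virtual machines separately for map and reduce work and placing tasks on the best-ranked ones. The `VmSample` fields, the `score` formula (throughput discounted by an interference factor), and the greedy `place` loop are all illustrative assumptions, not the authors' method and not part of the Hadoop API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VmSample:
    """One monitoring sample for a VM. Every field is a hypothetical metric."""
    name: str
    map_rate: float      # assumed observed map throughput (e.g. MB/s)
    reduce_rate: float   # assumed observed reduce throughput (e.g. MB/s)
    interference: float  # assumed factor: 0.0 = none, 1.0 = fully degraded
    free_slots: int      # containers the VM can still accept


def score(rate: float, interference: float) -> float:
    """Assumed score: raw throughput discounted by co-location interference."""
    return rate * (1.0 - interference)


def rank(vms, phase):
    """Return VM names ordered best-first for the 'map' or 'reduce' phase."""
    if phase not in ("map", "reduce"):
        raise ValueError("phase must be 'map' or 'reduce'")
    attr = "map_rate" if phase == "map" else "reduce_rate"
    ordered = sorted(
        vms,
        key=lambda vm: score(getattr(vm, attr), vm.interference),
        reverse=True,
    )
    return [vm.name for vm in ordered]


def place(vms, phase, n_tasks):
    """Greedily fill the highest-ranked VMs until n_tasks are placed."""
    slots = {vm.name: vm.free_slots for vm in vms}
    placement = []
    for name in rank(vms, phase):
        while slots[name] > 0 and len(placement) < n_tasks:
            placement.append(name)
            slots[name] -= 1
    return placement


if __name__ == "__main__":
    cluster = [
        VmSample("vm-a", 100.0, 40.0, 0.5, 2),
        VmSample("vm-b", 70.0, 50.0, 0.0, 1),
        VmSample("vm-c", 80.0, 80.0, 0.25, 2),
    ]
    print(rank(cluster, "map"))
    print(rank(cluster, "reduce"))
    print(place(cluster, "map", 3))
```

Keeping two rankings matters because a VM that is fast for map work can be slow for reduce work: in the demo cluster, `vm-b` ranks first for map tasks but second for reduce tasks. A scheduler built on this idea would recompute the scores periodically, since interference from co-located virtual machines changes over time.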

Metadata
Title
Dynamic ranking-based MapReduce job scheduler to exploit heterogeneous performance in a virtualized environment
Authors
J. Rathinaraja
V. S. Ananthanarayana
Anand Paul
Publication date
1 August 2019
Publisher
Springer US
Published in
The Journal of Supercomputing / Issue 11/2019
Print ISSN: 0920-8542
Electronic ISSN: 1573-0484
DOI
https://doi.org/10.1007/s11227-019-02960-0
