Published in: The Journal of Supercomputing 1/2021

19-03-2020

A unit-based, cost-efficient scheduler for heterogeneous Hadoop systems

Authors: Abdol Karim Javanmardi, S. Hadi Yaghoubyan, Karamollah Bagherifard, Samad Nejatian, Hamid Parvin


Abstract

A significant amount of research on job scheduling has been carried out in Hadoop. However, further research is still needed to overcome several challenges in scheduling jobs on Hadoop clusters. Various factors affect the performance of scheduling policies, such as data volume (storage), data source format (heterogeneous data), velocity (data rate), security and privacy, cost, connectivity, and data sharing. Scheduling policies have been designed to improve resource utilization and to manage big data. This paper presents an algorithm that runs on heterogeneous Hadoop clusters and executes jobs in parallel. The algorithm first distributes data according to the performance of the nodes and then schedules jobs according to their execution cost, thereby reducing the cost of executing them. Compared with the FIFO and Fair schedulers, the proposed algorithm achieves better execution time, cost, and data locality.
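The abstract describes two phases: performance-aware data placement followed by cost-driven job assignment. The Python sketch below illustrates that two-phase idea under assumptions that are ours, not the authors': each node has a hypothetical processing rate (blocks per second) and a hypothetical price per second, data blocks are placed in proportion to each node's rate, and each job is assigned greedily to the node with the lowest estimated execution cost, where every non-local block adds a fixed, hypothetical transfer penalty (`TRANSFER_S`). The names `Node`, `Job`, `distribute_blocks`, `exec_cost`, and `schedule` are illustrative only; this is not the paper's unit-based algorithm.

```python
from dataclasses import dataclass, field

TRANSFER_S = 0.5  # hypothetical extra seconds to fetch one non-local block


@dataclass
class Node:
    name: str
    rate: float    # hypothetical: blocks processed per second
    price: float   # hypothetical: cost units per second of node time
    blocks: set = field(default_factory=set)
    free_at: float = 0.0  # time at which the node finishes its queued work


@dataclass
class Job:
    name: str
    blocks: list   # ids of the input blocks the job reads


def distribute_blocks(block_ids, nodes):
    """Place blocks on nodes in proportion to each node's rate."""
    total_rate = sum(n.rate for n in nodes)
    quotas = [len(block_ids) * n.rate / total_rate for n in nodes]
    counts = [int(q) for q in quotas]
    # Largest-remainder rounding so every block is placed exactly once.
    leftover = len(block_ids) - sum(counts)
    by_remainder = sorted(range(len(nodes)),
                          key=lambda i: quotas[i] - counts[i], reverse=True)
    for i in by_remainder[:leftover]:
        counts[i] += 1
    it = iter(block_ids)
    for node, count in zip(nodes, counts):
        node.blocks.update(next(it) for _ in range(count))


def exec_time(job, node):
    """Processing time plus a transfer penalty for each non-local block."""
    remote = sum(1 for b in job.blocks if b not in node.blocks)
    return len(job.blocks) / node.rate + remote * TRANSFER_S


def exec_cost(job, node):
    """Estimated monetary cost of running the job on the node."""
    return exec_time(job, node) * node.price


def schedule(jobs, nodes):
    """Greedy: give each job to the node with the lowest execution cost,
    breaking ties by earliest completion time."""
    plan = []
    for job in jobs:
        best = min(nodes, key=lambda n: (exec_cost(job, n),
                                         n.free_at + exec_time(job, n)))
        best.free_at += exec_time(job, best)
        plan.append((job.name, best.name, round(exec_cost(job, best), 2)))
    return plan


if __name__ == "__main__":
    cluster = [Node("fast", rate=4.0, price=2.0),
               Node("slow", rate=1.0, price=1.0)]
    distribute_blocks(list(range(10)), cluster)  # fast gets 8 blocks, slow 2
    jobs = [Job("j1", [0, 1, 2, 3]), Job("j2", [8, 9])]
    print(schedule(jobs, cluster))
    # prints [('j1', 'fast', 2.0), ('j2', 'slow', 2.0)]
```

Because this toy cost model charges for non-local reads, minimizing cost also pushes jobs toward the nodes that hold their data, which is one plausible way cost and locality improvements can coincide. A purely greedy choice like this ignores queueing delay, so a practical scheduler would also need to limit how much work piles up on the cheapest nodes.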

Metadata
Title
A unit-based, cost-efficient scheduler for heterogeneous Hadoop systems
Authors
Abdol Karim Javanmardi
S. Hadi Yaghoubyan
Karamollah Bagherifard
Samad Nejatian
Hamid Parvin
Publication date
19-03-2020
Publisher
Springer US
Published in
The Journal of Supercomputing / Issue 1/2021
Print ISSN: 0920-8542
Electronic ISSN: 1573-0484
DOI
https://doi.org/10.1007/s11227-020-03256-4
