Skip to main content
Erschienen in: The Journal of Supercomputing 1/2021

19.03.2020

A unit-based, cost-efficient scheduler for heterogeneous Hadoop systems

verfasst von: Abdol Karim Javanmardi, S. Hadi Yaghoubyan, Karamollah Bagherifard, Samad Nejatian, Hamid Parvin

Erschienen in: The Journal of Supercomputing | Ausgabe 1/2021

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

A significant amount of research in the field of job scheduling is carried out in Hadoop. However, there is still need for research to overcome some challenges regarding scheduling jobs in Hadoop clusters. There are various factors affecting the performance of scheduling policies like data volume (storage), data source format (different data), speed (data rate), security and privacy, cost, connection and data sharing. To reach a better utilization of resources and managing big data, scheduling policies have been designed. In this paper, an algorithm has been presented that can run on heterogeneous Hadoop clusters and runs job in parallel. This algorithm first distributes data based on the performance of the nodes and then schedules the jobs according to their cost of execution and decreases the cost of executing the jobs. The presented algorithm offers better performance in terms of execution time, cost and locality compared to FIFO and Fair schedulers.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Literatur
1.
Zurück zum Zitat Khan N, Yaqoob I, Hashem IA, Inayat Z, Ali WK, Alam M, Shiraz M, Gani A (2014) Big data: survey, technologies, opportunities, and challenges. Sci World J 2014:712826 Khan N, Yaqoob I, Hashem IA, Inayat Z, Ali WK, Alam M, Shiraz M, Gani A (2014) Big data: survey, technologies, opportunities, and challenges. Sci World J 2014:712826
2.
Zurück zum Zitat Guo Y, Wu L, Yu W, Wang B, Wang X] (2015) The improved job scheduling algorithm of Hadoop platform.pdf . arXiv e-prints Guo Y, Wu L, Yu W, Wang B, Wang X] (2015) The improved job scheduling algorithm of Hadoop platform.pdf . arXiv e-prints
3.
Zurück zum Zitat Zhang H, Stafman L, Or A, Freedman MJ (2018) SLAQ: quality-driven scheduling for distributed machine learning. arXiv e-prints, arXiv:1802.04819 Zhang H, Stafman L, Or A, Freedman MJ (2018) SLAQ: quality-driven scheduling for distributed machine learning. arXiv e-prints, arXiv:​1802.​04819
4.
Zurück zum Zitat Rasooli A, Down DG (2014) COSHH: a classification and optimization based scheduler for heterogeneous Hadoop systems. Future Gener Comput Syst 36:1–15CrossRef Rasooli A, Down DG (2014) COSHH: a classification and optimization based scheduler for heterogeneous Hadoop systems. Future Gener Comput Syst 36:1–15CrossRef
5.
Zurück zum Zitat Malik M, Neshatpour K, Rafatirad S, Joshi RV, Mohsenin T, Ghasemzadeh H, Homayoun H (2019) Big vs little core for energy-efficient Hadoop computing. J Parallel Distrib Comput 129:110–124CrossRef Malik M, Neshatpour K, Rafatirad S, Joshi RV, Mohsenin T, Ghasemzadeh H, Homayoun H (2019) Big vs little core for energy-efficient Hadoop computing. J Parallel Distrib Comput 129:110–124CrossRef
6.
Zurück zum Zitat Varga M, Petrescu-Nita A, Pop F (2018) Deadline scheduling algorithm for sustainable computing in Hadoop environment. Comput Secur 76:354–366CrossRef Varga M, Petrescu-Nita A, Pop F (2018) Deadline scheduling algorithm for sustainable computing in Hadoop environment. Comput Secur 76:354–366CrossRef
7.
Zurück zum Zitat Yildiz O, Ibrahim S, Antoniu G (2017) Enabling fast failure recovery in shared Hadoop clusters: towards failure-aware scheduling. Future Gener Comput Syst 74:208–219CrossRef Yildiz O, Ibrahim S, Antoniu G (2017) Enabling fast failure recovery in shared Hadoop clusters: towards failure-aware scheduling. Future Gener Comput Syst 74:208–219CrossRef
8.
Zurück zum Zitat Brahmwar M, Kumar M, Sikka G (2016) Tolhit—a scheduling algorithm for Hadoop cluster. Proc Comput Sci 89:203–208CrossRef Brahmwar M, Kumar M, Sikka G (2016) Tolhit—a scheduling algorithm for Hadoop cluster. Proc Comput Sci 89:203–208CrossRef
9.
Zurück zum Zitat Suresh S, Gopalan NP (2014) An optimal task selection scheme for Hadoop scheduling. IERI Proc 10:70–75CrossRef Suresh S, Gopalan NP (2014) An optimal task selection scheme for Hadoop scheduling. IERI Proc 10:70–75CrossRef
10.
Zurück zum Zitat Gupta S, Fritz C, Price B, Hoover R, Dekleer J, Witteveen C (2013) Throughputscheduler: learning to schedule on heterogeneous Hadoop clusters. In: Proceedings of the 10th International Conference on Autonomic Computing (ICAC’13), pp 159–165 Gupta S, Fritz C, Price B, Hoover R, Dekleer J, Witteveen C (2013) Throughputscheduler: learning to schedule on heterogeneous Hadoop clusters. In: Proceedings of the 10th International Conference on Autonomic Computing (ICAC’13), pp 159–165
11.
Zurück zum Zitat Xie J, Meng F, Wang H, Pan H, Cheng J, Qin X (2013) Research on scheduling scheme for Hadoop clusters. Proc Comput Sci 18:2468–2471CrossRef Xie J, Meng F, Wang H, Pan H, Cheng J, Qin X (2013) Research on scheduling scheme for Hadoop clusters. Proc Comput Sci 18:2468–2471CrossRef
12.
Zurück zum Zitat Usama M, Liu M, Chen M (2017) Job schedulers for big data processing in Hadoop environment: testing real-life schedulers using benchmark programs. Digit Commun Netw 3(4):260–273CrossRef Usama M, Liu M, Chen M (2017) Job schedulers for big data processing in Hadoop environment: testing real-life schedulers using benchmark programs. Digit Commun Netw 3(4):260–273CrossRef
13.
Zurück zum Zitat Bidgoli A, Tabar M, Rahmani A (2010) An artificial immune system for task scheduling in grid computing with task balancing, pp 25–31 Bidgoli A, Tabar M, Rahmani A (2010) An artificial immune system for task scheduling in grid computing with task balancing, pp 25–31
14.
Zurück zum Zitat Hammoud S, Li M, Liu Y, Alham NK, Liu Z (2010) MRSim: a discrete event based MapReduce simulator. In: Seventh International Conference on Fuzzy Systems and Knowledge Discovery. IEEE, pp 2993–2997 Hammoud S, Li M, Liu Y, Alham NK, Liu Z (2010) MRSim: a discrete event based MapReduce simulator. In: Seventh International Conference on Fuzzy Systems and Knowledge Discovery. IEEE, pp 2993–2997
15.
Zurück zum Zitat Chen Y, Ganapathi A, Griffith R, Katz R (2011) The case for evaluating MapReduce performance using workload suites. In: 2011 IEEE 19th Annual International Symposium on Modelling, Analysis, and Simulation of Computer and Telecommunication Systems Chen Y, Ganapathi A, Griffith R, Katz R (2011) The case for evaluating MapReduce performance using workload suites. In: 2011 IEEE 19th Annual International Symposium on Modelling, Analysis, and Simulation of Computer and Telecommunication Systems
16.
Zurück zum Zitat Zaharia M, Borthakur D, Sen Sarma J, Elmeleegy K, Shenker S, Stoica I (2009) Job scheduling for multi-user MapReduce clusters. EECS Department, University of California, Berkeley Zaharia M, Borthakur D, Sen Sarma J, Elmeleegy K, Shenker S, Stoica I (2009) Job scheduling for multi-user MapReduce clusters. EECS Department, University of California, Berkeley
Metadaten
Titel
A unit-based, cost-efficient scheduler for heterogeneous Hadoop systems
verfasst von
Abdol Karim Javanmardi
S. Hadi Yaghoubyan
Karamollah Bagherifard
Samad Nejatian
Hamid Parvin
Publikationsdatum
19.03.2020
Verlag
Springer US
Erschienen in
The Journal of Supercomputing / Ausgabe 1/2021
Print ISSN: 0920-8542
Elektronische ISSN: 1573-0484
DOI
https://doi.org/10.1007/s11227-020-03256-4

Weitere Artikel der Ausgabe 1/2021

The Journal of Supercomputing 1/2021 Zur Ausgabe

Premium Partner