Published in: The Journal of Supercomputing 6/2021

5 November 2020

An architecture for scheduling with the capability of minimum share to heterogeneous Hadoop systems

Authors: Abdol Karim Javanmardi, S. Hadi Yaghoubyan, Karamollah BagheriFard, Samad Nejatian, Hamid Parvin


Abstract

Job scheduling in Hadoop has been investigated in several studies. However, some challenges facing Hadoop clusters, including minimum share (min-share), cluster heterogeneity, execution time estimation, and scheduler program size, have received less attention. One of the most important algorithms with regard to min-share is the FAIR scheduler, which Facebook Inc. developed for its own needs and which assigns an equal min-share to all users. This article aims to improve on existing methods in automation and configuration, performance optimization, fairness, and data locality. A high-level architectural model is designed, and a scheduler is then defined on this model. The scheduler contains four components: three schedule jobs, and one distributes the data for each job among the nodes. The scheduler can run on heterogeneous Hadoop clusters and execute jobs in parallel, and different min-shares can be assigned to each job or user. Moreover, an approach is presented for each of the problems associated with min-share, cluster heterogeneity, execution time estimation, and scheduler program size. Each of these approaches can also be used on its own to improve the performance of other scheduling algorithms. The scheduler presented in this paper showed acceptable performance compared with the First-In, First-Out (FIFO) and FAIR schedulers.
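The abstract does not describe how min-shares are enforced, so the following is only a minimal sketch of the general min-share idea, not the authors' scheduler. It assumes a cluster with a fixed number of task slots, a per-user demand, and a per-user min-share that may differ between users. The function name `allocate_slots` and all parameter names are hypothetical.

```python
def allocate_slots(total_slots, demands, min_shares):
    """Illustrative min-share allocation (hypothetical, not the paper's method).

    total_slots: number of task slots in the cluster
    demands:     dict mapping user -> number of slots the user could use
    min_shares:  dict mapping user -> guaranteed minimum number of slots
    """
    # Step 1: guarantee each user its min-share, capped by its actual demand.
    alloc = {user: min(demands[user], min_shares.get(user, 0)) for user in demands}
    guaranteed = sum(alloc.values())
    if guaranteed > total_slots:
        raise ValueError("min-shares exceed cluster capacity")

    # Step 2: hand out leftover slots one at a time to the user that
    # currently holds the fewest slots and still has unmet demand.
    remaining = total_slots - guaranteed
    while remaining > 0:
        hungry = [user for user in demands if alloc[user] < demands[user]]
        if not hungry:
            break  # every demand is satisfied; leave the rest idle
        neediest = min(hungry, key=lambda user: (alloc[user], user))
        alloc[neediest] += 1
        remaining -= 1
    return alloc


if __name__ == "__main__":
    # Users "a" and "b" have different min-shares; "c" has none.
    print(allocate_slots(10, {"a": 8, "b": 8, "c": 1}, {"a": 4, "b": 2}))
```

In this sketch the guaranteed portion is settled before any fairness rule applies, which is what lets unequal min-shares coexist with an otherwise even split of the spare capacity. Node heterogeneity and data locality, which the article also addresses, are deliberately left out.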

Metadata
Title
An architecture for scheduling with the capability of minimum share to heterogeneous Hadoop systems
Authors
Abdol Karim Javanmardi
S. Hadi Yaghoubyan
Karamollah BagheriFard
Samad Nejatian
Hamid Parvin
Publication date
5 November 2020
Publisher
Springer US
Published in
The Journal of Supercomputing / Issue 6/2021
Print ISSN: 0920-8542
Electronic ISSN: 1573-0484
DOI
https://doi.org/10.1007/s11227-020-03487-5
