Skip to main content
Top
Published in: The Journal of Supercomputing 7/2021

04-01-2021

Constraint programming versus heuristic approach to MapReduce scheduling problem in Hadoop YARN for energy minimization

Authors: Vaibhav Pandey, Poonam Saini

Published in: The Journal of Supercomputing | Issue 7/2021

Log in

Activate our intelligent search to find suitable subject content or patents.

search-config
loading …

Abstract

In this paper, we consider a deadline-constrained MR scheduling problem of minimizing energy consumption in Hadoop’s generic resource manager known as yet another resource negotiator. The problem has been modeled as an integer programming (IP) problem using the time-indexed decision variables. We propose two solution approaches to the problem. First, we give a heuristic algorithm that generates sub-optimal schedules in polynomial time. Second, we propose a novel constraint programming (CP) model (as an alternative to the IP model) which always generates optimal schedules when solved by a CP solver. The CP technique is a relatively new and an alternative approach to IP-based branch-and-cut algorithm to exactly solve NP-hard optimization problems. We performed several experiments to compare both proposed solution approaches over real data traces of a wide variety of MR jobs from the HiBench and PUMA benchmark suite. It is noticed that for large-scale big data jobs, the heuristic algorithm provides sub-optimal results in a very small amount of time. On the other hand, the CP approach not only gives optimal results but also takes a small amount of time when compared to IP-based approaches. Therefore, it can be used in non-time-critical situations for getting an optimal schedule. Besides this, a few experiments were also performed to compare the tightest satisfiable deadline under both approaches with the conclusion that the CP technique is able to produce optimal schedules in tighter deadline constraints than the heuristic approach. Moreover, we investigate the sensitivity of total energy consumption of tasks and the execution time of both approaches separately on the number of tasks and deadlines.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Footnotes
1
Equals to the number of computing cores in the node.
 
2
Linear programming (LP), integer programming (IP), quadratic programming (QP), quadratic constraint programming (QCP), etc. are collectively known as mathematical programming (MP).
 
3
IP modeling is a common way to represent machine scheduling problems in general.
 
Literature
2.
go back to reference White T (2009) Hadoop: The definitive guide. O’Reilly Media, Inc., USA White T (2009) Hadoop: The definitive guide. O’Reilly Media, Inc., USA
3.
go back to reference Tiwari N, Sarkar S, Bellur U, Indrawan M (2015) Classification framework of mapreduce scheduling algorithms. ACM Comput Surv (CSUR) 47(3):49CrossRef Tiwari N, Sarkar S, Bellur U, Indrawan M (2015) Classification framework of mapreduce scheduling algorithms. ACM Comput Surv (CSUR) 47(3):49CrossRef
4.
go back to reference Pandey V, Saini P (2018) How heterogeneity affects the design of hadoop mapreduce schedulers: A state-of-the-art survey and challenges. Big Data 6(2):72–95CrossRef Pandey V, Saini P (2018) How heterogeneity affects the design of hadoop mapreduce schedulers: A state-of-the-art survey and challenges. Big Data 6(2):72–95CrossRef
5.
go back to reference Moseley B, Dasgupta A, Kumar R, Sarlós T (2011) On scheduling in map-reduce and flow-shops. In: Proceedings of the Twenty-Third Annual ACM Symposium on Parallelism in Algorithms and Architectures, pp. 289–298. ACM Moseley B, Dasgupta A, Kumar R, Sarlós T (2011) On scheduling in map-reduce and flow-shops. In: Proceedings of the Twenty-Third Annual ACM Symposium on Parallelism in Algorithms and Architectures, pp. 289–298. ACM
6.
go back to reference Verma A, Cherkasova L, Campbell RH (2013) Orchestrating an ensemble of MapReduce jobs for minimizing their makespan. IEEE Trans Dependable Secure Comput 10(5):314–327CrossRef Verma A, Cherkasova L, Campbell RH (2013) Orchestrating an ensemble of MapReduce jobs for minimizing their makespan. IEEE Trans Dependable Secure Comput 10(5):314–327CrossRef
7.
go back to reference Tian W, Li G, Yang W, Buyya R (2016) Hscheduler: an optimal approach to minimize the makespan of multiple mapreduce jobs. J Supercomput 72(6):2376–2393CrossRef Tian W, Li G, Yang W, Buyya R (2016) Hscheduler: an optimal approach to minimize the makespan of multiple mapreduce jobs. J Supercomput 72(6):2376–2393CrossRef
8.
go back to reference Mashayekhy L, Nejad MM, Grosu D, Zhang Q, Shi W (2015) Energy-aware scheduling of mapreduce jobs for big data applications. IEEE Trans Parallel Distrib Syst 26(10):2720–2733CrossRef Mashayekhy L, Nejad MM, Grosu D, Zhang Q, Shi W (2015) Energy-aware scheduling of mapreduce jobs for big data applications. IEEE Trans Parallel Distrib Syst 26(10):2720–2733CrossRef
9.
go back to reference Yousefi MHN, Goudarzi M (2018) A task-based greedy scheduling algorithm for minimizing energy of mapreduce jobs. J Grid Comput 16(4):535–551CrossRef Yousefi MHN, Goudarzi M (2018) A task-based greedy scheduling algorithm for minimizing energy of mapreduce jobs. J Grid Comput 16(4):535–551CrossRef
10.
go back to reference Fischer MJ, Su X, Yin Y (2010) Assigning tasks for efficiency in hadoop. In: Proceedings of the Twenty-Second Annual ACM Symposium on Parallelism in Algorithms and Architectures, pp. 30–39. ACM Fischer MJ, Su X, Yin Y (2010) Assigning tasks for efficiency in hadoop. In: Proceedings of the Twenty-Second Annual ACM Symposium on Parallelism in Algorithms and Architectures, pp. 30–39. ACM
11.
go back to reference Aggarwal V, Xu M, Lan T, Subramaniam S (2017) On the optimality of scheduling dependent mapreduce tasks on heterogeneous machines. arXiv preprint arXiv:1711.09964 Aggarwal V, Xu M, Lan T, Subramaniam S (2017) On the optimality of scheduling dependent mapreduce tasks on heterogeneous machines. arXiv preprint arXiv:1711.09964
12.
go back to reference Zhu Y, Jiang Y, Wu W, Ding L, Teredesai A, Li D, Lee W (2014) Minimizing makespan and total completion time in mapreduce-like systems. In: IEEE INFOCOM 2014-IEEE Conference on Computer Communications, pp. 2166–2174. IEEE Zhu Y, Jiang Y, Wu W, Ding L, Teredesai A, Li D, Lee W (2014) Minimizing makespan and total completion time in mapreduce-like systems. In: IEEE INFOCOM 2014-IEEE Conference on Computer Communications, pp. 2166–2174. IEEE
13.
go back to reference Vavilapalli VK, Murthy AC, Douglas C, Agarwal S, Konar M, Evans R, Graves T, Lowe J, Shah H, Seth S, et al (2013) Apache hadoop yarn: Yet another resource negotiator. In: Proceedings of the 4th Annual Symposium on Cloud Computing, p. 5. ACM Vavilapalli VK, Murthy AC, Douglas C, Agarwal S, Konar M, Evans R, Graves T, Lowe J, Shah H, Seth S, et al (2013) Apache hadoop yarn: Yet another resource negotiator. In: Proceedings of the 4th Annual Symposium on Cloud Computing, p. 5. ACM
14.
go back to reference Fotakis D, Milis I, Papadigenopoulos O, Vassalos V, Zois G (2016) Scheduling mapreduce jobs under multi-round precedences. In: European Conference on Parallel Processing, pp. 209–222. Springer Fotakis D, Milis I, Papadigenopoulos O, Vassalos V, Zois G (2016) Scheduling mapreduce jobs under multi-round precedences. In: European Conference on Parallel Processing, pp. 209–222. Springer
15.
go back to reference Chen F, Kodialam M, Lakshman T (2012) Joint scheduling of processing and shuffle phases in mapreduce systems. In: 2012 Proceedings IEEE INFOCOM, pp. 1143–1151. IEEE Chen F, Kodialam M, Lakshman T (2012) Joint scheduling of processing and shuffle phases in mapreduce systems. In: 2012 Proceedings IEEE INFOCOM, pp. 1143–1151. IEEE
16.
go back to reference Zheng Y, Shroff NB, Sinha P (2013) A new analytical technique for designing provably efficient mapreduce schedulers. In: 2013 Proceedings IEEE INFOCOM, pp. 1600–1608. IEEE Zheng Y, Shroff NB, Sinha P (2013) A new analytical technique for designing provably efficient mapreduce schedulers. In: 2013 Proceedings IEEE INFOCOM, pp. 1600–1608. IEEE
17.
go back to reference Fotakis D, Milis I, Papadigenopoulos O, Zampetakis E, Zois G (2015) Scheduling mapreduce jobs and data shuffle on unrelated processors. In: International Symposium on Experimental Algorithms, pp. 137–150. Springer Fotakis D, Milis I, Papadigenopoulos O, Zampetakis E, Zois G (2015) Scheduling mapreduce jobs and data shuffle on unrelated processors. In: International Symposium on Experimental Algorithms, pp. 137–150. Springer
18.
go back to reference Edis EB, Ozkarahan I (2011) A combined integer/constraint programming approach to a resource-constrained parallel machine scheduling problem with machine eligibility restrictions. Eng Optim 43(2):135–157MathSciNetCrossRef Edis EB, Ozkarahan I (2011) A combined integer/constraint programming approach to a resource-constrained parallel machine scheduling problem with machine eligibility restrictions. Eng Optim 43(2):135–157MathSciNetCrossRef
19.
go back to reference Edis EB, Oguz C, Ozkarahan I (2013) Parallel machine scheduling with additional resources: Notation, classification, models and solution methods. Eur J Oper Res 230(3):449–463MathSciNetCrossRef Edis EB, Oguz C, Ozkarahan I (2013) Parallel machine scheduling with additional resources: Notation, classification, models and solution methods. Eur J Oper Res 230(3):449–463MathSciNetCrossRef
20.
go back to reference Ibm ilog cplex optimization studio opl language reference manual. White Paper (2018) Ibm ilog cplex optimization studio opl language reference manual. White Paper (2018)
21.
go back to reference Ibm ilog cplex optimization studio opl user reference manual. White Paper (2018) Ibm ilog cplex optimization studio opl user reference manual. White Paper (2018)
22.
go back to reference Yigitbasi N, Datta K, Jain N, Willke T (2011) Energy efficient scheduling of mapreduce workloads on heterogeneous clusters. In: Green Computing Middleware on Proceedings of the 2nd International Workshop, p. 1. ACM Yigitbasi N, Datta K, Jain N, Willke T (2011) Energy efficient scheduling of mapreduce workloads on heterogeneous clusters. In: Green Computing Middleware on Proceedings of the 2nd International Workshop, p. 1. ACM
23.
go back to reference Bampis E, Chau V, Letsios D, Lucarelli G, Milis I, Zois G (2014) Energy efficient scheduling of mapreduce jobs. In: European Conference on Parallel Processing, pp. 198–209. Springer, Berlin Bampis E, Chau V, Letsios D, Lucarelli G, Milis I, Zois G (2014) Energy efficient scheduling of mapreduce jobs. In: European Conference on Parallel Processing, pp. 198–209. Springer, Berlin
24.
go back to reference Shao Y, Li C, Gu J, Zhang J, Luo Y (2018) Efficient jobs scheduling approach for big data applications. Comput Ind Eng 117:249–261CrossRef Shao Y, Li C, Gu J, Zhang J, Luo Y (2018) Efficient jobs scheduling approach for big data applications. Comput Ind Eng 117:249–261CrossRef
25.
go back to reference Cai X, Li F, Li P, Ju L, Jia Z (2017) Sla-aware energy-efficient scheduling scheme for hadoop yarn. J Supercomput 73(8):3526–3546CrossRef Cai X, Li F, Li P, Ju L, Jia Z (2017) Sla-aware energy-efficient scheduling scheme for hadoop yarn. J Supercomput 73(8):3526–3546CrossRef
26.
go back to reference Verma A, Cherkasova L, Campbell RH (2011) Aria: automatic resource inference and allocation for mapreduce environments. In: Proceedings of the 8th ACM International Conference on Autonomic Computing, pp. 235–244. ACM Verma A, Cherkasova L, Campbell RH (2011) Aria: automatic resource inference and allocation for mapreduce environments. In: Proceedings of the 8th ACM International Conference on Autonomic Computing, pp. 235–244. ACM
27.
go back to reference Edis EB, Oguz C (2011) Parallel machine scheduling with additional resources: a lagrangian-based constraint programming approach. In: International Conference on AI and OR Techniques in Constriant Programming for Combinatorial Optimization Problems, pp. 92–98. Springer Edis EB, Oguz C (2011) Parallel machine scheduling with additional resources: a lagrangian-based constraint programming approach. In: International Conference on AI and OR Techniques in Constriant Programming for Combinatorial Optimization Problems, pp. 92–98. Springer
28.
go back to reference Edis EB, Oguz C (2012) Parallel machine scheduling with flexible resources. Comput Ind Eng 63(2):433–447CrossRef Edis EB, Oguz C (2012) Parallel machine scheduling with flexible resources. Comput Ind Eng 63(2):433–447CrossRef
30.
go back to reference Gökgür B, Hnich B, Özpeynirci S (2018) Parallel machine scheduling with tool loading: a constraint programming approach. Int J Prod Res 56(16):5541–5557CrossRef Gökgür B, Hnich B, Özpeynirci S (2018) Parallel machine scheduling with tool loading: a constraint programming approach. Int J Prod Res 56(16):5541–5557CrossRef
31.
go back to reference Özpeynirci S, Gökgür B, Hnich B (2016) Parallel machine scheduling with tool loading. Appl Math Modell 40(9–10):5660–5671MathSciNetCrossRef Özpeynirci S, Gökgür B, Hnich B (2016) Parallel machine scheduling with tool loading. Appl Math Modell 40(9–10):5660–5671MathSciNetCrossRef
32.
go back to reference Arbaoui T, Yalaoui F (2018) Solving the unrelated parallel machine scheduling problem with additional resources using constraint programming. In: Asian Conference on Intelligent Information and Database Systems, pp. 716–725. Springer Arbaoui T, Yalaoui F (2018) Solving the unrelated parallel machine scheduling problem with additional resources using constraint programming. In: Asian Conference on Intelligent Information and Database Systems, pp. 716–725. Springer
33.
go back to reference Fanjul-Peyro L, Perea F, Ruiz R (2017) Models and matheuristics for the unrelated parallel machine scheduling problem with additional resources. Eur J Oper Res 260(2):482–493MathSciNetCrossRef Fanjul-Peyro L, Perea F, Ruiz R (2017) Models and matheuristics for the unrelated parallel machine scheduling problem with additional resources. Eur J Oper Res 260(2):482–493MathSciNetCrossRef
34.
go back to reference Van den Akker J, Hurkens CA, Savelsbergh MW (2000) Time-indexed formulations for machine scheduling problems: Column generation. INFORMS J Comput 12(2):111–124MathSciNetCrossRef Van den Akker J, Hurkens CA, Savelsbergh MW (2000) Time-indexed formulations for machine scheduling problems: Column generation. INFORMS J Comput 12(2):111–124MathSciNetCrossRef
35.
go back to reference Queyranne M, Schulz AS (1994) Polyhedral approaches to machine scheduling. TU, Fachbereich 3, Berlin Queyranne M, Schulz AS (1994) Polyhedral approaches to machine scheduling. TU, Fachbereich 3, Berlin
36.
go back to reference Smith BM, Brailsford SC, Hubbard PM, Williams HP (1996) The progressive party problem: Integer linear programming and constraint programming compared. Constraints 1(1–2):119–138MathSciNetCrossRef Smith BM, Brailsford SC, Hubbard PM, Williams HP (1996) The progressive party problem: Integer linear programming and constraint programming compared. Constraints 1(1–2):119–138MathSciNetCrossRef
37.
go back to reference Darby-Dowman K, Little J, Mitra G, Zaffalon M (1997) Constraint logic programming and integer programming approaches and their collaboration in solving an assignment scheduling problem. Constraints 1(3):245–264MathSciNetCrossRef Darby-Dowman K, Little J, Mitra G, Zaffalon M (1997) Constraint logic programming and integer programming approaches and their collaboration in solving an assignment scheduling problem. Constraints 1(3):245–264MathSciNetCrossRef
38.
go back to reference Darby-Dowman K, Little J (1998) Properties of some combinatorial optimization problems and their effect on the performance of integer programming and constraint logic programming. INFORMS J Comput 10(3):276–286MathSciNetCrossRef Darby-Dowman K, Little J (1998) Properties of some combinatorial optimization problems and their effect on the performance of integer programming and constraint logic programming. INFORMS J Comput 10(3):276–286MathSciNetCrossRef
39.
go back to reference Ibm ilog cplex optimization studiogetting started with scheduling in cplexstudio. White Paper (2018) Ibm ilog cplex optimization studiogetting started with scheduling in cplexstudio. White Paper (2018)
40.
go back to reference Huang S, Huang J, Dai J, Xie T, Huang B (2010) The hibench benchmark suite: Characterization of the mapreduce-based data analysis. In: 2010 IEEE 26th International Conference on Data Engineering Workshops (ICDEW 2010), pp. 41–51. IEEE Huang S, Huang J, Dai J, Xie T, Huang B (2010) The hibench benchmark suite: Characterization of the mapreduce-based data analysis. In: 2010 IEEE 26th International Conference on Data Engineering Workshops (ICDEW 2010), pp. 41–51. IEEE
41.
go back to reference Ahmad F, Lee S, Thottethodi M, Vijaykumar T (2012) Puma: Purdue mapreduce benchmarks suite Ahmad F, Lee S, Thottethodi M, Vijaykumar T (2012) Puma: Purdue mapreduce benchmarks suite
Metadata
Title
Constraint programming versus heuristic approach to MapReduce scheduling problem in Hadoop YARN for energy minimization
Authors
Vaibhav Pandey
Poonam Saini
Publication date
04-01-2021
Publisher
Springer US
Published in
The Journal of Supercomputing / Issue 7/2021
Print ISSN: 0920-8542
Electronic ISSN: 1573-0484
DOI
https://doi.org/10.1007/s11227-020-03516-3

Other articles of this Issue 7/2021

The Journal of Supercomputing 7/2021 Go to the issue

Premium Partner