Published in: The Journal of Supercomputing 5/2015

1 May 2015

Analyzing job completion reliability and job energy consumption for a heterogeneous MapReduce cluster under different intermediate-data replication policies

Authors: Jia-Chun Lin, Fang-Yie Leu, Ying-Ping Chen


Abstract

MapReduce has recently become a popular distributed programming framework for data-intensive applications. However, a large-scale MapReduce cluster inevitably experiences machine/node failures and consumes considerable energy. To address these problems, MapReduce employs several policies for replicating input data, storing/replicating intermediate data, and re-executing failed tasks. In this study, we concentrate on two typical policies for storing/replicating intermediate data and derive the job completion reliability (JCR) and job energy consumption (JEC) of a MapReduce cluster when each policy is employed individually. The two policies are further analyzed and compared under various scenarios in which jobs with different input data sizes, numbers of reduce tasks, and other parameters run on a MapReduce cluster with two extreme parallel execution capabilities. The analytical results help MapReduce managers understand how the two policies influence the JCR and JEC of a MapReduce cluster.
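The abstract does not reproduce the paper's JCR and JEC formulas. As a rough illustration only, the sketch below computes toy versions of both metrics under two hypothetical intermediate-data policies: keeping each map output only on the mapper's local disk versus copying it to one additional node. It assumes independent node failures following a Poisson process with per-node rate λ (so a node survives an interval t with probability e^(-λt)), ignores cluster heterogeneity, defines the toy JCR as the probability that no task needs re-execution, and models energy as busy time multiplied by a constant power draw. All names (`ToyJob`, `toy_jcr`, `toy_jec`, `busy_watts`) and parameter values are hypothetical; this is not the authors' model.

```python
import math
from dataclasses import dataclass


@dataclass
class ToyJob:
    """Hypothetical job parameters for illustration; not the paper's notation."""
    num_maps: int          # number of map tasks
    num_reduces: int       # number of reduce tasks
    map_time: float        # seconds one map task runs
    shuffle_time: float    # seconds until reducers have fetched all map outputs
    reduce_time: float     # seconds one reduce task runs
    replicate_time: float  # extra seconds to copy one map output to a second node


def survive(rate: float, t: float) -> float:
    """P(a node has no failure during an interval of length t), Poisson failures."""
    return math.exp(-rate * t)


def map_output_ok(job: ToyJob, rate: float, replicated: bool) -> float:
    """P(one map task finishes and its output is still readable after the shuffle)."""
    if not replicated:
        # Output kept only on the mapper's local disk: that node must
        # survive both the map execution and the whole shuffle.
        return survive(rate, job.map_time + job.shuffle_time)
    # Mapper must survive the map execution plus the copy; afterwards the
    # output is lost only if both holders fail during the shuffle
    # (independent failures assumed).
    both_fail = (1.0 - survive(rate, job.shuffle_time)) ** 2
    return survive(rate, job.map_time + job.replicate_time) * (1.0 - both_fail)


def toy_jcr(job: ToyJob, rate: float, replicated: bool) -> float:
    """Toy JCR: probability the job finishes without re-executing any task."""
    p_map = map_output_ok(job, rate, replicated)
    p_reduce = survive(rate, job.reduce_time)
    return p_map ** job.num_maps * p_reduce ** job.num_reduces


def toy_jec(job: ToyJob, rate: float, replicated: bool, busy_watts: float) -> float:
    """Toy JEC in joules: busy time times power, charging each lost map
    output one expected re-execution (reduce re-executions are ignored)."""
    map_busy = job.map_time + (job.replicate_time if replicated else 0.0)
    expected_reruns = job.num_maps * (1.0 - map_output_ok(job, rate, replicated))
    busy_seconds = ((job.num_maps + expected_reruns) * map_busy
                    + job.num_reduces * job.reduce_time)
    return busy_watts * busy_seconds


if __name__ == "__main__":
    job = ToyJob(num_maps=100, num_reduces=10, map_time=60.0,
                 shuffle_time=120.0, reduce_time=300.0, replicate_time=5.0)
    rate = 1e-5  # hypothetical per-node failures per second
    for replicated in (False, True):
        print(replicated,
              round(toy_jcr(job, rate, replicated), 3),
              round(toy_jec(job, rate, replicated, busy_watts=200.0)))
```

With these made-up parameters, the toy JCR is roughly 0.81 without replication and 0.91 with it, while the toy JEC rises from about 1.80 MJ to about 1.90 MJ. In this toy model, replicating intermediate data raises reliability at an energy cost, illustrating how a storage/replication policy can move JCR and JEC in opposite directions.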


Metadata
Publisher: Springer US
Print ISSN: 0920-8542
Electronic ISSN: 1573-0484
DOI: https://doi.org/10.1007/s11227-014-1286-7
