Published in: The Journal of Supercomputing 5/2020

15.12.2018

Fast Recovery MapReduce (FAR-MR) to accelerate failure recovery in big data applications

Authors: Yongqing Zhu, Juniarto Samsudin, Renuga Kanagavelu, Weiwen Zhang, Long Wang, Theint Theint Aye, Rick Siow Mong Goh

Abstract

The existing Hadoop MapReduce fault tolerance strategy causes computing jobs to suffer a high performance penalty during failure recovery. In this paper, we propose Fast Recovery MapReduce (FAR-MR) to improve MapReduce performance during failure recovery. FAR-MR includes a novel fault tolerance strategy that combines distributed checkpointing with a proactive push mechanism to support fast recovery from task failure and node failure. With distributed checkpointing, the computing progress of each task is periodically recorded as checkpoints and kept in distributed data storage. During failure recovery, the recovered task can obtain the last recorded progress of the failed task from the distributed storage. In addition, the proactive push mechanism enables the computing results of map tasks to be proactively transmitted to the nodes hosting the reduce tasks of the same computing job. When a failure happens, the partial output results already pushed to the reducer nodes can be used by the reduce tasks without being recomputed. FAR-MR allows a failed task to be recovered efficiently at any node in the cluster. The performance evaluation shows that FAR-MR can improve computing job performance by up to 62% and 45% compared to Hadoop MapReduce in the case of task failure recovery and node failure recovery, respectively.
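The checkpointing idea described in the abstract can be illustrated with a minimal sketch. It is not the FAR-MR implementation: `CheckpointStore`, `run_sum_task`, the `interval` and `fail_at` parameters, and the use of an in-memory dictionary in place of distributed storage are all hypothetical names and simplifications chosen for illustration, and the proactive push mechanism is not modeled.

```python
from typing import Dict, List, Optional, Tuple


class CheckpointStore:
    """Hypothetical stand-in for the distributed data storage (a dict here)."""

    def __init__(self) -> None:
        self._checkpoints: Dict[str, Tuple[int, int]] = {}

    def save(self, task_id: str, next_index: int, partial_sum: int) -> None:
        # Record where the task should resume and the result computed so far.
        self._checkpoints[task_id] = (next_index, partial_sum)

    def load(self, task_id: str) -> Tuple[int, int]:
        # A task with no checkpoint starts from record 0 with an empty sum.
        return self._checkpoints.get(task_id, (0, 0))


def run_sum_task(
    task_id: str,
    records: List[int],
    store: CheckpointStore,
    interval: int = 2,
    fail_at: Optional[int] = None,
) -> Tuple[int, int]:
    """Sum `records`, checkpointing every `interval` records.

    Returns (total, number of records processed in this attempt).
    `fail_at` simulates a task failure just before that record index.
    """
    next_index, partial_sum = store.load(task_id)
    processed = 0
    for i in range(next_index, len(records)):
        if fail_at is not None and i == fail_at:
            raise RuntimeError("simulated task failure")
        partial_sum += records[i]
        processed += 1
        if (i + 1) % interval == 0:
            store.save(task_id, i + 1, partial_sum)
    store.save(task_id, len(records), partial_sum)
    return partial_sum, processed


records = [1, 2, 3, 4, 5, 6, 7]
store = CheckpointStore()
try:
    run_sum_task("map-0", records, store, fail_at=5)
except RuntimeError:
    pass  # the first attempt dies; its last checkpoint survives in the store

# Any node that can reach the store can resume the task from the checkpoint.
total, processed = run_sum_task("map-0", records, store)
print(total, processed)
```

In this sketch the first attempt fails after its last checkpoint at record index 4, so the second attempt reprocesses only the three remaining records instead of all seven. Only work done since the last checkpoint is lost, which is the trade-off a checkpoint interval controls.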


Metadata
Title
Fast Recovery MapReduce (FAR-MR) to accelerate failure recovery in big data applications
Authors
Yongqing Zhu
Juniarto Samsudin
Renuga Kanagavelu
Weiwen Zhang
Long Wang
Theint Theint Aye
Rick Siow Mong Goh
Publication date
15.12.2018
Publisher
Springer US
Published in
The Journal of Supercomputing / Issue 5/2020
Print ISSN: 0920-8542
Electronic ISSN: 1573-0484
DOI
https://doi.org/10.1007/s11227-018-2716-8
