Published in: The Journal of Supercomputing 7/2021

04-01-2021

Analysis and implementation of reactive fault tolerance techniques in Hadoop: a comparative study

Authors: Hassan Asghar, Babar Nazir


Abstract

Hadoop is the industry's de facto tool for Big Data computation. Its native fault tolerance procedure is slow and degrades performance. Moreover, it fails to fully account for computational overhead and storage cost. The dynamic nature and complexity of MapReduce are also important factors that affect a job's response time. Addressing all of these requires a robust failure-handling technique. In this paper, we analyze notable fault tolerance techniques and measure their impact on different performance metrics under varying datasets and varying fault injections. The key results show that, in terms of response time, the Byzantine technique outperforms the retrying and checkpointing techniques when one node is killed. In terms of throughput, the task-level Byzantine fault tolerance technique again outperforms checkpointing and retrying under network-disconnect failures. Overall, this comparative study highlights the strengths and weaknesses of the different fault tolerance techniques and helps determine the best technique for a given environment.
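
For readers unfamiliar with the three reactive techniques compared here, the following toy Python sketch may help. It is not the authors' implementation and does not use any Hadoop API. All function names are hypothetical, faults are injected as `RuntimeError`s, and the Byzantine case is modeled as simple majority voting over replicated task executions, which is an assumption made for illustration.

```python
from collections import Counter
from typing import Callable, Dict, List, Optional


def make_flaky_task(failures: int, value: int) -> Callable[[], int]:
    """Hypothetical helper: a task that fails `failures` times, then returns `value`."""
    remaining = {"n": failures}

    def task() -> int:
        if remaining["n"] > 0:
            remaining["n"] -= 1
            raise RuntimeError("injected fault")
        return value

    return task


def run_with_retry(task: Callable[[], int], max_attempts: int = 3) -> int:
    """Retrying: after a failure, re-run the whole task from the beginning."""
    last_error: Optional[Exception] = None
    for _ in range(max_attempts):
        try:
            return task()
        except RuntimeError as err:
            last_error = err
    raise RuntimeError(f"task failed after {max_attempts} attempts") from last_error


def sum_with_checkpoint(records: List[int], checkpoint: Dict[str, int],
                        crash_at: Optional[int] = None) -> int:
    """Checkpointing: save progress after each record so a restart can resume."""
    for i in range(checkpoint["next_index"], len(records)):
        if crash_at == i:
            raise RuntimeError(f"injected fault at record {i}")
        checkpoint["partial_sum"] += records[i]
        checkpoint["next_index"] = i + 1  # progress survives the crash
    return checkpoint["partial_sum"]


def run_with_voting(replicas: List[Callable[[], int]]) -> int:
    """Byzantine-style tolerance (assumed model): run replicas, accept the majority result."""
    if not replicas:
        raise ValueError("at least one replica is required")
    results = [replica() for replica in replicas]
    value, votes = Counter(results).most_common(1)[0]
    if votes <= len(replicas) // 2:
        raise RuntimeError("no majority among replicas")
    return value


if __name__ == "__main__":
    print(run_with_retry(make_flaky_task(failures=2, value=42)))

    state = {"next_index": 0, "partial_sum": 0}
    try:
        sum_with_checkpoint([1, 2, 3, 4], state, crash_at=2)
    except RuntimeError:
        # Resume from the saved state instead of reprocessing records 0 and 1.
        print(sum_with_checkpoint([1, 2, 3, 4], state))

    # One replica returns a corrupted value; the majority masks it.
    print(run_with_voting([lambda: 10, lambda: 10, lambda: 99]))
```

The sketch only shows where each technique spends its effort: retrying repeats all work, checkpointing repeats only the work done since the last saved state, and voting pays for extra replicas up front so that a wrong or missing result can be masked without waiting for a rerun.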


Metadata
Title
Analysis and implementation of reactive fault tolerance techniques in Hadoop: a comparative study
Authors
Hassan Asghar
Babar Nazir
Publication date
04-01-2021
Publisher
Springer US
Published in
The Journal of Supercomputing / Issue 7/2021
Print ISSN: 0920-8542
Electronic ISSN: 1573-0484
DOI
https://doi.org/10.1007/s11227-020-03491-9
