
Published in: The Journal of Supercomputing, 04-01-2021

Analysis and implementation of reactive fault tolerance techniques in Hadoop: a comparative study

Authors: Hassan Asghar, Babar Nazir

Published in: The Journal of Supercomputing | Issue 7/2021


Abstract

Hadoop is the industry's de facto tool for Big Data computation. Its native fault tolerance procedure is slow and degrades performance. Moreover, it fails to fully account for computational overhead and storage cost. The dynamic nature and complexity of MapReduce are also important factors that affect job response time. To address all of this, a robust failure handling technique is essential. In this paper, we analyze notable fault tolerance techniques, measuring their impact on different performance metrics under varying datasets and varying fault injections. The key result shows that, in terms of response time, the Byzantine technique outperforms the retrying and checkpointing techniques under a kill-one-node failure. In addition, in terms of throughput, the task-level Byzantine fault tolerance technique again outperforms checkpointing and retrying under a network disconnect failure. Overall, this comparative study highlights the strengths and weaknesses of the different fault tolerance techniques and helps determine the best technique for a given environment.
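As a rough illustration of the three reactive techniques the abstract compares, the following minimal Python sketch contrasts retrying, checkpointing, and task-level Byzantine voting on toy in-memory tasks. It is not the paper's implementation and does not use any Hadoop API; the function names (`run_with_retry`, `run_with_checkpoint`, `run_with_voting`), the dictionary used as a checkpoint store, and the simple majority rule are all assumptions made for illustration.

```python
from collections import Counter


def run_with_retry(task, max_attempts=3):
    """Retrying: re-execute a failed task from scratch, up to max_attempts times."""
    for attempt in range(1, max_attempts + 1):
        try:
            return task()
        except RuntimeError:
            if attempt == max_attempts:
                raise  # give up after the last allowed attempt


def run_with_checkpoint(records, process, checkpoint):
    """Checkpointing: resume from the last saved position instead of restarting."""
    start = checkpoint.get("next_index", 0)
    results = checkpoint.setdefault("results", [])
    for i in range(start, len(records)):
        results.append(process(records[i]))
        checkpoint["next_index"] = i + 1  # record progress after each item
    return results


def run_with_voting(replicas):
    """Task-level Byzantine tolerance: run replicas and accept the majority output."""
    outputs = [replica() for replica in replicas]
    value, votes = Counter(outputs).most_common(1)[0]
    if votes <= len(outputs) // 2:
        raise RuntimeError("no majority among replica outputs")
    return value
```

The sketch makes the trade-off visible: retrying repeats all of a task's work after a fault, checkpointing repeats only the work since the last saved position at the price of storing state, and voting pays for redundant executions up front so that a wrong or missing replica output can be outvoted.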



Title: Analysis and implementation of reactive fault tolerance techniques in Hadoop: a comparative study
Authors: Hassan Asghar, Babar Nazir
Publication date: 04-01-2021
Publisher: Springer US
Published in: The Journal of Supercomputing / Issue 7/2021
Print ISSN: 0920-8542
Electronic ISSN: 1573-0484
DOI: https://doi.org/10.1007/s11227-020-03491-9
