Published in: The Journal of Supercomputing 14/2023

26-04-2023

Efficient data persistence and data division for distributed computing in cloud data center networks

Authors: Xi Wang, Xinzhi Hu, Weibei Fan, Ruchuan Wang

Abstract

Container-based Hadoop Distributed File System (HDFS) storage is widely used in cloud data center networks, but traditional HDFS has a single point of failure that can make the whole cluster unavailable. In this paper, we study the storage reliability of a Docker container-based HDFS cluster with a single point of failure. First, we investigate a data volume-based persistence solution for Hadoop under the single-point-failure and single-backup strategy of an HDFS cluster. Second, we propose an HDFS-based replica placement algorithm for data storage that takes the performance of both host and container nodes into account. Third, we design the KADC-KNN data segmentation algorithm to store the persistent data of Docker containers effectively. Extensive experimental results show that this approach ensures stable storage and fast migration of cluster data. Compared with the most advanced existing algorithm, the proposed data volume persistence algorithm, DVPS, improves data reliability by 19.8%. The data partitioning algorithm KADC-KNN improves partitioning accuracy by 20.2% and has lower time overhead.
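
The abstract does not give the details of the replica placement algorithm, so the following is only a minimal sketch of the general idea of performance-aware placement, not the authors' method. It assumes each container node carries a single precomputed performance score and that no two replicas should share a physical host; the names `Node`, `score`, and `place_replicas` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str     # container (DataNode) name, hypothetical
    host: str     # physical host running the container
    score: float  # assumed combined performance score, higher is better


def place_replicas(nodes, replicas=3):
    """Pick up to `replicas` nodes, best score first, at most one per host."""
    chosen, used_hosts = [], set()
    for node in sorted(nodes, key=lambda n: n.score, reverse=True):
        if node.host in used_hosts:
            continue  # a second replica on the same host would share its failure
        chosen.append(node)
        used_hosts.add(node.host)
        if len(chosen) == replicas:
            break
    return chosen


nodes = [
    Node("dn1", "hostA", 0.9),
    Node("dn2", "hostA", 0.8),
    Node("dn3", "hostB", 0.7),
    Node("dn4", "hostC", 0.4),
]
placement = place_replicas(nodes, replicas=3)
print([n.name for n in placement])
```

In this toy input, `dn2` is skipped even though it scores higher than `dn3` and `dn4`, because it runs on the same host as `dn1`. A real policy would also have to weigh rack topology, free capacity, and load, which this sketch leaves out.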


Metadata
Title
Efficient data persistence and data division for distributed computing in cloud data center networks
Authors
Xi Wang
Xinzhi Hu
Weibei Fan
Ruchuan Wang
Publication date
26-04-2023
Publisher
Springer US
Published in
The Journal of Supercomputing / Issue 14/2023
Print ISSN: 0920-8542
Electronic ISSN: 1573-0484
DOI
https://doi.org/10.1007/s11227-023-05276-2
