
Evolutionary Intelligence


2 April 2020 | Special Issue

Distributed deduplication with fingerprint index management model for big data storage in the cloud

Authors: S. Sabeetha Saraswathi, N. Malarvizhi

Published in: Evolutionary Intelligence | Issue 2/2021


Abstract

As data volumes in data centers grow, cloud storage models face difficulties both in storing the data and in moving it within an adequate time frame. This study aims to develop a distributed deduplication model that achieves scalable throughput and capacity by using many data servers to deduplicate data in parallel with minimal loss. The paper proposes a new cloud storage model based on distributed deduplication with fingerprint index management (DDFI). The DDFI model operates in three main stages. In the first stage, it applies a routing technique based on the similarity level of the data, which keeps network overhead low by rapidly identifying storage locations. In the second stage, duplicate data is identified using the MD5 algorithm. In the final stage, a fingerprint index management process is executed, in which the fingerprint index holds the fingerprint and the corresponding position details of every written chunk. To optimize deduplication performance, the DDFI model manages the fingerprint index in storage space and writes it to disk only occasionally, while the cloud database system is idle. The simulation results show that the DDFI model achieves a higher deduplication ratio (DR) with minimal network bandwidth overhead. The detailed comparative analysis indicates that the DDFI model offers the maximum relative DR, the maximum deduplication performance, and the minimum read bandwidth and write bandwidth.
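The three stages described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract names only MD5, so the fixed chunk size, the routing rule (the smallest fingerprint used as a similarity signature), the in-memory dictionary as the index, the JSON index file, and every name below (`route`, `FingerprintIndex`, `StorageNode`, `write_file`) are assumptions made for illustration. The deduplication ratio is taken here as logical bytes written divided by physical bytes stored, which is a common definition but is not stated in the abstract.

```python
import hashlib
import json
import tempfile
from pathlib import Path

CHUNK_SIZE = 4096  # assumption: fixed-size chunks; the abstract names no chunking scheme


def fingerprint(chunk: bytes) -> str:
    """Return the MD5 digest of a chunk as its fingerprint."""
    return hashlib.md5(chunk).hexdigest()


def split_chunks(data: bytes, size: int = CHUNK_SIZE) -> list[bytes]:
    """Cut data into fixed-size chunks (the last one may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def route(fingerprints: list[str], num_nodes: int) -> int:
    """Stage 1 (hypothetical rule): use the smallest fingerprint as a similarity
    signature, so inputs sharing that chunk are sent to the same node."""
    return int(min(fingerprints), 16) % num_nodes


class FingerprintIndex:
    """Stage 3: fingerprint -> chunk position, kept in memory and flushed lazily."""

    def __init__(self, path: Path) -> None:
        self.path = path
        self.entries: dict[str, int] = {}
        self.dirty = False

    def add(self, fp: str, position: int) -> None:
        self.entries[fp] = position
        self.dirty = True

    def flush_if_idle(self, idle: bool) -> bool:
        """Write the index to disk only when the caller reports an idle system."""
        if idle and self.dirty:
            self.path.write_text(json.dumps(self.entries))
            self.dirty = False
            return True
        return False


class StorageNode:
    def __init__(self, index_path: Path) -> None:
        self.index = FingerprintIndex(index_path)
        self.chunks: list[bytes] = []
        self.logical_bytes = 0

    def write(self, chunks: list[bytes], fps: list[str]) -> int:
        """Stage 2: store only chunks whose fingerprint is not yet indexed."""
        new = 0
        for chunk, fp in zip(chunks, fps):
            self.logical_bytes += len(chunk)
            if fp not in self.index.entries:
                self.index.add(fp, len(self.chunks))
                self.chunks.append(chunk)
                new += 1
        return new

    def dedup_ratio(self) -> float:
        """Assumed definition: logical bytes written / physical bytes stored."""
        physical = sum(len(c) for c in self.chunks)
        return self.logical_bytes / physical if physical else 1.0


def write_file(nodes: list[StorageNode], data: bytes) -> int:
    """Route one input to a node and return how many new chunks it stored."""
    if not data:
        return 0
    chunks = split_chunks(data)
    fps = [fingerprint(c) for c in chunks]
    return nodes[route(fps, len(nodes))].write(chunks, fps)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        nodes = [StorageNode(Path(tmp) / f"node{i}.json") for i in range(3)]
        data = b"A" * (3 * CHUNK_SIZE) + b"B" * CHUNK_SIZE
        print(write_file(nodes, data))  # 2 unique chunks stored
        print(write_file(nodes, data))  # 0, everything is a duplicate
        for node in nodes:
            node.index.flush_if_idle(idle=True)
```

Because routing depends only on the content fingerprints, a repeated write reaches the node that already holds the matching index entries, so duplicates are detected without querying the other nodes. Deferring the index flush trades durability for write throughput: entries added since the last flush would be lost on a crash, a trade-off that the abstract does not discuss.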




Title: Distributed deduplication with fingerprint index management model for big data storage in the cloud
Authors: S. Sabeetha Saraswathi
N. Malarvizhi
Publication date: 2 April 2020
Publisher: Springer Berlin Heidelberg
Published in: Evolutionary Intelligence / Issue 2/2021
Print ISSN: 1864-5909
Electronic ISSN: 1864-5917
DOI: https://doi.org/10.1007/s12065-020-00395-8

