Published in: Evolutionary Intelligence 2/2021

2 April 2020 | Special Issue

Distributed deduplication with fingerprint index management model for big data storage in the cloud

Authors: S. Sabeetha Saraswathi, N. Malarvizhi

Abstract

As data volumes in data centers keep growing, cloud storage models struggle both to store the data and to move it within an acceptable time frame. This study aims to develop a distributed deduplication model that achieves scalable throughput and capacity by using many data servers to deduplicate data in parallel with minimal loss. The paper proposes a new cloud storage model based on distributed deduplication with fingerprint index management (DDFI). The DDFI model operates in three main stages. In the first stage, it applies a routing technique based on the similarity level of the data, which keeps network overhead low by quickly identifying storage locations. In the second stage, duplicate data is identified using the MD5 algorithm. In the final stage, a fingerprint index management process is executed, where the fingerprint index comprises the fingerprint and corresponding position details of every written chunk. To optimize deduplication performance, the DDFI model keeps the fingerprint index in memory and writes it to disk only occasionally, when the cloud database system is idle. Simulation results show that the DDFI model achieves a higher deduplication ratio (DR) with minimal network bandwidth overhead. A detailed comparative analysis indicates that the DDFI model attains the highest relative DR and deduplication performance together with the lowest read and write bandwidth.
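
The three stages can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract names only MD5 and the general idea of each stage, so the fixed-size chunking, the minimum-fingerprint routing rule, the JSON index file, the `idle` flag, and all function and class names below are assumptions made for illustration. The deduplication ratio is computed here as logical bytes divided by stored bytes, which is one common definition and may differ from the paper's.

```python
import hashlib
import json
import os
import tempfile

CHUNK_SIZE = 4  # hypothetical value, kept tiny so the example is easy to trace


def fingerprint(chunk: bytes) -> str:
    """Return the MD5 digest of a chunk, used as its fingerprint."""
    return hashlib.md5(chunk).hexdigest()


def split_chunks(data: bytes, size: int = CHUNK_SIZE) -> list[bytes]:
    """Fixed-size chunking (assumption: the abstract names no chunking scheme)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def route(fingerprints: list[str], num_nodes: int) -> int:
    """Assumed routing rule: choose a node from the smallest fingerprint,
    so inputs that share many chunks tend to land on the same node."""
    return int(min(fingerprints), 16) % num_nodes


class FingerprintIndex:
    """In-memory map of fingerprint -> chunk position, persisted only on request."""

    def __init__(self, path: str) -> None:
        self.path = path
        self.entries: dict[str, int] = {}
        self.dirty = False

    def write_chunk(self, chunk: bytes, store: list[bytes]) -> bool:
        """Store the chunk if unseen; return True when it was a duplicate."""
        fp = fingerprint(chunk)
        if fp in self.entries:
            return True
        self.entries[fp] = len(store)  # position of the chunk in the store
        store.append(chunk)
        self.dirty = True
        return False

    def flush_if_idle(self, idle: bool) -> bool:
        """Write the index to disk only when the caller reports an idle system."""
        if not (idle and self.dirty):
            return False
        with open(self.path, "w", encoding="utf-8") as f:
            json.dump(self.entries, f)
        self.dirty = False
        return True


def demo() -> dict:
    data = b"AAAABBBBAAAACCCCBBBB"
    chunks = split_chunks(data)
    store: list[bytes] = []
    index = FingerprintIndex(os.path.join(tempfile.mkdtemp(), "index.json"))
    duplicates = sum(index.write_chunk(c, store) for c in chunks)
    flushed_busy = index.flush_if_idle(idle=False)  # skipped: system is busy
    flushed_idle = index.flush_if_idle(idle=True)   # performed: system is idle
    return {
        "chunks": len(chunks),
        "unique": len(store),
        "duplicates": duplicates,
        "ratio": len(data) / sum(len(c) for c in store),
        "node": route([fingerprint(c) for c in chunks], num_nodes=4),
        "flushed_busy": flushed_busy,
        "flushed_idle": flushed_idle,
    }
```

Calling `demo()` splits the 20-byte input into five chunks, stores the three distinct ones, and skips the two repeats. Deferring the index write until the idle flush keeps disk I/O off the write path, which appears to be the motivation behind the third stage.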

Metadata
Title
Distributed deduplication with fingerprint index management model for big data storage in the cloud
Authors
S. Sabeetha Saraswathi
N. Malarvizhi
Publication date
2 April 2020
Publisher
Springer Berlin Heidelberg
Published in
Evolutionary Intelligence / Issue 2/2021
Print ISSN: 1864-5909
Electronic ISSN: 1864-5917
DOI
https://doi.org/10.1007/s12065-020-00395-8