2016 | OriginalPaper | Book chapter

Large-Scale Data Management System Using Data De-duplication System

Written by: S. Abirami, Rashmi Vikraman, S. Murugappan

Published in: Proceedings of the Second International Conference on Computer and Communication Technologies

Publisher: Springer India

Abstract

Data de-duplication is the process of finding duplicate data and eliminating it from the storage environment. De-duplication can be performed at several levels. At the file level, the entire file is considered as a whole for duplicate detection. At the chunk level, the file is split into small units called chunks, and those chunks are used for duplicate detection. At the byte level, data is compared byte by byte. The fingerprint of a chunk is the main parameter for duplicate detection. These fingerprints are stored in the chunk index. As the chunk index grows, it has to be placed on disk. Searching for a fingerprint in an on-disk chunk index consumes a lot of time, which leads to a problem known as the chunk lookup disk bottleneck. This paper mitigates that problem to some extent by placing a Bloom filter in the cache as a probabilistic summary of all the fingerprints in the on-disk chunk index. The evaluation uses backup data sets obtained from university labs. Performance is measured in terms of the data de-duplication ratio.
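
The idea summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the fixed-size chunking, the SHA-1 fingerprint, the Bloom filter parameters, the in-memory dictionary standing in for the on-disk chunk index, and the definition of the de-duplication ratio as original bytes divided by stored bytes are all assumptions made for illustration. All class and function names are hypothetical.

```python
import hashlib


class BloomFilter:
    """In-memory probabilistic summary of the fingerprints in the chunk index."""

    def __init__(self, num_bits=1 << 20, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, fingerprint):
        # Derive num_hashes bit positions by hashing with different one-byte salts.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(bytes([i]) + fingerprint).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, fingerprint):
        for pos in self._positions(fingerprint):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, fingerprint):
        # False means "definitely new"; True means "possibly seen before".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(fingerprint)
        )


def fixed_size_chunks(data, chunk_size=4096):
    """Split data into fixed-size chunks (assumed chunking scheme)."""
    for offset in range(0, len(data), chunk_size):
        yield data[offset:offset + chunk_size]


def deduplicate(data, chunk_size=4096):
    """Return (de-duplication ratio, index lookups performed, unique chunks)."""
    bloom = BloomFilter()
    chunk_index = {}  # stand-in for the on-disk chunk index: fingerprint -> chunk
    index_lookups = 0
    stored_bytes = 0
    for chunk in fixed_size_chunks(data, chunk_size):
        fingerprint = hashlib.sha1(chunk).digest()  # assumed fingerprint function
        if bloom.might_contain(fingerprint):
            # Possible duplicate: only now pay for the (slow) index lookup.
            index_lookups += 1
            if fingerprint in chunk_index:
                continue  # confirmed duplicate, nothing to store
        # New chunk (or Bloom filter false positive): store and record it.
        chunk_index[fingerprint] = chunk
        bloom.add(fingerprint)
        stored_bytes += len(chunk)
    ratio = len(data) / stored_bytes if stored_bytes else 1.0
    return ratio, index_lookups, len(chunk_index)


if __name__ == "__main__":
    sample = b"A" * 4096 * 3 + b"B" * 4096  # four chunks, two distinct
    print(deduplicate(sample))
```

When the Bloom filter reports that a fingerprint is absent, the chunk is certainly new and the index lookup is skipped entirely. A positive answer may be a false positive, so it is always confirmed against the index before a chunk is discarded as a duplicate.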

Metadata
Title
Large-Scale Data Management System Using Data De-duplication System
Written by
S. Abirami
Rashmi Vikraman
S. Murugappan
Copyright year
2016
Publisher
Springer India
DOI
https://doi.org/10.1007/978-81-322-2517-1_23