DOI: 10.1145/1462735.1462739
research-article

Demystifying data deduplication

Published: 01 December 2008

ABSTRACT

The effectiveness and tradeoffs of deduplication technologies are not well understood; vendors tout deduplication as a "silver bullet" that can help any enterprise optimize its deployed storage capacity. This paper aims to provide a comprehensive taxonomy and experimental evaluation using real-world data. While the day-to-day rate of change of data has the greatest influence on the duplication in backup data, we investigate the duplication inherent in this data, independent of the rate of change, the backup schedule, or the backup algorithm used. Our experimental results show that, across different deduplication techniques, space savings vary by about 30%, CPU usage differs by almost 6 times, and the time to reconstruct a deduplicated file can vary by more than 15 times.

Published in

Companion '08: Proceedings of the ACM/IFIP/USENIX Middleware '08 Conference Companion
December 2008
134 pages
ISBN: 9781605583693
DOI: 10.1145/1462735

Copyright © 2008 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery
New York, NY, United States
