A Survey and Classification of Storage Deduplication Systems

Published: 01 June 2014

Abstract

The automatic elimination of duplicate data in a storage system, commonly known as deduplication, is increasingly accepted as an effective technique to reduce storage costs. It has therefore been applied to different storage types, including archival and backup storage, primary storage, solid-state drives, and even random access memory. Although the general approach to deduplication is shared by all storage types, each poses specific challenges and leads to different trade-offs and solutions. This diversity is often misunderstood, and as a result the relevance of new research and development is underestimated.

The first contribution of this article is a classification of deduplication systems according to six criteria that correspond to key design decisions: granularity, locality, timing, indexing, technique, and scope. This classification identifies and describes the different approaches used for each of them. As a second contribution, we describe which combinations of these design decisions have been proposed and found more useful for challenges in each storage type. Finally, outstanding research challenges and unexplored design points are identified and discussed.


Published in

ACM Computing Surveys, Volume 47, Issue 1
July 2014
551 pages
ISSN: 0360-0300
EISSN: 1557-7341
DOI: 10.1145/2620784

Copyright © 2014 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from ACM.

Publisher

Association for Computing Machinery, New York, NY, United States

Publication History

• Published: 1 June 2014
• Accepted: 1 April 2014
• Revised: 1 October 2012
• Received: 1 April 2012
