Published in: The Journal of Supercomputing 11/2020

4 February 2020

CA-Dedupe: content-aware deduplication in SSDs

Authors: Ramin Gholami Taghizadeh, Reza Gholami Taghizadeh, Fahimeh Khakpash, Mohammadreza Binesh Marvasti, Seyyed Amir Asghari

Abstract

Flash memories have been widely used for many years because they offer higher performance than HDDs. However, they have a limited lifespan and wear out prematurely under write-intensive workloads. Solutions such as wear leveling, compression and deduplication have been proposed to address this issue. Deduplication is an effective way to extend the lifespan of flash memories, but the deduplication methods proposed in previous work usually impose a significant delay on write operations. This paper presents an intelligent method for data deduplication on flash memories that categorizes write requests by their content and type. In this scheme, the metadata calculated for write requests is placed in separate categories, and during deduplication the search is performed within a single category. As a result, the proposed method significantly reduces the search delay and improves the deduplication rate. Simulation results show that the proposed method reduces the delay of write operations by 32% compared to other methods and achieves a deduplication rate of 69.8%.
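
The idea of keeping deduplication metadata in per-category tables and searching only one of them can be illustrated with a minimal sketch. This is not the authors' implementation: the classifier rules, the category names, the use of SHA-1 as the fingerprint, and the names `classify` and `CategorizedDedupIndex` are all assumptions made for illustration.

```python
import hashlib
from collections import defaultdict
from typing import Dict, Tuple


def classify(page: bytes) -> str:
    """Hypothetical classifier: guess a content category from the leading bytes."""
    if page.startswith(b"\x89PNG"):
        return "image"
    if page.startswith(b"%PDF"):
        return "document"
    try:
        page.decode("ascii")
        return "text"
    except UnicodeDecodeError:
        return "binary"


class CategorizedDedupIndex:
    """Keeps one fingerprint table per content category (illustrative only)."""

    def __init__(self) -> None:
        # category -> {fingerprint: physical page number}
        self.tables: Dict[str, Dict[bytes, int]] = defaultdict(dict)
        self.next_ppn = 0        # next free physical page number (assumed allocator)
        self.flash_writes = 0    # pages actually programmed to flash

    def write(self, page: bytes) -> Tuple[int, bool]:
        """Return (physical page number, was_duplicate) for a write request."""
        category = classify(page)
        fingerprint = hashlib.sha1(page).digest()
        table = self.tables[category]   # the search is confined to this one category
        if fingerprint in table:
            return table[fingerprint], True   # duplicate: no flash write needed
        ppn = self.next_ppn
        self.next_ppn += 1
        self.flash_writes += 1
        table[fingerprint] = ppn
        return ppn, False


index = CategorizedDedupIndex()
text_page = b"hello world" * 10
print(index.write(text_page))                    # first copy is stored
print(index.write(text_page))                    # second copy maps to the same page
print(index.write(b"\x89PNG" + b"\x00" * 16))    # lands in a different category
```

A Python dictionary already gives near-constant lookup time, so the sketch shows only the structure of the scheme. The benefit claimed in the abstract would arise in a controller with limited memory and a more expensive metadata search, where restricting the lookup to one category shrinks the search space.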


Metadata
Title
CA-Dedupe: content-aware deduplication in SSDs
Authors
Ramin Gholami Taghizadeh
Reza Gholami Taghizadeh
Fahimeh Khakpash
Mohammadreza Binesh Marvasti
Seyyed Amir Asghari
Publication date
4 February 2020
Publisher
Springer US
Published in
The Journal of Supercomputing / Issue 11/2020
Print ISSN: 0920-8542
Electronic ISSN: 1573-0484
DOI
https://doi.org/10.1007/s11227-020-03188-z
