Cluster Computing

1 March 2015

De-Frag: an efficient scheme to improve deduplication performance via reducing data placement de-linearization

Authors: Yujuan Tan, Zhichao Yan, Dan Feng, Xubin He, Qiang Zou, Lei Yang

Published in: Cluster Computing | Issue 1/2015

Abstract

Data deduplication has become a commodity in large-scale storage systems, especially in data backup and archival systems. However, because it removes redundant data, deduplication de-linearizes data placement and forces the chunks of a single data object to be split across multiple separate units. In our preliminary study, we found that this de-linearization compromises the spatial locality that some deduplication approaches rely on to improve read performance, deduplication throughput, and deduplication efficiency; it therefore significantly degrades deduplication performance and makes those approaches less effective. In this paper, we first analyze the negative effect of data placement de-linearization on deduplication performance and then propose an effective approach, called De-Frag, to reduce it. The key idea of De-Frag is to write some redundant data to disk rather than remove it. It quantifies the spatial locality of each chunk group by a spatial locality level (SPL) and writes the redundant chunks to disk when the SPL value is smaller than a preset value, thereby reducing the de-linearization of data placement and enhancing spatial locality. Experimental results driven by real-world datasets show that De-Frag effectively enhances spatial locality and improves deduplication throughput, deduplication efficiency, and data read performance, at the cost of slightly lower compression ratios.
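
The threshold rule described in the abstract can be illustrated with a minimal sketch. The abstract does not give the SPL formula, so the definition used here is an assumption made purely for illustration: for each container that already stores chunks of the incoming group, SPL is taken as the fraction of the group's chunks found in that container. The function name `plan_group`, the `index` mapping, and the container ids are all hypothetical and are not taken from the paper.

```python
from collections import Counter


def plan_group(group, index, threshold):
    """Decide, for one chunk group, which chunks to write and which to deduplicate.

    group:     list of chunk fingerprints in stream order.
    index:     dict mapping fingerprint -> id of the container already storing it.
    threshold: preset SPL value; duplicates with a lower SPL are written again.

    Returns a list of (fingerprint, action) pairs, action being "write" or "dedup".
    """
    # Count how many chunks of this group each existing container already holds.
    hits = Counter(index[fp] for fp in group if fp in index)

    plan = []
    for fp in group:
        container = index.get(fp)
        if container is None:
            # Unique chunk: it has to be written in any case.
            plan.append((fp, "write"))
            continue
        # ASSUMED SPL definition: share of the group stored in that container.
        spl = hits[container] / len(group)
        # Low SPL means the duplicate is isolated, so write it again
        # to keep the group's chunks close together on disk.
        plan.append((fp, "dedup" if spl >= threshold else "write"))
    return plan


if __name__ == "__main__":
    index = {"a": 1, "b": 1, "c": 1, "d": 7}
    group = ["a", "b", "c", "d", "e"]
    for fp, action in plan_group(group, index, threshold=0.5):
        print(fp, action)
```

In this example, chunks `a`, `b`, and `c` share one container and together cover 60% of the group, so they are deduplicated. Chunk `d` is a lone duplicate in another container (20%), so it is written again next to the new chunk `e`. This trades a slightly lower compression ratio for better locality, the same trade-off the abstract reports.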

Title: De-Frag: an efficient scheme to improve deduplication performance via reducing data placement de-linearization
Authors: Yujuan Tan, Zhichao Yan, Dan Feng, Xubin He, Qiang Zou, Lei Yang
Publication date: 1 March 2015
Publisher: Springer US
Published in: Cluster Computing / Issue 1/2015
Print ISSN: 1386-7857
Electronic ISSN: 1573-7543
DOI: https://doi.org/10.1007/s10586-014-0397-5
