
2018 | OriginalPaper | Book chapter

Improving Restore Performance of Deduplication Systems by Leveraging the Chunk Sequence in Backup Stream

Authors: Ru Yang, Yuhui Deng, Cheng Hu, Lei Si

Published in: Algorithms and Architectures for Parallel Processing

Publisher: Springer International Publishing


Abstract

Traditional deduplication-based backup systems normally employ containers to reduce chunk fragmentation and thus improve restore performance. However, the number of shared chunks belonging to a single backup grows as the number of backups increases, and those shared chunks are usually distributed across multiple containers. This increases chunk fragmentation and significantly degrades restore performance. To improve restore performance, several schemes have been proposed that optimize the replacement strategy of the restore cache, such as those using LRU and OPT. However, LRU is inefficient, and OPT incurs additional computational overhead. By analyzing the backup and restore processes, we observe that the sequence of chunks in the backup stream is consistent with that in the restore stream. Based on this observation, this paper proposes OFL, an off-line optimal replacement strategy for the restore cache. OFL records the chunk sequence during the backup process and then uses this sequence to calculate, in advance, the exact information about the chunks required by the restore process. Finally, OFL leverages this information to perform accurate prefetching, which reduces the impact of chunk fragmentation. OFL is evaluated on real data sets. The experimental results demonstrate that OFL improves restore performance by more than 8% compared with the traditional LRU and OPT strategies.
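To make the contrast between LRU and an off-line strategy concrete, here is a minimal Python sketch. It is not the authors' OFL implementation. It assumes a simplified model in which the recorded backup-stream sequence is a hypothetical `recipe` list of container IDs (one entry per chunk), the restore cache holds whole containers, and restore cost is counted as the number of container reads. Because the chunk order is known before the restore starts, the hypothetical helper `precompute_next_use` derives each chunk's next reference to the same container in one backward pass. `offline_container_reads` then evicts the cached container whose next use is farthest away (Belady's rule). All function names are illustrative, and prefetching is not modelled.

```python
import heapq
from collections import OrderedDict


def lru_container_reads(recipe, cache_size):
    """Count container reads when restoring with an LRU container cache.

    recipe (hypothetical): container IDs, one per chunk, in backup-stream order,
    which is assumed to equal the restore order.
    """
    if cache_size < 1:
        raise ValueError("cache_size must be at least 1")
    cache = OrderedDict()
    reads = 0
    for cid in recipe:
        if cid in cache:
            cache.move_to_end(cid)  # mark as most recently used
            continue
        reads += 1  # miss: the whole container must be read
        if len(cache) >= cache_size:
            cache.popitem(last=False)  # evict the least recently used container
        cache[cid] = True
    return reads


def precompute_next_use(recipe):
    """Offline pass over the recorded sequence.

    next_use[i] is the position of the next chunk stored in the same container
    as chunk i, or len(recipe) if that container is never needed again.
    """
    n = len(recipe)
    next_use = [n] * n
    last_seen = {}
    for i in range(n - 1, -1, -1):  # walk backwards so the "next" position is known
        next_use[i] = last_seen.get(recipe[i], n)
        last_seen[recipe[i]] = i
    return next_use


def offline_container_reads(recipe, cache_size):
    """Count container reads with farthest-next-use (Belady) eviction."""
    if cache_size < 1:
        raise ValueError("cache_size must be at least 1")
    next_use = precompute_next_use(recipe)
    cache = {}  # container ID -> position of its next use
    heap = []   # max-heap via (-next_use, cid); stale entries are skipped lazily
    reads = 0
    for i, cid in enumerate(recipe):
        if cid not in cache:
            reads += 1
            if len(cache) >= cache_size:
                while True:
                    neg_pos, victim = heapq.heappop(heap)
                    if cache.get(victim) == -neg_pos:  # current entry, not stale
                        del cache[victim]
                        break
        cache[cid] = next_use[i]
        heapq.heappush(heap, (-next_use[i], cid))
    return reads


if __name__ == "__main__":
    recipe = [1, 2, 3, 1, 2, 3]
    print("LRU:", lru_container_reads(recipe, 2),
          "offline:", offline_container_reads(recipe, 2))
    # prints "LRU: 6 offline: 4"
```

In this toy model, the backward pass is linear in the number of chunks and runs once before the restore. During the restore itself, the only eviction work is heap maintenance. This is a rough illustration, under the stated assumptions, of moving the look-ahead cost out of the restore path.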

Metadata
Copyright year
2018
DOI
https://doi.org/10.1007/978-3-030-05051-1_26
