
2018 | OriginalPaper | Chapter

Developing Cost-Effective Data Rescue Schemes to Tackle Disk Failures in Data Centers

Authors: Zhi Qiao, Jacob Hochstetler, Shuwen Liang, Song Fu, Hsing-bung Chen, Bradley Settlemyer

Published in: Big Data – BigData 2018

Publisher: Springer International Publishing


Abstract

Ensuring the reliability of large-scale storage systems remains a challenge, especially when millions of disk drives are deployed. Because disk capacity keeps growing, rebuilding a disk after it fails now takes much longer, which increases the risk of service unavailability and even data loss. In this paper, we present a proactive data protection (PDP) framework in the ZFS file system that rescues data from disks before the actual onset of failure. PDP is designed to enhance overall storage reliability by reducing the risk of data loss and mitigating the prolonged disk rebuilds caused by disk failures. We extensively evaluate the recovery performance of ZFS under diverse configurations and explore disk failure prediction techniques to build a proactive data protection mechanism in ZFS. We then compare the performance of different data protection strategies, including post-failure disk recovery, proactive disk cloning, and proactive data recovery. Finally, we propose an analytic model that uses storage utilization and contextual system information to select the best data protection strategy, achieving enhanced storage reliability cost-effectively.
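The abstract does not spell out the analytic model, so the following is only a minimal sketch of how utilization could drive strategy selection. It assumes a bandwidth-only cost model: proactive disk cloning copies every sector at a sequential rate, while proactive data recovery copies only allocated data (as ZFS resilvering does) at a typically lower rate. The names `DiskContext`, `estimate_hours`, `choose_strategy`, the rate parameters, and the `predicted_to_fail` flag are hypothetical illustrations, not the authors' model or any ZFS API.

```python
"""Hypothetical strategy-selection sketch; not the PDP authors' analytic model."""
from dataclasses import dataclass
from enum import Enum


class Strategy(Enum):
    POST_FAILURE_RECOVERY = "post-failure disk recovery"
    PROACTIVE_DISK_CLONING = "proactive disk cloning"
    PROACTIVE_DATA_RECOVERY = "proactive data recovery"


@dataclass
class DiskContext:
    capacity_tb: float       # raw capacity of the suspect disk, in decimal TB
    utilization: float       # fraction of the disk holding live data (0.0 to 1.0)
    predicted_to_fail: bool  # verdict of some failure predictor (hypothetical input)
    clone_mb_s: float        # assumed sequential whole-disk copy rate, MB/s
    recovery_mb_s: float     # assumed rate of copying only live data, MB/s


def estimate_hours(ctx: DiskContext, strategy: Strategy) -> float:
    """Rough rescue-time estimate under a simple bandwidth-only cost model."""
    total_mb = ctx.capacity_tb * 1_000_000  # 1 TB = 10^6 MB (decimal units)
    if strategy is Strategy.PROACTIVE_DISK_CLONING:
        # Cloning copies every sector, so its cost ignores utilization.
        return total_mb / ctx.clone_mb_s / 3600
    if strategy is Strategy.PROACTIVE_DATA_RECOVERY:
        # Data-level recovery copies only allocated data.
        return total_mb * ctx.utilization / ctx.recovery_mb_s / 3600
    raise ValueError("post-failure recovery is not modeled as a proactive rescue")


def choose_strategy(ctx: DiskContext) -> Strategy:
    """Pick the cheaper proactive rescue, or fall back if no failure is predicted."""
    if not ctx.predicted_to_fail:
        # Nothing to rescue yet; rely on an ordinary post-failure rebuild.
        return Strategy.POST_FAILURE_RECOVERY
    clone = estimate_hours(ctx, Strategy.PROACTIVE_DISK_CLONING)
    recover = estimate_hours(ctx, Strategy.PROACTIVE_DATA_RECOVERY)
    if clone < recover:
        return Strategy.PROACTIVE_DISK_CLONING
    return Strategy.PROACTIVE_DATA_RECOVERY


if __name__ == "__main__":
    # Illustrative rates only, not measurements from the paper.
    for util in (0.3, 0.9):
        ctx = DiskContext(10.0, util, True, clone_mb_s=200.0, recovery_mb_s=100.0)
        print(util, choose_strategy(ctx).value)
```

With these illustrative rates, cloning a 10 TB disk takes about 13.9 h regardless of content, whereas data-level recovery takes about 8.3 h at 30% utilization and 25 h at 90%. The selector therefore switches from data recovery to cloning once utilization exceeds `recovery_mb_s / clone_mb_s` (0.5 here). The model described in the abstract also draws on contextual system information, which this sketch leaves out.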


Metadata
Copyright Year: 2018
DOI: https://doi.org/10.1007/978-3-319-94301-5_15
