2018 | Original Paper | Book Chapter

Developing Cost-Effective Data Rescue Schemes to Tackle Disk Failures in Data Centers

Written by: Zhi Qiao, Jacob Hochstetler, Shuwen Liang, Song Fu, Hsing-bung Chen, Bradley Settlemyer

Published in: Big Data – BigData 2018

Publisher: Springer International Publishing

Abstract

Ensuring the reliability of large-scale storage systems remains a challenge, especially when millions of disk drives are deployed. Because disk capacity keeps increasing, post-failure disk rebuilds now take much longer, which raises the risk of service unavailability and even data loss. In this paper, we present a proactive data protection (PDP) framework in the ZFS file system that rescues data from disks before the actual onset of failure. By reducing the risk of data loss and mitigating the prolonged rebuilds caused by disk failures, PDP is designed to enhance overall storage reliability. We extensively evaluate the recovery performance of ZFS under diverse configurations and explore disk failure prediction techniques to develop a proactive data protection mechanism in ZFS. We also compare the performance of different data protection strategies, including post-failure disk recovery, proactive disk cloning, and proactive data recovery. Finally, we propose an analytic model that uses storage utilization and contextual system information to select the data protection strategy that delivers cost-effective, enhanced storage reliability.
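
The abstract does not spell out the analytic model, so the following is only an illustrative sketch of how storage utilization could drive the choice between two of the proactive strategies named above. It is not the authors' model. It assumes that disk cloning copies the whole device regardless of how full it is, while data recovery only moves allocated data at a lower rate. The class `DiskContext`, the functions `estimate_hours` and `select_strategy`, and all throughput figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DiskContext:
    """Hypothetical inputs for one at-risk disk (all values are assumptions)."""
    capacity_tb: float    # raw capacity of the disk, in terabytes
    utilization: float    # fraction of capacity holding data, 0.0 to 1.0
    clone_mb_s: float     # assumed throughput of a whole-disk sequential copy
    recover_mb_s: float   # assumed throughput when moving only allocated data


def estimate_hours(ctx: DiskContext) -> dict:
    """Estimate the rescue time in hours for each strategy."""
    capacity_mb = ctx.capacity_tb * 1_000_000
    used_mb = capacity_mb * ctx.utilization
    return {
        # Cloning reads every block, so its cost ignores utilization.
        "proactive_disk_cloning": capacity_mb / ctx.clone_mb_s / 3600,
        # Data recovery touches only the allocated portion of the disk.
        "proactive_data_recovery": used_mb / ctx.recover_mb_s / 3600,
    }


def select_strategy(ctx: DiskContext) -> str:
    """Return the name of the strategy with the lowest estimated time."""
    hours = estimate_hours(ctx)
    return min(hours, key=hours.get)


if __name__ == "__main__":
    for utilization in (0.2, 0.9):
        ctx = DiskContext(capacity_tb=4.0, utilization=utilization,
                          clone_mb_s=150.0, recover_mb_s=60.0)
        print(utilization, select_strategy(ctx))
```

Under these assumed rates the break-even point is a utilization of `recover_mb_s / clone_mb_s`, here 0.4: below it, moving only the allocated data is faster, and above it a full clone wins. A fuller model would also need to weigh the contextual system information the abstract mentions, such as the predicted time to failure and the current workload.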

Metadata
Copyright year: 2018
DOI: https://doi.org/10.1007/978-3-319-94301-5_15