2020 | Original Paper | Book Chapter

Predicting Hard Drive Failures for Cloud Storage Systems

Authors: Dongshi Liu, Bo Wang, Peng Li, Rebecca J. Stones, Trent G. Marbach, Gang Wang, Xiaoguang Liu, Zhongwei Li

Published in: Algorithms and Architectures for Parallel Processing

Publisher: Springer International Publishing

Abstract

To improve reactive hard-drive fault-tolerance techniques, many statistical and machine learning methods have been proposed for failure prediction based on SMART attributes. However, disparate datasets and metrics have been used to experimentally evaluate these models, so a direct comparison between them cannot readily be made.
In this paper, we provide an improvement to the Recurrent Neural Network model. Experimentally, the improved model achieves a 98.06% migration rate and a 0.0% mismigration rate, outperforming the state-of-the-art Gradient-Boosted Regression Tree model. It also achieves a 100.0% failure detection rate at a 0.02% false alarm rate, outperforming the unmodified Recurrent Neural Network model in terms of prediction accuracy. We also experimentally compare five families of prediction models (nine models in total) and simulate their practical use.
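
The abstract reports a failure detection rate and a false alarm rate without defining them. As a minimal sketch, assuming the usage common in the drive-failure literature (failure detection rate = fraction of failed drives that the model flags; false alarm rate = fraction of good drives that the model flags), the two figures could be computed from per-drive labels as follows. The function name, argument names, and toy data are hypothetical and are not taken from the paper; the migration and mismigration rates are not sketched because their definitions do not appear here.

```python
def detection_metrics(actual_failed, predicted_failed):
    """Return (failure detection rate, false alarm rate) as fractions.

    actual_failed[i]    -- True if drive i failed during data collection
    predicted_failed[i] -- True if the model flagged drive i as failing
    Definitions are an assumption based on common usage, not the paper's text.
    """
    if len(actual_failed) != len(predicted_failed):
        raise ValueError("inputs must have the same length")
    pairs = list(zip(actual_failed, predicted_failed))
    n_failed = sum(1 for actual, _ in pairs if actual)
    n_good = len(pairs) - n_failed
    detected = sum(1 for actual, pred in pairs if actual and pred)
    false_alarms = sum(1 for actual, pred in pairs if not actual and pred)
    # Guard against an empty class so the ratio is never a division by zero.
    fdr = detected / n_failed if n_failed else float("nan")
    far = false_alarms / n_good if n_good else float("nan")
    return fdr, far


# Toy data (made up for illustration): 3 failed drives, 5 good drives.
actual = [True, True, True, False, False, False, False, False]
predicted = [True, True, False, False, True, False, False, False]
fdr, far = detection_metrics(actual, predicted)
print(f"FDR = {fdr:.2%}, FAR = {far:.2%}")
```

Because good drives typically far outnumber failed ones, even a small false alarm rate can correspond to many flagged good drives, which is one reason the two rates are usually reported together.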

Footnotes
1
Here and throughout the paper, despite the grammatical mismatch between “good” and “failed”, for brevity we use “failed” as an adjective to describe hard drives which fail during data collection; all other hard drives are “good”. This awkward nomenclature is consistent with many papers on this topic.
 
Metadata
Title
Predicting Hard Drive Failures for Cloud Storage Systems
Authors
Dongshi Liu
Bo Wang
Peng Li
Rebecca J. Stones
Trent G. Marbach
Gang Wang
Xiaoguang Liu
Zhongwei Li
Copyright Year
2020
DOI
https://doi.org/10.1007/978-3-030-38991-8_25