2020 | OriginalPaper | Chapter

To Fail or Not to Fail: Predicting Hard Disk Drive Failure Time Windows

Authors: Marwin Züfle, Christian Krupitzer, Florian Erhard, Johannes Grohmann, Samuel Kounev

Published in: Measurement, Modelling and Evaluation of Computing Systems

Publisher: Springer International Publishing

Abstract

Due to the increasing size of today’s data centers as well as the expectation of 24/7 availability, the complexity of hardware administration continuously increases. Techniques such as the Self-Monitoring, Analysis, and Reporting Technology (S.M.A.R.T.) support the monitoring of the hardware. However, those techniques often lack algorithms for intelligent data analytics. In particular, integrating machine learning to identify potential failures in advance seems promising for reducing administration overhead. In this work, we present three machine learning approaches to (i) identify imminent failures, (ii) predict time windows for failures, and (iii) predict the exact time-to-failure. In a case study with real data from 369 hard disks, we achieve an F1-score of up to 98.0% and 97.6% for predicting potential failures with two or multiple time windows, respectively, and a hit rate of 84.9% (with a mean absolute error of 4.5 h) for predicting the time-to-failure.

Footnotes
1
Weak learners are classification methods that correlate rather weakly with the true classification, while strong learners correlate very well with the true classification.
 
2
Sampling with replacement means that instances can be selected multiple times in the same sample.
 
3
The handling of multiple classes is not explicitly required for comparing data preparation for binary classification. However, it is necessary for multi-class classification in Sect. 4.2 and to maintain comparability between the approaches.
 
4
This does not apply if the data set is large enough to remain sufficiently large after undersampling.
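
Footnotes 2 and 4 can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `bootstrap_sample` and `undersample`, the toy data, and the class labels `"healthy"` and `"failing"` are assumptions made here for illustration only. The first function draws a sample with replacement, so an instance may appear several times. The second randomly discards instances until every class is as small as the rarest one, which shows why undersampling can shrink a small data set considerably.

```python
import random
from collections import Counter


def bootstrap_sample(instances, rng):
    """Draw len(instances) items with replacement; duplicates are allowed."""
    return rng.choices(instances, k=len(instances))


def undersample(instances, labels, rng):
    """Randomly keep only as many instances per class as the rarest class has."""
    by_class = {}
    for instance, label in zip(instances, labels):
        by_class.setdefault(label, []).append(instance)
    n_min = min(len(members) for members in by_class.values())
    balanced = []
    for label, members in by_class.items():
        # rng.sample draws without replacement, so no instance is duplicated.
        for instance in rng.sample(members, n_min):
            balanced.append((instance, label))
    return balanced


# Hypothetical toy data: ten instances, eight "healthy" and two "failing".
rng = random.Random(0)
data = list(range(10))
labels = ["healthy"] * 8 + ["failing"] * 2

sample = bootstrap_sample(data, rng)
balanced = undersample(data, labels, rng)
counts = Counter(label for _, label in balanced)
```

In this toy case, undersampling keeps only four of the ten instances, which is the kind of data loss footnote 4 refers to for data sets that are not large to begin with.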
 
Metadata
Title
To Fail or Not to Fail: Predicting Hard Disk Drive Failure Time Windows
Authors
Marwin Züfle
Christian Krupitzer
Florian Erhard
Johannes Grohmann
Samuel Kounev
Copyright Year
2020
DOI
https://doi.org/10.1007/978-3-030-43024-5_2
