Published in: Cluster Computing, Issue 2/2019

21 March 2019

Failure prediction using machine learning in a virtualised HPC system and application

Authors: Bashir Mohammed, Irfan Awan, Hassan Ugail, Muhammad Younas

Abstract

Failure is an increasingly important issue in high performance computing (HPC) and cloud systems. As large-scale systems continue to grow in scale and complexity, mitigating the impact of failure and providing accurate predictions with sufficient lead time remain challenging research problems. Existing fault-tolerance strategies such as regular checkpointing and replication are not adequate given the emerging complexity of high performance computing systems. This calls for an effective, proactive failure management approach aimed at minimizing the effect of failure within the system. Machine learning techniques can learn from past information to predict future patterns of behaviour, which makes it possible to predict potential system failure more accurately. In this paper, we therefore explore the predictive abilities of machine learning by applying a number of algorithms to improve the accuracy of failure prediction. We have developed a failure prediction model using time series and machine learning, and performed comparison-based tests on the prediction accuracy. The primary algorithms we considered are the support vector machine (SVM), random forest (RF), k-nearest neighbors (KNN), classification and regression trees (CART) and linear discriminant analysis (LDA). Experimental results indicate that our model achieves an average prediction accuracy of 90% when using SVM, making it more accurate and effective than the other algorithms. This finding implies that our method can effectively predict all possible future system and application failures within the system.
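
As an illustration of the kind of pipeline the abstract describes (time-series data turned into features, a classifier trained on past behaviour, and accuracy used for comparison), here is a minimal sketch. It is not the authors' implementation: the synthetic metric, the window width, the helper names (make_windows, knn_predict, accuracy) and the majority-class baseline are all assumptions made for illustration. Only KNN is shown, because it can be written with the Python standard library alone.

```python
import random


def make_windows(series, failures, width):
    """Pair each window of `width` past readings with the failure flag that follows it."""
    X, y = [], []
    for end in range(width, len(series)):
        X.append(series[end - width:end])
        y.append(failures[end])
    return X, y


def knn_predict(train_X, train_y, query, k=3):
    """Majority vote among the k training windows closest to `query`
    (squared Euclidean distance)."""
    nearest = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, query)), label)
        for row, label in zip(train_X, train_y)
    )[:k]
    votes = sum(label for _, label in nearest)
    return 1 if votes * 2 > len(nearest) else 0


def accuracy(predicted, actual):
    """Fraction of predictions that match the recorded outcome."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


# Synthetic, hypothetical data (an assumption, not the paper's dataset):
# a monitored metric that climbs during the four intervals before each failure.
rng = random.Random(0)
series, failures = [], []
for t in range(400):
    phase = t % 20
    ramp = max(0, phase - 15) * 0.5
    series.append(1.0 + ramp + rng.uniform(-0.2, 0.2))
    failures.append(1 if phase == 0 and t > 0 else 0)

X, y = make_windows(series, failures, width=3)

# Chronological split: train on the past, evaluate on the most recent windows.
split = int(0.75 * len(X))
train_X, train_y = X[:split], y[:split]
test_X, test_y = X[split:], y[split:]

knn_acc = accuracy([knn_predict(train_X, train_y, q) for q in test_X], test_y)

# Baseline for comparison: always predict the most common training label.
majority = 1 if sum(train_y) * 2 > len(train_y) else 0
base_acc = accuracy([majority] * len(test_y), test_y)

print(f"KNN accuracy: {knn_acc:.2f}, majority-class baseline: {base_acc:.2f}")
```

Failures are rare in this synthetic series, so a baseline that always predicts "no failure" already scores well. For that reason the sketch reports the classifier's accuracy next to the baseline rather than on its own.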

Metadata
Title
Failure prediction using machine learning in a virtualised HPC system and application
Authors
Bashir Mohammed
Irfan Awan
Hassan Ugail
Muhammad Younas
Publication date
21 March 2019
Publisher
Springer US
Published in
Cluster Computing, Issue 2/2019
Print ISSN: 1386-7857
Electronic ISSN: 1573-7543
DOI
https://doi.org/10.1007/s10586-019-02917-1