2017 | OriginalPaper | Book chapter

Self-tuning Filers — Overload Prediction and Preventive Tuning Using Pruned Random Forest

Authors: Kumar Dheenadayalan, Gopalakrishnan Srinivasaraghavan, V. N. Muralidhara

Published in: Advances in Knowledge Discovery and Data Mining

Publisher: Springer International Publishing

Abstract

The holy grail of large, complex enterprise storage systems is for them to be self-governing. We propose a self-tuning scheme for large storage filers, a topic on which very little work has been done. Our system uses the performance counters generated by a filer to assess its health in real time and to modify the workload and/or tune the system parameters to optimize the operational metrics. We use a solution based on a pruned random forest to predict overload in real time; the model is run on every snapshot of counter values. A large number of trees in a random forest model directly increases the time taken to reach a decision, so a large random forest is not viable in a real-time scenario. Our solution uses a pruned random forest that performs as well as the original forest. A saliency analysis identifies the components of the system that require tuning when an overload situation is predicted. This allows us to initiate an 'action' on the bottleneck components. The 'action' explored in our experiments is 'throttling' the bottleneck component to prevent overload situations.
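The idea of replacing a large forest with a small subset of its trees can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract does not state the pruning criterion, so the sketch assumes greedy forward selection on validation accuracy, uses one-split stumps as stand-ins for full trees, and runs on synthetic counter snapshots. All names, counters, and the overload rule are hypothetical.

```python
import random

def make_snapshots(n, rng):
    """Synthetic counter snapshots [cpu, disk, net, cache]; label 1 = overload."""
    data = []
    for _ in range(n):
        x = [rng.random() for _ in range(4)]
        y = 1 if x[0] + x[1] > 1.2 else 0  # hypothetical overload rule
        data.append((x, y))
    return data

def train_stump(sample, rng):
    """Fit a one-split tree (stump) on one randomly chosen counter."""
    f = rng.randrange(4)
    best_hits, best_t = -1, 0.0
    for x, _ in sample:
        t = x[f]
        # Rule under test: predict overload when counter f exceeds t.
        hits = sum((xi[f] > t) == bool(yi) for xi, yi in sample)
        if hits > best_hits:
            best_hits, best_t = hits, t
    return (f, best_t)

def predict_stump(stump, x):
    f, t = stump
    return 1 if x[f] > t else 0

def vote(trees, x):
    """Majority vote of the ensemble; ties count as 'no overload'."""
    ones = sum(predict_stump(s, x) for s in trees)
    return 1 if 2 * ones > len(trees) else 0

def accuracy(trees, data):
    return sum(vote(trees, x) == y for x, y in data) / len(data)

def prune(trees, val, k):
    """Greedily keep the k trees whose combined vote scores best on val."""
    chosen, remaining = [], list(trees)
    while len(chosen) < k and remaining:
        best = max(remaining, key=lambda s: accuracy(chosen + [s], val))
        chosen.append(best)
        remaining.remove(best)
    return chosen

rng = random.Random(0)
train = make_snapshots(200, rng)
val = make_snapshots(200, rng)
# Bagging: each stump sees a bootstrap resample of the training snapshots.
forest = [train_stump([rng.choice(train) for _ in train], rng) for _ in range(50)]
pruned = prune(forest, val, k=5)
full_acc = accuracy(forest, val)
pruned_acc = accuracy(pruned, val)
print(f"full forest: {len(forest)} trees, validation accuracy {full_acc:.2f}")
print(f"pruned forest: {len(pruned)} trees, validation accuracy {pruned_acc:.2f}")
```

Each prediction with the pruned ensemble evaluates 5 stumps instead of 50, which is the latency argument the abstract makes for per-snapshot prediction. Whether accuracy is preserved depends on the data and the selection criterion, and it should be checked on snapshots not used for selection.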

Metadata
Title
Self-tuning Filers — Overload Prediction and Preventive Tuning Using Pruned Random Forest
Authors
Kumar Dheenadayalan
Gopalakrishnan Srinivasaraghavan
V. N. Muralidhara
Copyright year
2017
DOI
https://doi.org/10.1007/978-3-319-57529-2_39