Published in: Journal of Network and Systems Management 3/2017

25-01-2017

Decreasing the Management Burden in Multi-tier Systems Through Partial Correlation-Based Monitoring

Authors: Otto J. A. Pinno, Sand L. Correa, Aldri L. dos Santos, Kleber V. Cardoso



Abstract

Modern web applications often consist of hundreds of services distributed across different servers or tiers. On one hand, this architecture may provide easy abstraction and modularity for software development and reuse. On the other hand, it makes the behavior of the system difficult to predict, as each tier has its own functionality, configuration, and demand for computing resources. Thus, anomaly detection becomes an important aspect of the management and operation of multi-tier web systems. To track their operation and aid in the analysis of their behavior, web systems expose numerous metrics in all tiers. However, collecting and analyzing all available metrics reduces system performance because of the non-negligible overhead in communication, storage, and processing. Another concern is the nature of the workload of these systems, which may fluctuate widely over time. One approach to supporting anomaly detection in web systems is to use stable correlations among monitoring metrics. This approach, called correlation-based monitoring, does not require any deep understanding of the system internals or metric semantics, nor does it demand the existence of data about the faults. In addition, as only the metrics involved in stable correlations are periodically collected, the monitoring overhead is reduced. Stable correlations also have the desirable property of holding for long periods of time before becoming invalid due to workload fluctuations. The challenge, however, is to identify the stable correlations. In this work, we address this challenge by proposing three novel strategies based on partial correlation, a statistical tool commonly employed to summarize the relevant information of complex systems. We evaluate our strategies using traces obtained from an e-commerce web transaction benchmark deployed in our testbed.
Results show that our best strategy allows the construction of a monitoring network with fewer metrics than a state-of-the-art solution while achieving larger fault coverage. They also show that the correlations are reasonably stable, and that the models can be applied for sufficiently long periods of time (at least 50 times the training time) before they become invalid.
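
As a rough illustration of the statistical tool named in the abstract, the sketch below computes a first-order partial correlation using the standard textbook formula. It is not one of the three strategies proposed in the article, which the abstract does not describe. The metric names (workload, web_cpu, db_queries) and the synthetic data are hypothetical and chosen only to show how two metrics driven by the same workload can look strongly correlated until that workload is controlled for.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def partial_corr(x, y, z):
    """First-order partial correlation of x and y, controlling for z."""
    r_xy = pearson(x, y)
    r_xz = pearson(x, z)
    r_yz = pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Synthetic, hypothetical metrics (assumption): both depend on the same
# workload plus independent noise, so they share no direct relationship.
rng = random.Random(0)
workload = [rng.uniform(10, 100) for _ in range(2000)]
web_cpu = [0.8 * w + rng.gauss(0, 5) for w in workload]
db_queries = [1.5 * w + rng.gauss(0, 5) for w in workload]

plain = pearson(web_cpu, db_queries)
partial = partial_corr(web_cpu, db_queries, workload)
print(f"Pearson: {plain:.3f}, partial given workload: {partial:.3f}")
```

In this synthetic setup the plain Pearson coefficient is high because both metrics follow the workload, while the partial correlation given the workload stays close to zero. This is the kind of distinction that makes partial correlation useful for separating direct metric relationships from ones induced by a common driver.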


Metadata
Title
Decreasing the Management Burden in Multi-tier Systems Through Partial Correlation-Based Monitoring
Authors
Otto J. A. Pinno
Sand L. Correa
Aldri L. dos Santos
Kleber V. Cardoso
Publication date
25-01-2017
Publisher
Springer US
Published in
Journal of Network and Systems Management / Issue 3/2017
Print ISSN: 1064-7570
Electronic ISSN: 1573-7705
DOI
https://doi.org/10.1007/s10922-017-9402-7
