2016 | OriginalPaper | Book chapter

LS-ADT: Lightweight and Scalable Anomaly Detection for Cloud Datacentres

Written by: Sakil Barbhuiya, Zafeirios Papazachos, Peter Kilpatrick, Dimitrios S. Nikolopoulos

Published in: Cloud Computing and Services Science

Publisher: Springer International Publishing

Abstract

Cloud data centres are implemented as large-scale clusters with demanding requirements for service performance, availability and cost of operation. As a result of scale and complexity, data centres typically exhibit large numbers of system anomalies resulting from operator error, resource over/under provisioning, hardware or software failures and security issues. These anomalies are inherently difficult to identify and resolve promptly via human inspection. Therefore, it is vital in a cloud system to have automatic system monitoring that detects potential anomalies and identifies their source. In this paper we present a lightweight anomaly detection tool for Cloud data centres which combines extended log analysis and rigorous correlation of system metrics, implemented by an efficient correlation algorithm which does not require training or complex infrastructure setup. The LADT algorithm is based on the premise that there is a strong correlation between node-level and VM-level metrics in a cloud system. This correlation will drop significantly in the event of any performance anomaly at the node level, and a continuous drop in the correlation can indicate the presence of a true anomaly in the node. The log analysis of LADT assists in determining whether the correlation drop could be caused by naturally occurring cloud management activity such as VM migration, creation, suspension, termination or resizing. In this way, any potential anomaly alerts are reasoned about to prevent false positives that could be caused by the cloud operator's activity. We demonstrate LADT with log analysis in a Cloud environment to show how the log analysis is combined with the correlation of system metrics to achieve accurate anomaly detection.
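
The detection idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not name the correlation measure, so Pearson correlation is assumed here, and the function names, window size, threshold and persistence count are all hypothetical values chosen for illustration. Management events (for example a VM migration) are assumed to have already been extracted from the logs as sample indices.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length series (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def detect_anomalies(node_metric, vm_metric, mgmt_event_times=(),
                     window=5, threshold=0.5, persistence=3):
    """Return sample indices where a persistent correlation drop is flagged.

    All parameters are illustrative assumptions, not values from the paper:
      window      -- number of samples per correlation window
      threshold   -- a window counts as "low" when its correlation is below this
      persistence -- consecutive low windows required before raising an alert
    An alert is suppressed when a management event (taken from log analysis)
    falls inside the span covered by the current run of low windows.
    """
    alerts = []
    low_run = 0
    for end in range(window, len(node_metric) + 1):
        start = end - window
        r = pearson(node_metric[start:end], vm_metric[start:end])
        low_run = low_run + 1 if r < threshold else 0
        if low_run >= persistence:
            # First sample of the earliest window in the current low run.
            span_start = start - (low_run - 1)
            explained = any(span_start <= t < end for t in mgmt_event_times)
            if not explained:
                alerts.append(end - 1)
    return alerts


# Hypothetical data: the node metric tracks the VM metric, then diverges.
vm = list(range(20))
node = vm[:10] + [30 - i for i in range(10, 20)]

unexplained = detect_anomalies(node, vm)
explained = detect_anomalies(node, vm, mgmt_event_times=[12])
```

Requiring several consecutive low windows mirrors the abstract's point that only a continuous drop indicates a true anomaly, while the event check stands in for the log analysis that filters out drops caused by operator activity.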

Metadata
Title
LS-ADT: Lightweight and Scalable Anomaly Detection for Cloud Datacentres
Written by
Sakil Barbhuiya
Zafeirios Papazachos
Peter Kilpatrick
Dimitrios S. Nikolopoulos
Copyright year
2016
DOI
https://doi.org/10.1007/978-3-319-29582-4_8