2021 | OriginalPaper | Chapter

A Systematic Mapping Study in AIOps

Authors: Paolo Notaro, Jorge Cardoso, Michael Gerndt

Published in: Service-Oriented Computing – ICSOC 2020 Workshops

Publisher: Springer International Publishing

Abstract

IT systems of today are becoming larger and more complex, rendering their human supervision more difficult. Artificial Intelligence for IT Operations (AIOps) has been proposed to tackle modern IT administration challenges thanks to AI and Big Data. However, past AIOps contributions are scattered, unorganized and missing a common terminology convention, which renders their discovery and comparison impractical. In this work, we conduct an in-depth mapping study to collect and organize the numerous scattered contributions to AIOps in a unique reference index. We create an AIOps taxonomy to build a foundation for future contributions and allow an efficient comparison of AIOps papers treating similar problems. We investigate temporal trends and classify AIOps contributions based on the choice of algorithms, data sources and the target components. Our results show a recent and growing interest towards AIOps, specifically to those contributions treating failure-related tasks (62%), such as anomaly detection and root cause analysis.

Metadata

Title: A Systematic Mapping Study in AIOps
Authors: Paolo Notaro, Jorge Cardoso, Michael Gerndt
Copyright Year: 2021
DOI: https://doi.org/10.1007/978-3-030-76352-7_15
