Abstract
AIOps (Artificial Intelligence for IT Operations) leverages machine learning models to help practitioners handle the massive data produced during the operation of large-scale systems. However, due to the nature of operation data, AIOps modeling faces several data-splitting challenges, such as imbalanced data, data leakage, and concept drift. In this work, we study the data leakage and concept drift challenges in the context of AIOps and evaluate the impact of different modeling decisions on these challenges. Specifically, we perform a case study on two commonly studied AIOps applications: (1) predicting job failures based on trace data from a large-scale cluster environment and (2) predicting disk failures based on disk monitoring data from a large-scale cloud storage environment. First, we observe that the data leakage issue exists in AIOps solutions. Using a time-based split of the training and validation datasets can significantly reduce such data leakage, making it more appropriate than a random split in the AIOps context. Second, we show that AIOps solutions suffer from concept drift. Periodically updating AIOps models can help mitigate the impact of such concept drift, while the performance benefit and the modeling cost of increasing the update frequency depend largely on the application data and the models used. Our findings encourage future studies and practices on developing AIOps solutions to pay attention to their data-splitting decisions in order to handle the data leakage and concept drift challenges.
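The contrast between random and time-based splitting described above can be sketched in a few lines. The snippet below is an illustrative sketch only (it is not code from the paper); the toy data and all variable names are hypothetical.

```python
# Sketch: random vs. time-based train/validation splitting on timestamped
# operation data. Hypothetical toy data; not the paper's actual pipeline.
import numpy as np

rng = np.random.default_rng(0)
n = 1000
timestamps = np.sort(rng.uniform(0, 100, n))  # event time of each sample
X = rng.normal(size=(n, 5))                   # feature matrix (placeholder)
y = rng.integers(0, 2, n)                     # binary failure label

# Random split: "future" samples can land in the training set, letting the
# model peek at information unavailable at prediction time -- one way data
# leakage arises in AIOps settings.
idx = rng.permutation(n)
train_rand, valid_rand = idx[:800], idx[800:]

# Time-based split: train strictly on earlier data, validate on later data,
# mirroring how the model would actually be deployed.
cutoff = np.quantile(timestamps, 0.8)
train_time = np.where(timestamps <= cutoff)[0]
valid_time = np.where(timestamps > cutoff)[0]

# With the time-based split, every training timestamp precedes every
# validation timestamp; the random split offers no such guarantee.
assert timestamps[train_time].max() <= timestamps[valid_time].min()
```

The same temporal ordering underlies the periodic-update strategy for concept drift: each retrained model is fitted only on data observed before the period it will serve, so the cutoff above would simply advance with each update.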
An Empirical Study of the Impact of Data Splitting Decisions on the Performance of AIOps Solutions