ABSTRACT
Anomaly detectors are often used to produce a ranked list of statistical anomalies, which human analysts then examine to extract the actual anomalies of interest. This can be exceedingly difficult and time-consuming when most high-ranking anomalies are false positives that are not interesting from an application perspective. In this paper, we study how to reduce the analyst's effort by incorporating their feedback about whether the anomalies they investigate are of interest. In particular, the feedback is used to adjust the anomaly ranking after every analyst interaction, ideally moving anomalies of interest closer to the top. Our main contribution is to formulate this problem within the framework of online convex optimization, which yields an efficient and extremely simple approach to incorporating feedback compared to the prior state of the art. We instantiate this approach for the powerful class of tree-based anomaly detectors and conduct experiments on a range of benchmark datasets. The results demonstrate the utility of incorporating feedback and the advantages of our approach over the state of the art. In addition, we present results on a significant cybersecurity application where the goal is to detect red-team attacks in real system audit data. We show that our approach for incorporating feedback significantly reduces the time required to identify malicious system entities across multiple attacks on multiple operating systems.
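To make the feedback loop concrete, the following is a minimal sketch of one way a ranking could be adjusted after each analyst interaction using an online gradient step. It is an illustration under stated assumptions, not the method evaluated in the paper: it assumes each instance is described by a hypothetical feature vector (for example, one score per detector component), that the anomaly score is a weighted sum of those features, and that a simple linear loss is used. The names `rank_with_feedback`, `get_label`, and `learning_rate` are invented for this example.

```python
import math


def rank_with_feedback(features, get_label, budget, learning_rate=1.0):
    """Show the top-ranked unseen instance, ask for feedback, update the weights.

    features  : list of equal-length lists (hypothetical per-instance features)
    get_label : callable returning +1 (of interest) or -1 (not of interest);
                stands in for the human analyst
    budget    : number of instances the analyst is willing to inspect
    """
    dim = len(features[0])
    # Assumption: start from uniform weights, i.e. the unadjusted detector.
    weights = [1.0 / math.sqrt(dim)] * dim
    unseen = set(range(len(features)))
    shown = []

    def score(i):
        return sum(w * x for w, x in zip(weights, features[i]))

    for _ in range(min(budget, len(features))):
        # Present the highest-scoring instance not yet inspected
        # (ties are broken by lowest index).
        top = max(sorted(unseen), key=score)
        unseen.remove(top)
        shown.append(top)
        y = get_label(top)
        # Linear loss -y * (w . x) has gradient -y * x, so one gradient
        # descent step raises the score of confirmed anomalies and lowers
        # the score of false positives with similar features.
        weights = [w + learning_rate * y * x
                   for w, x in zip(weights, features[top])]
    return shown, weights


# Toy data: feature 0 marks anomalies of interest, feature 1 marks nuisance
# anomalies that the unadjusted detector also ranks highly.
features = [[1.0, 0.0], [0.9, 0.0], [0.0, 1.1],
            [0.0, 1.0], [0.0, 0.95], [0.1, 0.1]]
truth = [1, 1, -1, -1, -1, -1]
shown, weights = rank_with_feedback(features, lambda i: truth[i], budget=3)
print(shown)  # [2, 0, 1]
```

In this toy run the first instance shown is a false positive. After that single piece of negative feedback the remaining nuisance instances drop in the ranking, and the next two instances presented are both anomalies of interest.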
Index Terms
- Feedback-Guided Anomaly Discovery via Online Optimization