DOI: 10.1145/3219819.3220083
research-article
Public Access

Feedback-Guided Anomaly Discovery via Online Optimization

Published: 19 July 2018

ABSTRACT

Anomaly detectors are often used to produce a ranked list of statistical anomalies, which are examined by human analysts in order to extract the actual anomalies of interest. This can be exceedingly difficult and time consuming when most high-ranking anomalies are false positives and not interesting from an application perspective. In this paper, we study how to reduce the analyst's effort by incorporating their feedback about whether the anomalies they investigate are of interest or not. In particular, the feedback will be used to adjust the anomaly ranking after every analyst interaction, ideally moving anomalies of interest closer to the top. Our main contribution is to formulate this problem within the framework of online convex optimization, which yields an efficient and extremely simple approach to incorporating feedback compared to the prior state-of-the-art. We instantiate this approach for the powerful class of tree-based anomaly detectors and conduct experiments on a range of benchmark datasets. The results demonstrate the utility of incorporating feedback and advantages of our approach over the state-of-the-art. In addition, we present results on a significant cybersecurity application where the goal is to detect red-team attacks in real system audit data. We show that our approach for incorporating feedback is able to significantly reduce the time required to identify malicious system entities across multiple attacks on multiple operating systems.
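The feedback loop described in the abstract can be illustrated with a minimal sketch. It is not the paper's algorithm: it assumes the anomaly score is a weighted sum of per-instance features, a linear loss, and a plain online gradient step, and the abstract specifies none of these. The names `rank_with_feedback`, `FEATURES`, `LABELS`, and `eta` are hypothetical, and the toy data is made up for illustration.

```python
from typing import List, Sequence, Tuple


def dot(w: Sequence[float], x: Sequence[float]) -> float:
    """Weighted anomaly score; a higher value means more anomalous."""
    return sum(wi * xi for wi, xi in zip(w, x))


def rank_with_feedback(
    features: Sequence[Sequence[float]],
    labels: Sequence[int],
    budget: int,
    eta: float = 1.0,
) -> Tuple[List[int], List[float]]:
    """Show the analyst the top-ranked instance, then update the weights.

    labels[i] stands in for the analyst: +1 means "of interest",
    -1 means "false positive". The assumed loss is linear,
    f(w) = -y * dot(w, x), so one gradient step is w <- w + eta * y * x.
    """
    w = [1.0] * len(features[0])          # start from uniform weights
    remaining = list(range(len(features)))
    shown: List[int] = []
    for _ in range(min(budget, len(remaining))):
        # Present the currently highest-scoring unseen instance.
        top = max(remaining, key=lambda i: dot(w, features[i]))
        remaining.remove(top)
        shown.append(top)
        y = labels[top]                   # analyst feedback
        # Online gradient step on the assumed linear loss.
        w = [wi + eta * y * xi for wi, xi in zip(w, features[top])]
    return shown, w


# Toy data (hypothetical): feature 0 is high for uninteresting outliers,
# feature 1 is high for the anomalies the analyst cares about.
FEATURES = [[0.9, 0.2], [0.8, 0.1], [0.1, 0.7], [0.2, 0.5], [0.6, 0.0]]
LABELS = [-1, -1, +1, +1, -1]

with_feedback, _ = rank_with_feedback(FEATURES, LABELS, budget=3)
without_feedback, _ = rank_with_feedback(FEATURES, LABELS, budget=3, eta=0.0)
print(with_feedback, without_feedback)
```

On this toy data, a single negative label on the first instance is enough to push both interesting instances ahead of the remaining false positives. With `eta=0.0` the ranking never changes, so the analyst sees two false positives before the first interesting instance.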

Supplemental Material

siddiqui_feedback_discovery.mp4 (mp4, 380.3 MB)

      • Published in

        KDD '18: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining
        July 2018
        2925 pages
        ISBN: 9781450355520
        DOI: 10.1145/3219819

        Copyright © 2018 ACM

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Acceptance Rates

        KDD '18 paper acceptance rate: 107 of 983 submissions, 11%
        Overall acceptance rate: 1,133 of 8,635 submissions, 13%
