Less is More: Building Selective Anomaly Ensembles


Abstract

Ensemble learning for anomaly detection has barely been studied, owing to the difficulty of acquiring ground truth and the lack of inherent objective functions. In contrast, ensemble approaches for classification and clustering have long been studied and used effectively. Our work taps into this gap and builds a new ensemble approach for anomaly detection, with application to event detection in temporal graphs as well as outlier detection in non-graph settings. It handles and combines multiple heterogeneous detectors to yield improved and robust performance. Importantly, trusting the results of all constituent detectors may deteriorate the overall performance of the ensemble, since some detectors can produce inaccurate results depending on the type of data at hand and the assumptions underlying each detector. This suggests that combining the detectors selectively is key to building effective anomaly ensembles, hence “less is more”.

In this paper we propose a novel ensemble approach for anomaly detection called SELECT, which automatically and systematically selects the results from the constituent detectors to combine in a fully unsupervised fashion. We apply our method to event detection in temporal graphs and outlier detection in multi-dimensional point data (non-graph), where SELECT successfully utilizes five base detectors and seven consensus methods under a unified ensemble framework. We provide an extensive quantitative evaluation of our approach for event detection on five real-world datasets (four with ground-truth events), including Enron email communications, RealityMining SMS and phone call records, the New York Times news corpus, and the World Cup 2014 Twitter news feed. We also provide results for outlier detection on seven real-world multi-dimensional point datasets from the UCI Machine Learning Repository. Thanks to its selection mechanism, SELECT yields superior performance compared to the individual detectors alone, the full ensemble (which naively combines all results), an existing diversity-based ensemble, and an existing weighted ensemble approach.
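
To make the selection idea concrete, the sketch below shows one plausible way an unsupervised selective ensemble of this kind can be wired together: heterogeneous detector scores are first unified, a pseudo ground truth is formed from their consensus, and only detectors whose rankings agree sufficiently with that consensus are kept for the final combination. The z-score normalization, the Kendall-tau agreement test, the 0.2 threshold, and the plain averaging consensus are illustrative assumptions, not the exact procedure of SELECT.

import numpy as np
from scipy.stats import kendalltau, zscore


def selective_ensemble(scores, tau_threshold=0.2):
    """Illustrative selective combination of anomaly detector outputs.

    scores: array of shape (n_detectors, n_points) holding one row of raw
    anomaly scores per base detector. The selection criterion and threshold
    are placeholders, not the SELECT paper's exact procedure.
    """
    # Unify heterogeneous score scales: z-score normalize each detector's row.
    unified = zscore(scores, axis=1)

    # Pseudo ground truth: the consensus (mean score) over all detectors.
    pseudo_target = unified.mean(axis=0)

    # Keep only detectors whose rankings agree with the pseudo ground truth.
    selected = []
    for row in unified:
        tau, _ = kendalltau(row, pseudo_target)
        if tau > tau_threshold:
            selected.append(row)

    # Final consensus over the selected detectors only; plain averaging
    # stands in for the richer consensus methods the paper explores.
    return np.mean(selected, axis=0) if selected else pseudo_target


# Toy usage: five base detectors scoring 100 data points.
rng = np.random.default_rng(0)
combined = selective_ensemble(rng.random((5, 100)))
print(combined[:5])

On the toy input, a detector producing essentially arbitrary rankings would fall below the agreement threshold and be excluded from the final consensus, which is the "less is more" effect the abstract argues for.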



    • Published in

      ACM Transactions on Knowledge Discovery from Data, Volume 10, Issue 4
      Special Issue on SIGKDD 2014, Special Issue on BIGCHAT and Regular Papers
      July 2016, 417 pages
      ISSN: 1556-4681
      EISSN: 1556-472X
      DOI: 10.1145/2936311

      Copyright © 2016 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 24 May 2016
      • Accepted: 1 February 2016
      • Revised: 1 January 2016
      • Received: 1 April 2015


      Qualifiers

      • research-article
      • Research
      • Refereed
