Abstract
Ensemble learning for anomaly detection has barely been studied, due to the difficulty of acquiring ground truth and the lack of inherent objective functions. In contrast, ensemble approaches for classification and clustering have long been studied and used effectively. Our work taps into this gap and builds a new ensemble approach for anomaly detection, with applications to event detection in temporal graphs as well as outlier detection in non-graph settings. It handles and combines multiple heterogeneous detectors to yield improved and robust performance. Importantly, trusting results from all the constituent detectors may deteriorate the overall performance of the ensemble, as some detectors can produce inaccurate results depending on the type of data at hand and the underlying assumptions of the detector. This suggests that combining the detectors selectively is key to building effective anomaly ensembles; hence, "less is more".
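The "trusting all detectors can hurt" point can be illustrated with a toy rank-averaging consensus. This is a hypothetical sketch with made-up scores, not the paper's method: two accurate detectors agree on a planted anomaly, and adding a third, inaccurate detector dilutes the anomaly's consensus rank.

```python
def to_ranks(scores):
    """Map scores to ranks; rank 1 = most anomalous (highest score)."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ranks = [0] * len(scores)
    for r, i in enumerate(order):
        ranks[i] = r + 1
    return ranks

def average_rank(score_lists):
    """Consensus by averaging per-detector ranks for each object."""
    rank_lists = [to_ranks(s) for s in score_lists]
    n = len(score_lists[0])
    return [sum(rl[i] for rl in rank_lists) / len(rank_lists) for i in range(n)]

# Three hypothetical detectors over five objects; object 0 is the planted anomaly.
d1 = [0.9, 0.2, 0.1, 0.3, 0.2]   # accurate
d2 = [0.8, 0.1, 0.2, 0.2, 0.3]   # accurate
d3 = [0.1, 0.9, 0.8, 0.7, 0.6]   # inaccurate: ranks the anomaly last

good = average_rank([d1, d2])    # selective combination: accurate pair only
full = average_rank([d1, d2, d3])  # "full ensemble": all detectors, naively
# Object 0's average rank degrades from 1.0 to ~2.33 once d3 joins.
```

Here the full ensemble still ranks object 0 first, but its margin over the normal objects shrinks; with more inaccurate detectors the ordering itself would flip, which is the failure mode selective combination guards against.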
In this paper we propose a novel ensemble approach called SELECT for anomaly detection, which automatically and systematically selects the results from its constituent detectors to combine in a fully unsupervised fashion. We apply our method to event detection in temporal graphs and outlier detection in multi-dimensional point data (non-graph), where SELECT successfully utilizes five base detectors and seven consensus methods under a unified ensemble framework. We provide an extensive quantitative evaluation of our approach for event detection on five real-world datasets (four with ground truth events), including Enron email communications, RealityMining SMS and phone call records, the New York Times news corpus, and the World Cup 2014 Twitter news feed. We also provide results for outlier detection on seven real-world multi-dimensional point datasets from the UCI Machine Learning Repository. Thanks to its selection mechanism, SELECT yields superior performance compared to the individual detectors alone, the full ensemble (naively combining all results), an existing diversity-based ensemble, and an existing weighted ensemble approach.
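One way such fully unsupervised selection can work is to form a pseudo-target from the consensus of all detectors and keep only detectors that agree with it. The sketch below shows this idea in miniature; the agreement measure (Pearson correlation on rank lists) and the average-correlation threshold are illustrative assumptions, not the paper's exact algorithm.

```python
from statistics import mean

def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

def select_detectors(rank_lists):
    """Unsupervised selection sketch: build a pseudo-target by averaging
    all detectors' rank lists, then keep the detectors whose agreement
    with that target is at least the average agreement."""
    n = len(rank_lists[0])
    target = [mean(rl[i] for rl in rank_lists) for i in range(n)]
    corrs = [pearson(rl, target) for rl in rank_lists]
    thresh = mean(corrs)
    return [i for i, c in enumerate(corrs) if c >= thresh]

# Hypothetical rank lists: detectors 0 and 1 broadly agree, detector 2
# ranks the objects in the reverse order and gets filtered out.
selected = select_detectors([[1, 2, 3, 4, 5],
                             [1, 3, 2, 4, 5],
                             [5, 4, 3, 2, 1]])
```

The selected detectors' outputs would then be combined by a consensus method (e.g., the rank averaging shown earlier) to produce the final anomaly ranking.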
Less is More: Building Selective Anomaly Ensembles