Abstract
Ensemble learning for anomaly detection has barely been studied, due to the difficulty of acquiring ground truth and the lack of inherent objective functions. In contrast, ensemble approaches for classification and clustering have long been studied and used effectively. Our work taps into this gap and builds a new ensemble approach for anomaly detection, with applications to event detection in temporal graphs as well as outlier detection in non-graph settings. It handles and combines multiple heterogeneous detectors to yield improved and robust performance. Importantly, trusting results from all the constituent detectors may deteriorate the overall performance of the ensemble, as some detectors can produce inaccurate results depending on the type of data at hand and the underlying assumptions of the detector. This suggests that combining the detectors selectively is key to building effective anomaly ensembles; hence, "less is more".
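The "trusting all detectors can hurt" point can be illustrated with a toy rank-averaging consensus. This is a hypothetical sketch with made-up scores, not the paper's method: two accurate detectors agree on a planted anomaly, and adding a third, inaccurate detector dilutes the anomaly's consensus rank.

```python
def to_ranks(scores):
    """Map scores to ranks; rank 1 = most anomalous (highest score)."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ranks = [0] * len(scores)
    for r, i in enumerate(order):
        ranks[i] = r + 1
    return ranks

def average_rank(score_lists):
    """Consensus by averaging per-detector ranks for each object."""
    rank_lists = [to_ranks(s) for s in score_lists]
    n = len(score_lists[0])
    return [sum(rl[i] for rl in rank_lists) / len(rank_lists) for i in range(n)]

# Three hypothetical detectors over five objects; object 0 is the planted anomaly.
d1 = [0.9, 0.2, 0.1, 0.3, 0.2]   # accurate
d2 = [0.8, 0.1, 0.2, 0.2, 0.3]   # accurate
d3 = [0.1, 0.9, 0.8, 0.7, 0.6]   # inaccurate: ranks the anomaly last

good = average_rank([d1, d2])    # selective combination: accurate pair only
full = average_rank([d1, d2, d3])  # "full ensemble": all detectors, naively
# Object 0's average rank degrades from 1.0 to ~2.33 once d3 joins.
```

Here the full ensemble still ranks object 0 first, but its margin over the normal objects shrinks; with more inaccurate detectors the ordering itself would flip, which is the failure mode selective combination guards against.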
In this paper we propose a novel ensemble approach called SELECT for anomaly detection, which automatically and systematically selects the results from its constituent detectors to combine in a fully unsupervised fashion. We apply our method to event detection in temporal graphs and outlier detection in multi-dimensional point data (non-graph), where SELECT successfully utilizes five base detectors and seven consensus methods under a unified ensemble framework. We provide an extensive quantitative evaluation of our approach for event detection on five real-world datasets (four with ground truth events), including Enron email communications, RealityMining SMS and phone call records, the New York Times news corpus, and the World Cup 2014 Twitter news feed. We also provide results for outlier detection on seven real-world multi-dimensional point datasets from the UCI Machine Learning Repository. Thanks to its selection mechanism, SELECT yields superior performance compared to the individual detectors alone, the full ensemble (naively combining all results), an existing diversity-based ensemble, and an existing weighted ensemble approach.
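One way such fully unsupervised selection can work is to form a pseudo-target from the consensus of all detectors and keep only detectors that agree with it. The sketch below shows this idea in miniature; the agreement measure (Pearson correlation on rank lists) and the average-correlation threshold are illustrative assumptions, not the paper's exact algorithm.

```python
from statistics import mean

def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

def select_detectors(rank_lists):
    """Unsupervised selection sketch: build a pseudo-target by averaging
    all detectors' rank lists, then keep the detectors whose agreement
    with that target is at least the average agreement."""
    n = len(rank_lists[0])
    target = [mean(rl[i] for rl in rank_lists) for i in range(n)]
    corrs = [pearson(rl, target) for rl in rank_lists]
    thresh = mean(corrs)
    return [i for i, c in enumerate(corrs) if c >= thresh]

# Hypothetical rank lists: detectors 0 and 1 broadly agree, detector 2
# ranks the objects in the reverse order and gets filtered out.
selected = select_detectors([[1, 2, 3, 4, 5],
                             [1, 3, 2, 4, 5],
                             [5, 4, 3, 2, 1]])
```

The selected detectors' outputs would then be combined by a consensus method (e.g., the rank averaging shown earlier) to produce the final anomaly ranking.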
Less is More: Building Selective Anomaly Ensembles