Theoretical Foundations and Algorithms for Outlier Ensembles

Abstract
Ensemble analysis has recently been studied in the context of the outlier detection problem. In this paper, we investigate the theoretical foundations of outlier ensemble analysis. In spite of the significant differences between the classification and outlier analysis problems, we show that their theoretical foundations are quite similar in terms of the bias-variance trade-off. We explain existing algorithms within this traditional framework and clarify misconceptions about the reasoning behind these methods. We propose more effective variants of subsampling and feature bagging. We also examine the impact of the combination function, in particular the trade-offs between the average and maximization functions, and use these insights to propose new combination functions that are robust in many settings.
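The abstract names subsampling, feature bagging, and average versus maximum combination without giving details. The following is a minimal Python sketch of a generic baseline ensemble built from those ingredients, not the improved variants or new combination functions proposed in the paper. It assumes a k-nearest-neighbour distance detector as the base model, z-score standardization of each component's scores, and feature subsets whose size is drawn uniformly between d/2 and d-1 (as in Lazarevic and Kumar's feature bagging). All function names and parameters (`outlier_ensemble`, `kth_nn_distance`, `sample_frac`, `k`, `combine`) are hypothetical.

```python
import math
import random
import statistics


def kth_nn_distance(data, i, ref_idx, k, feats):
    """Euclidean distance from point i to its k-th nearest neighbour in the
    reference set, measured only on the chosen feature subset. Point i is
    skipped if it is itself part of the reference set."""
    dists = sorted(
        math.sqrt(sum((data[i][f] - data[j][f]) ** 2 for f in feats))
        for j in ref_idx
        if j != i
    )
    return dists[min(k, len(dists)) - 1]


def zscores(xs):
    """Standardize one component's scores so components are comparable."""
    mu = statistics.fmean(xs)
    sd = statistics.pstdev(xs) or 1.0  # guard against constant scores
    return [(x - mu) / sd for x in xs]


def outlier_ensemble(data, n_components=20, k=5, sample_frac=0.5,
                     combine="mean", seed=0):
    """Hypothetical baseline ensemble (an assumption, not the paper's method).

    Each component scores every point with a k-NN detector restricted to a
    random feature subset (feature bagging) and a random subsample of
    reference points (subsampling). Higher scores mean more outlying."""
    if combine not in ("mean", "max"):
        raise ValueError("combine must be 'mean' or 'max'")
    rng = random.Random(seed)
    n, d = len(data), len(data[0])
    components = []
    for _ in range(n_components):
        # Feature bagging: subset size drawn uniformly from [d//2, d-1].
        r = rng.randint(max(1, d // 2), max(1, d - 1))
        feats = rng.sample(range(d), r)
        # Subsampling: reference set drawn without replacement.
        m = min(n, max(k + 1, int(sample_frac * n)))
        ref = rng.sample(range(n), m)
        raw = [kth_nn_distance(data, i, ref, k, feats) for i in range(n)]
        components.append(zscores(raw))
    # Combination function: average or maximum of standardized scores.
    agg = statistics.fmean if combine == "mean" else max
    return [agg([c[i] for c in components]) for i in range(n)]


if __name__ == "__main__":
    gen = random.Random(42)
    # 50 Gaussian inliers in 4 dimensions plus one injected outlier (index 50).
    data = [[gen.gauss(0, 1) for _ in range(4)] for _ in range(50)]
    data.append([6.0, 6.0, 6.0, 6.0])
    for combine in ("mean", "max"):
        scores = outlier_ensemble(data, combine=combine)
        top = max(range(len(scores)), key=scores.__getitem__)
        print(combine, "-> top-ranked point:", top)
```

As a rough intuition, averaging smooths out the idiosyncrasies of individual components, while the maximum rewards a point that stands out strongly in any single component; running both `combine` settings on the same data is a simple way to compare the two behaviours the abstract contrasts.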