Abstract
Ensemble analysis is a widely used meta-algorithm for many data mining problems such as classification and clustering. Numerous ensemble-based algorithms have been proposed in the literature for these problems. Compared to clustering and classification, however, ensemble analysis has received only limited study in the outlier detection literature. Many outlier analysis algorithms use ensemble techniques implicitly, but the approach is often buried deep within the algorithm and not formally recognized as a general-purpose meta-algorithm. This is in spite of the fact that ensemble analysis is rather important in the context of outlier analysis. This paper discusses the various methods used in the literature for outlier ensembles and the general principles by which such analysis can be made more effective. A discussion is also provided of how outlier ensembles relate to the ensemble techniques commonly used for other data mining problems.
Index Terms
- Outlier ensembles: position paper
Recommendations
Outlier ensembles
ODD '13: Proceedings of the ACM SIGKDD Workshop on Outlier Detection and Description