research-article

Outlier ensembles: position paper

Author:
Charu C. Aggarwal

IBM T. J. Watson Research Center, Yorktown Heights, NY

ACM SIGKDD Explorations Newsletter, Volume 14, Issue 2, December 2012, pp. 49–58. https://doi.org/10.1145/2481244.2481252

Published: 30 April 2013

Abstract

Ensemble analysis is a widely used meta-algorithm for many data mining problems, such as classification and clustering, and numerous ensemble-based algorithms have been proposed in the literature for these problems. Compared to clustering and classification, ensemble analysis has received only limited study in the outlier detection literature. Many outlier analysis algorithms do use ensemble techniques implicitly, but the approach is often buried deep within the algorithm and not formally recognized as a general-purpose meta-algorithm, even though the problem is rather important in the context of outlier analysis. This paper discusses the various methods used in the literature for outlier ensembles and the general principles by which such analysis can be made more effective. It also discusses how outlier ensembles relate to the ensemble techniques commonly used for other data mining problems.
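
To make the idea of an outlier ensemble as a meta-algorithm concrete, here is a minimal sketch built entirely on assumptions of our own; it is not the specific method proposed in this paper. It uses the distance to the k-th nearest neighbor as the base detector, runs that detector on random feature subsets (in the spirit of feature bagging, as proposed by Lazarevic and Kumar, KDD 2005), standardizes each component's scores to z-scores so that components are on a common scale, and averages the results. The function names (`knn_scores`, `standardize`, `feature_bagging_ensemble`), the subset-size rule, and the parameters `n_components`, `k`, and `seed` are hypothetical choices made for illustration.

```python
import math
import random
from statistics import mean, pstdev


def knn_scores(points, features, k):
    """Base detector (an assumption): the score of each point is its Euclidean
    distance to its k-th nearest neighbor, using only the given feature subset."""
    n = len(points)
    if not 1 <= k <= n - 1:
        raise ValueError("k must be between 1 and len(points) - 1")
    scores = []
    for i in range(n):
        dists = sorted(
            math.sqrt(sum((points[i][f] - points[j][f]) ** 2 for f in features))
            for j in range(n)
            if j != i
        )
        scores.append(dists[k - 1])
    return scores


def standardize(scores):
    """Convert raw scores to z-scores so that components are comparable."""
    mu = mean(scores)
    sigma = pstdev(scores)
    if sigma == 0:
        return [0.0] * len(scores)
    return [(s - mu) / sigma for s in scores]


def feature_bagging_ensemble(points, n_components=10, k=2, seed=0):
    """Average the standardized k-NN scores over random feature subsets."""
    rng = random.Random(seed)
    d = len(points[0])
    combined = [0.0] * len(points)
    for _ in range(n_components):
        # Hypothetical rule: pick between d // 2 and d - 1 features (at least 1).
        size = rng.randint(max(1, d // 2), max(1, d - 1))
        features = rng.sample(range(d), size)
        z = standardize(knn_scores(points, features, k))
        combined = [c + zi for c, zi in zip(combined, z)]
    return [c / n_components for c in combined]


if __name__ == "__main__":
    data = [
        [1.0, 1.1, 0.9],
        [1.2, 0.9, 1.0],
        [0.9, 1.0, 1.1],
        [1.1, 1.2, 1.0],
        [8.0, 7.5, 9.0],  # an obvious outlier
    ]
    scores = feature_bagging_ensemble(data, n_components=10, k=2)
    # Index of the highest-scoring (most outlying) point
    print(max(range(len(data)), key=scores.__getitem__))  # → 4
```

Averaging the standardized scores is only one possible combination function; taking the maximum across components is a common alternative that favors points flagged strongly by any single component. The standardization step matters because raw scores from different feature subsets live on different scales, so averaging them directly would let the components with the largest scales dominate the result.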

Published in

ACM SIGKDD Explorations Newsletter Volume 14, Issue 2
December 2012
81 pages
ISSN:1931-0145
EISSN:1931-0153
DOI:10.1145/2481244

Copyright © 2013 Author
Publisher
Association for Computing Machinery
New York, NY, United States