research-article

Outlier ensembles: position paper

Published: 30 April 2013

Abstract

Ensemble analysis is a widely used meta-algorithm for many data mining problems such as classification and clustering. Numerous ensemble-based algorithms have been proposed in the literature for these problems. Compared to the clustering and classification problems, ensemble analysis has been studied in a limited way in the outlier detection literature. In some cases, ensemble analysis techniques have been implicitly used by many outlier analysis algorithms, but the approach is often buried deep within the algorithm and not formally recognized as a general-purpose meta-algorithm. This is in spite of the fact that ensemble analysis is rather important in the context of outlier analysis. This paper discusses the various methods that are used in the literature for outlier ensembles and the general principles by which such analysis can be made more effective. A discussion is also provided on how outlier ensembles relate to the ensemble techniques commonly used for other data mining problems.

Published in

ACM SIGKDD Explorations Newsletter, Volume 14, Issue 2, December 2012, 81 pages
ISSN: 1931-0145
EISSN: 1931-0153
DOI: 10.1145/2481244

Copyright © 2013 Author

Publisher: Association for Computing Machinery, New York, NY, United States
