skip to main content
column

Theoretical Foundations and Algorithms for Outlier Ensembles

Published:29 September 2015Publication History
Skip Abstract Section

Abstract

Ensemble analysis has recently been studied in the context of the outlier detection problem. In this paper, we investigate the theoretical underpinnings of outlier ensemble analysis. In spite of the significant differences between the classification and the outlier analysis problems, we show that the theoretical underpinnings between the two problems are actually quite similar in terms of the bias-variance trade-off. We explain the existing algorithms within this traditional framework, and clarify misconceptions about the reasoning underpinning these methods. We propose more effective variants of subsampling and feature bagging. We also discuss the impact of the combination function and discuss the specific trade-offs of the average and maximization functions. We use these insights to propose new combination functions that are robust in many settings.

References

  1. C. Aggarwal. Outlier Analysis, Springer, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. C. Aggarwal. Outlier ensembles: Position paper, SIGKDD Explorations, 14(2), 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. C. Aggarwal, P. Yu. Outlier detection in highdimensional data. SIGMOD, 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. F. Angiulli, C. Pizzuti. Fast outlier detection in high dimensional spaces. PKDD, pp. 15--26, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. D. Barbara, Y. Li, J. Couto, J. Lin, S. Jajodia. Bootstrapping a data mining intrusion detection system. In ACM SAC, pp. 421--425, 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. P. Buhlmann. Bagging, subagging and bragging for improving some prediction algorithms, Recent advances and trends in nonparametric statistics, Elsivier, 2003.Google ScholarGoogle Scholar
  7. P. Buhlmann, B. Yu. Analyzing bagging. Annals of Statistics, pp. 927--961, 2002.Google ScholarGoogle ScholarCross RefCross Ref
  8. A. Buja, W. Stuetzle. Observations on bagging. Statistica Sinica, 16(2), 323, 2006.Google ScholarGoogle Scholar
  9. M. Breunig, H.-P. Kriegel, R. Ng, J. Sander. LOF: Identifying density-based local outliers, SIGMOD, 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. Y. Freund, R. Schapire. A Decision-theoretic generalization of online learning and application to boosting. Computational Learning Theory, 1995. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. J. Gao, P.-N. Tan. Converting output scores from outlier detection algorithms into probability estimates. ICDM Conference, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. Z. He, S. Deng, X. Xu. A unified subspace outlier ensemble framework for outlier detection. WAIM, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. F. Keller, E. Muller, K. Bohm. HiCS: High-contrast subspaces for density-based outlier ranking. ICDE, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. A. Lazarevic, V. Kumar. Feature bagging for outlier detection, ACM KDD Conference, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. F. T. Liu, K. M. Ting, Z.-H. Zhou. Isolation forest. ICDM Conference, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. P. Melville, R. Mooney. Creating diversity in ensembles using artificial data. Information Fusion, 6(1), 2005.Google ScholarGoogle Scholar
  17. B. Micenkova, B. McWilliams, I. Assent. Learning representations for outlier detection on a budget. CoRR abs/1507.08104, 2015.Google ScholarGoogle Scholar
  18. E. Muller, M. Schiffer, T. Seidl. Statistical selection of relevant subspace projections for outlier ranking. ICDE Conference, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. H. Nguyen, H. Ang, V. Gopalakrishnan. Mining ensembles of heterogeneous detectors on random subspaces. DASFAA, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. D. Politis, J. Romano, and M. Wolf. Subsampling. Springer, 1999.Google ScholarGoogle ScholarCross RefCross Ref
  21. S. Rayana, L. Akoglu. Less is more: Building selective anomaly ensembles. SDM Conference, 2015.Google ScholarGoogle Scholar
  22. M. Shyu, S. Chen, K. Sarinnapakorn, L. Chang. A novel anomaly detection scheme based on principal component classifier. ICDMW, 2003.Google ScholarGoogle Scholar
  23. A. Zimek, R. Campello, J. Sander. Ensembles for unsupervised outlier detection: Challenges and research questions, SIGKDD Explorations, 15(1), 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. A. Zimek, M. Gaudet, R. Campello, J. Sander. Subsampling for efficient and effective unsupervised outlier detection ensembles, KDD Conference, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. A. Zimek, R. Campello, J. Sander. Data perturbation for outlier detection ensembles. SSDBM, 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. http://elki.dbs.ifi.lmu.de/wiki/AlgorithmsGoogle ScholarGoogle Scholar

Index Terms

  1. Theoretical Foundations and Algorithms for Outlier Ensembles

      Recommendations

      Comments

      Login options

      Check if you have access through your login credentials or your institution to get full access on this article.

      Sign in

      Full Access

      • Published in

        cover image ACM SIGKDD Explorations Newsletter
        ACM SIGKDD Explorations Newsletter  Volume 17, Issue 1
        June 2015
        50 pages
        ISSN:1931-0145
        EISSN:1931-0153
        DOI:10.1145/2830544
        Issue’s Table of Contents

        Copyright © 2015 Authors

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 29 September 2015

        Check for updates

        Qualifiers

        • column

      PDF Format

      View or Download as a PDF file.

      PDF

      eReader

      View online with eReader.

      eReader