
2017 | Book

Outlier Ensembles

An Introduction


About this book

This book discusses a variety of methods for outlier ensembles and organizes them by the specific principles with which accuracy improvements are achieved. In addition, it covers the techniques with which such methods can be made more effective. A formal classification of these methods is provided, and the circumstances in which they work well are examined. The authors cover how outlier ensembles relate (both theoretically and practically) to the ensemble techniques used commonly for other data mining problems like classification. The similarities and (subtle) differences in the ensemble techniques for the classification and outlier detection problems are explored, and these subtle differences do impact the design of ensemble algorithms for the latter problem. This book can be used for courses in data mining and related curricula. Many illustrative examples and exercises are provided in order to facilitate classroom teaching. Familiarity with the outlier detection problem and with the generic problem of ensemble analysis in classification is assumed, because many of the ensemble methods discussed in this book are adaptations from their counterparts in the classification domain. Some techniques explained in this book, such as wagging, randomized feature weighting, and geometric subsampling, provide new insights that are not available elsewhere. Also included is an analysis of the performance of various types of base detectors and their relative effectiveness. The book is valuable for researchers and practitioners who wish to leverage ensemble methods for optimal algorithmic design.

Table of Contents

Frontmatter
Chapter 1. An Introduction to Outlier Ensembles
Abstract
The outlier analysis problem has been widely studied by the database, data mining, machine learning, and statistics communities. Numerous algorithms have been proposed for this problem in recent years (Aggarwal, Outlier Detection in High Dimensional Data, [6]; Angiulli, Fast Outlier Detection in High Dimensional Spaces, [9]; Bay, Mining distance-based outliers in near linear time with randomization and a simple pruning rule, [11]; Breunig, LOF: Identifying Density-based Local Outliers, [14]; Knorr, Algorithms for Mining Distance-based Outliers in Large Datasets, [35]; Knorr, Finding Intensional Knowledge of Distance-Based Outliers, [36]; Jin, Mining top-n local outliers in large databases, [39]; Johnson, Fast computation of 2-dimensional depth contours, [40]; Papadimitriou, LOCI: Fast outlier detection using the local correlation integral, [53]; Ramaswamy, Efficient Algorithms for Mining Outliers from Large Data Sets, [55]).
Charu C. Aggarwal, Saket Sathe
Chapter 2. Theory of Outlier Ensembles
Abstract
Outlier detection is an unsupervised problem, in which labels are not available with data records (Aggarwal, Outlier analysis, 2017, [2]). As a result, it is generally more challenging to design ensemble analysis algorithms for outlier detection. In particular, methods that require the use of labels in intermediate steps of the algorithm cannot be generalized to outlier detection.
Charu C. Aggarwal, Saket Sathe
Chapter 3. Variance Reduction in Outlier Ensembles
Abstract
The theoretical discussion in the previous chapter establishes that the error of an outlier detector can be decomposed into the squared bias and the variance. Ensemble methods attempt to reduce the overall error by reducing either the squared bias or the variance. A worked form of this decomposition is shown after this chapter entry.
Charu C. Aggarwal, Saket Sathe
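
For reference, a standard form of the squared-error decomposition mentioned in this abstract is sketched below. The symbols are generic placeholders rather than the book's exact notation: f(x) denotes the unknown ideal outlier score of a point x, g(x, D) denotes the score assigned by a detector built on a dataset D, and the expectation is taken over the randomness in D (and in the detector itself).

% A minimal sketch of the bias-variance decomposition of squared error;
% f and g are assumed placeholder symbols, not necessarily the book's notation.
\[
  \mathbb{E}_{D}\!\left[\bigl(f(x) - g(x,D)\bigr)^{2}\right]
  \;=\;
  \underbrace{\bigl(f(x) - \mathbb{E}_{D}[g(x,D)]\bigr)^{2}}_{\text{squared bias}}
  \;+\;
  \underbrace{\mathbb{E}_{D}\!\left[\bigl(g(x,D) - \mathbb{E}_{D}[g(x,D)]\bigr)^{2}\right]}_{\text{variance}}
\]

Averaging the scores of many independently constructed detectors leaves the bias term essentially unchanged but shrinks the variance term, which is why variance-reduction schemes such as subsampling and feature bagging tend to improve accuracy.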
Chapter 4. Bias Reduction in Outlier Ensembles: The Guessing Game
Abstract
Bias reduction is a difficult problem in unsupervised settings like outlier detection. The main reason is that bias-reduction algorithms often require a quantification of error in intermediate steps of the algorithm. A well-known example of such a bias-reduction algorithm from classification is boosting. In boosting, the outputs of highly biased detectors are used to identify portions of the decision space in which the bias negatively affects performance. A minimal code sketch of this idea, adapted to the unsupervised setting, follows this chapter entry.
Charu C. Aggarwal, Saket Sathe
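
As a rough illustration of the idea above, the following sketch refits a detector after removing the points it currently scores as most suspicious, so that the model of "normal" data becomes less contaminated by outliers. This is a minimal sketch under stated assumptions, not the book's exact algorithm: scikit-learn's IsolationForest is used as a stand-in base detector, and the function name iterative_bias_reduction and its parameters are hypothetical.

import numpy as np
from sklearn.ensemble import IsolationForest

def iterative_bias_reduction(X, n_iter=3, remove_frac=0.05, seed=0):
    # Hypothetical helper: refit on points currently believed to be normal,
    # then re-score all points; repeat for a few rounds.
    keep = np.ones(len(X), dtype=bool)
    scores = None
    for i in range(n_iter):
        det = IsolationForest(random_state=seed + i)
        det.fit(X[keep])                  # fit only on the retained (presumed normal) points
        scores = -det.score_samples(X)    # score every point; higher = more outlier-like
        cutoff = np.quantile(scores, 1.0 - remove_frac)
        keep = scores < cutoff            # drop the most suspicious points before the next refit
    return scores

The removal step plays a role loosely analogous to the label-driven reweighting in boosting: later models are steered away from regions where the current model is most likely to be misled.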
Chapter 5. Model Combination Methods for Outlier Ensembles
Abstract
An important part of the process of creating outlier ensembles is to combine the outputs of different detectors. The precise method for model combination has a significant impact on the effectiveness of a particular outlier detection method because of the varying theoretical effects of different combination methods. For example, the impact of the scheme of averaging is quite different from that of maximization in terms of the bias and variance of the result. Therefore, the choice of model combination has a crucial effect on the results of the ensemble. A brief illustration of averaging and maximization follows this chapter entry.
Charu C. Aggarwal, Saket Sathe
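
To make the contrast between averaging and maximization concrete, here is a minimal sketch of score combination with NumPy. The function names are illustrative, and z-score normalization is just one common way of making scores from different detectors comparable before combining them.

import numpy as np

def standardize(scores):
    # Convert raw detector scores to z-scores so that different detectors are comparable.
    s = np.asarray(scores, dtype=float)
    std = s.std()
    return (s - s.mean()) / std if std > 0 else s - s.mean()

def combine(score_lists, method="avg"):
    # score_lists: one array of outlier scores per detector, all over the same data points.
    z = np.vstack([standardize(s) for s in score_lists])
    if method == "avg":    # averaging tends to reduce variance
        return z.mean(axis=0)
    if method == "max":    # maximization emphasizes the strongest alarm for each point
        return z.max(axis=0)
    raise ValueError("unknown combination method")

For example, combine([scores_lof, scores_iforest], method="avg") would produce averaged scores for two hypothetical detectors, while method="max" would report each point's strongest alarm.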
Chapter 6. Which Outlier Detection Algorithm Should I Use?
Abstract
Ensembles can be used to improve the performance of base detectors in several different ways. The first method is to use a single base detector in conjunction with a method like feature bagging or subsampling. The second method is to combine multiple base detectors in order to induce greater diversity. What is the impact of using generic ensemble methods on various base detectors? What is the impact of combining these ensemble methods into a higher-level combination? This chapter discusses both of these ways of combining base detectors, as well as various ways to squeeze the most out of ensemble methods. A sketch of the feature-bagging idea follows this chapter entry.
Charu C. Aggarwal, Saket Sathe
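
As a concrete example of the first approach, the following sketch applies feature bagging: each round scores the data on a random feature subset, and the standardized scores are averaged across rounds. This is a hedged sketch under assumptions; scikit-learn's IsolationForest serves as a stand-in base detector, and the function name and parameter choices are illustrative.

import numpy as np
from sklearn.ensemble import IsolationForest

def feature_bagging_scores(X, n_rounds=10, seed=0):
    # Hypothetical helper: run the base detector on random feature subsets
    # and average the standardized outlier scores across rounds.
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    all_scores = []
    for _ in range(n_rounds):
        k = int(rng.integers(d // 2, d)) if d > 1 else 1   # subset size in [d/2, d-1], a common choice
        cols = rng.choice(d, size=k, replace=False)
        det = IsolationForest(random_state=int(rng.integers(1_000_000)))
        det.fit(X[:, cols])
        s = -det.score_samples(X[:, cols])                 # higher = more outlier-like
        all_scores.append((s - s.mean()) / (s.std() + 1e-12))
    return np.mean(all_scores, axis=0)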
Backmatter
Metadata
Title
Outlier Ensembles
Authors
Charu C. Aggarwal
Saket Sathe
Copyright Year
2017
Electronic ISBN
978-3-319-54765-7
Print ISBN
978-3-319-54764-0
DOI
https://doi.org/10.1007/978-3-319-54765-7