
2017 | Book

Outlier Ensembles

An Introduction


About this book

This book discusses a variety of methods for outlier ensembles and organizes them by the specific principles with which accuracy improvements are achieved. In addition, it covers the techniques with which such methods can be made more effective. A formal classification of these methods is provided, and the circumstances in which they work well are examined. The authors cover how outlier ensembles relate (both theoretically and practically) to the ensemble techniques used commonly for other data mining problems like classification. The similarities and (subtle) differences in the ensemble techniques for the classification and outlier detection problems are explored, and these subtle differences do impact the design of ensemble algorithms for the latter problem. This book can be used for courses in data mining and related curricula. Many illustrative examples and exercises are provided in order to facilitate classroom teaching. Familiarity with the outlier detection problem and with the generic problem of ensemble analysis in classification is assumed, because many of the ensemble methods discussed in this book are adaptations from their counterparts in the classification domain. Some techniques explained in this book, such as wagging, randomized feature weighting, and geometric subsampling, provide new insights that are not available elsewhere. Also included is an analysis of the performance of various types of base detectors and their relative effectiveness. The book is valuable for researchers and practitioners who wish to leverage ensemble methods for optimal algorithmic design.

Table of Contents

Frontmatter
Chapter 1. An Introduction to Outlier Ensembles
Abstract
The outlier analysis problem has been widely studied by the database, data mining, machine learning, and statistics communities. Numerous algorithms have been proposed for this problem in recent years (Aggarwal, Outlier Detection in High Dimensional Data, [6]; Angiulli, Fast Outlier Detection in High Dimensional Spaces, [9]; Bay, Mining distance-based outliers in near linear time with randomization and a simple pruning rule, [11]; Breunig, LOF: Identifying Density-based Local Outliers, [14]; Knorr, Algorithms for Mining Distance-based Outliers in Large Datasets, [35]; Knorr, Finding Intensional Knowledge of Distance-Based Outliers, [36]; Jin, Mining top-n local outliers in large databases, [39]; Johnson, Fast computation of 2-dimensional depth contours, [40]; Papadimitriou, LOCI: Fast outlier detection using the local correlation integral, [53]; Ramaswamy, Efficient Algorithms for Mining Outliers from Large Data Sets, [55]).
Charu C. Aggarwal, Saket Sathe
Chapter 2. Theory of Outlier Ensembles
Abstract
Outlier detection is an unsupervised problem, in which labels are not available with data records (Aggarwal, Outlier analysis, 2017, [2]). As a result, it is generally more challenging to design ensemble analysis algorithms for outlier detection. In particular, methods that require the use of labels in intermediate steps of the algorithm cannot be generalized to outlier detection.
Charu C. Aggarwal, Saket Sathe
Chapter 3. Variance Reduction in Outlier Ensembles
Abstract
The theoretical discussion in the previous chapter establishes that the error of an outlier detector can be decomposed into the squared bias and the variance. Ensemble methods attempt to reduce the overall error by reducing either the squared bias or the variance. A worked form of this decomposition is shown after this chapter entry.
Charu C. Aggarwal, Saket Sathe
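
For reference, a standard form of the squared-error decomposition mentioned in this abstract is sketched below. The symbols are generic placeholders rather than the book's exact notation: f(x) denotes the unknown ideal outlier score of a point x, g(x, D) denotes the score assigned by a detector built on a dataset D, and the expectation is taken over the randomness in D (and in the detector itself).

% A minimal sketch of the bias-variance decomposition of squared error;
% f and g are assumed placeholder symbols, not necessarily the book's notation.
\[
  \mathbb{E}_{D}\!\left[\bigl(f(x) - g(x,D)\bigr)^{2}\right]
  \;=\;
  \underbrace{\bigl(f(x) - \mathbb{E}_{D}[g(x,D)]\bigr)^{2}}_{\text{squared bias}}
  \;+\;
  \underbrace{\mathbb{E}_{D}\!\left[\bigl(g(x,D) - \mathbb{E}_{D}[g(x,D)]\bigr)^{2}\right]}_{\text{variance}}
\]

Averaging the scores of many independently constructed detectors leaves the bias term essentially unchanged but shrinks the variance term, which is why variance-reduction schemes such as subsampling and feature bagging tend to improve accuracy.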
Chapter 4. Bias Reduction in Outlier Ensembles: The Guessing Game
Abstract
Bias reduction is a difficult problem in unsupervised settings like outlier detection. The main reason is that bias-reduction algorithms often require a quantification of error in intermediate steps of the algorithm. A well-known example of such a bias-reduction algorithm from classification is boosting. In boosting, the outputs of highly biased detectors are used to identify portions of the decision space in which the bias negatively affects performance. A minimal code sketch of this idea, adapted to the unsupervised setting, follows this chapter entry.
Charu C. Aggarwal, Saket Sathe
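
As a rough illustration of the idea above, the following sketch refits a detector after removing the points it currently scores as most suspicious, so that the model of "normal" data becomes less contaminated by outliers. This is a minimal sketch under stated assumptions, not the book's exact algorithm: scikit-learn's IsolationForest is used as a stand-in base detector, and the function name iterative_bias_reduction and its parameters are hypothetical.

import numpy as np
from sklearn.ensemble import IsolationForest

def iterative_bias_reduction(X, n_iter=3, remove_frac=0.05, seed=0):
    # Hypothetical helper: refit on points currently believed to be normal,
    # then re-score all points; repeat for a few rounds.
    keep = np.ones(len(X), dtype=bool)
    scores = None
    for i in range(n_iter):
        det = IsolationForest(random_state=seed + i)
        det.fit(X[keep])                  # fit only on the retained (presumed normal) points
        scores = -det.score_samples(X)    # score every point; higher = more outlier-like
        cutoff = np.quantile(scores, 1.0 - remove_frac)
        keep = scores < cutoff            # drop the most suspicious points before the next refit
    return scores

The removal step plays a role loosely analogous to the label-driven reweighting in boosting: later models are steered away from regions where the current model is most likely to be misled.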
Chapter 5. Model Combination Methods for Outlier Ensembles
Abstract
An important part of the process of creating outlier ensembles is to combine the outputs of different detectors. The precise method for model combination has a significant impact on the effectiveness of a particular outlier detection method because of the varying theoretical effects of different combination methods. For example, the impact of the scheme of averaging is quite different from that of maximization in terms of the bias and variance of the result. Therefore, the choice of model combination has a crucial effect on the results of the ensemble. A brief illustration of averaging and maximization follows this chapter entry.
Charu C. Aggarwal, Saket Sathe
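
To make the contrast between averaging and maximization concrete, here is a minimal sketch of score combination with NumPy. The function names are illustrative, and z-score normalization is just one common way of making scores from different detectors comparable before combining them.

import numpy as np

def standardize(scores):
    # Convert raw detector scores to z-scores so that different detectors are comparable.
    s = np.asarray(scores, dtype=float)
    std = s.std()
    return (s - s.mean()) / std if std > 0 else s - s.mean()

def combine(score_lists, method="avg"):
    # score_lists: one array of outlier scores per detector, all over the same data points.
    z = np.vstack([standardize(s) for s in score_lists])
    if method == "avg":    # averaging tends to reduce variance
        return z.mean(axis=0)
    if method == "max":    # maximization emphasizes the strongest alarm for each point
        return z.max(axis=0)
    raise ValueError("unknown combination method")

For example, combine([scores_lof, scores_iforest], method="avg") would produce averaged scores for two hypothetical detectors, while method="max" would report each point's strongest alarm.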
Chapter 6. Which Outlier Detection Algorithm Should I Use?
Abstract
Ensembles can be used to improve the performance of base detectors in several different ways. The first method is to use a single base detector in conjunction with a method like feature bagging or subsampling. The second method is to combine multiple base detectors in order to induce greater diversity. What is the impact of using generic ensemble methods on various base detectors? What is the impact of combining these ensemble methods into a higher-level combination? This chapter discusses both of these ways of combining base detectors, as well as various ways to squeeze the most out of ensemble methods. A sketch of the feature-bagging idea follows this chapter entry.
Charu C. Aggarwal, Saket Sathe
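
As a concrete example of the first approach, the following sketch applies feature bagging: each round scores the data on a random feature subset, and the standardized scores are averaged across rounds. This is a hedged sketch under assumptions; scikit-learn's IsolationForest serves as a stand-in base detector, and the function name and parameter choices are illustrative.

import numpy as np
from sklearn.ensemble import IsolationForest

def feature_bagging_scores(X, n_rounds=10, seed=0):
    # Hypothetical helper: run the base detector on random feature subsets
    # and average the standardized outlier scores across rounds.
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    all_scores = []
    for _ in range(n_rounds):
        k = int(rng.integers(d // 2, d)) if d > 1 else 1   # subset size in [d/2, d-1], a common choice
        cols = rng.choice(d, size=k, replace=False)
        det = IsolationForest(random_state=int(rng.integers(1_000_000)))
        det.fit(X[:, cols])
        s = -det.score_samples(X[:, cols])                 # higher = more outlier-like
        all_scores.append((s - s.mean()) / (s.std() + 1e-12))
    return np.mean(all_scores, axis=0)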
Backmatter
Metadata
Title
Outlier Ensembles
Authors
Charu C. Aggarwal
Saket Sathe
Copyright Year
2017
Electronic ISBN
978-3-319-54765-7
Print ISBN
978-3-319-54764-0
DOI
https://doi.org/10.1007/978-3-319-54765-7