2012 | Book

Analysis of Rare Categories

About this book

In many real-world problems, rare categories (minority classes) play essential roles despite their extreme scarcity. The discovery, characterization and prediction of rare categories or rare examples may protect us from fraudulent or malicious behavior, aid scientific discovery, and even save lives.

This book focuses on rare category analysis, where the majority classes have smooth distributions, and the minority classes exhibit the compactness property. Furthermore, it focuses on the challenging cases where the support regions of the majority and minority classes overlap. The author has developed effective algorithms with theoretical guarantees and good empirical results for the related techniques, and these are explained in detail. The book is suitable for researchers in the area of artificial intelligence, in particular machine learning and data mining.

Table of Contents

Frontmatter
Chapter 1. Introduction
Abstract
Imbalanced data sets are prevalent in real applications: some classes, known as the majority classes, occupy most of the data set, whereas the remaining classes, known as the minority classes or rare categories, have only a few examples.
Jingrui He
Chapter 2. Survey and Overview
Abstract
Rare category analysis is related to many research areas, including active learning, where the goal is to improve the classification performance with the fewest label requests to the labeling oracle; imbalanced classification, where the goal is to construct a classifier for imbalanced data sets that is able to identify the underrepresented classes; anomaly detection (outlier detection), which refers to the problem of finding patterns in the data that do not conform to expected behavior; clustering, which refers to the problem of grouping similar data items into clusters; co-clustering, which generally involves grouping the data from various dimensions; and unsupervised feature selection, where the goal is to select features for the sake of grouping the data without any supervision.
Jingrui He
Chapter 3. Rare Category Detection
Abstract
In this chapter, we focus on rare category detection, the first task in the supervised setting. In this task, we are given an unlabeled, imbalanced data set, which is often non-separable, and have access to a labeling oracle, which is able to give us the class label of any example with a fixed cost.
Jingrui He
Chapter 4. Rare Category Characterization
Abstract
In Chapter 3, we introduced various algorithms for rare category detection, which result in a set of labeled examples. Based on this labeled set, a natural follow-up step is rare category characterization, i.e., to characterize the minority classes in order to identify all the rare examples in the data set.
Jingrui He
Chapter 5. Unsupervised Rare Category Analysis
Abstract
In this chapter, we focus on unsupervised rare category analysis, where no label information is available in the learning process, and address the following two problems: (1) rare category selection, i.e., selecting a set of examples which are likely to come from the minority class; (2) feature selection, i.e., selecting the features that are relevant to identifying the minority class.
Jingrui He
Chapter 6. Conclusion and Future Directions
Abstract
Rare categories are of key importance in many real applications: although the occurrence of such examples is rare, their impact is significant. Applications of rare category analysis include: financial fraud detection, Medicare fraud detection, network intrusion detection, astronomy, spam image detection and health care.
Jingrui He
Backmatter
Metadata
Title
Analysis of Rare Categories
Written by
Jingrui He
Copyright Year
2012
Publisher
Springer Berlin Heidelberg
Electronic ISBN
978-3-642-22813-1
Print ISBN
978-3-642-22812-4
DOI
https://doi.org/10.1007/978-3-642-22813-1