Analysis of Rare Categories

Author: Jingrui He

Publisher: Springer Berlin Heidelberg

Book series: Cognitive Technologies

Included in: Springer Professional "Wirtschaft+Technik", Springer Professional "Technik", Springer Professional "Wirtschaft"

About this book

In many real-world problems, rare categories (minority classes) play essential roles despite their extreme scarcity. Discovering, characterizing, and predicting these rare categories may protect us from fraudulent or malicious behavior, aid scientific discovery, and even save lives.

This book focuses on rare category analysis in the setting where the majority classes have smooth distributions and the minority classes exhibit the compactness property. Furthermore, it focuses on the challenging cases in which the support regions of the majority and minority classes overlap. The author has developed effective algorithms for these tasks, with theoretical guarantees and good empirical results, and explains them in detail. The book is suitable for researchers in artificial intelligence, in particular machine learning and data mining.

Table of contents

Frontmatter

Chapter 1. Introduction

Abstract

Imbalanced data sets are prevalent in real applications: some classes, known as the majority classes, occupy most of the data set, whereas the remaining classes, known as the minority classes or rare categories, have only a few examples each.

Jingrui He

Chapter 2. Survey and Overview

Abstract

Rare category analysis is related to many research areas, including active learning, where the goal is to improve classification performance with the fewest label requests to the labeling oracle; imbalanced classification, where the goal is to construct a classifier for imbalanced data sets that can identify the underrepresented classes; anomaly detection (outlier detection), the problem of finding patterns in the data that do not conform to expected behavior; clustering, the problem of grouping similar data items into clusters; co-clustering, which generally involves grouping the data along several dimensions at once; and unsupervised feature selection, where the goal is to select features that support grouping the data without any supervision.

Jingrui He

Chapter 3. Rare Category Detection

Abstract

In this chapter, we focus on rare category detection, the first task in the supervised setting. In this task, we are given an unlabeled, imbalanced data set, which is often non-separable, and have access to a labeling oracle that can return the class label of any example at a fixed cost.

Jingrui He
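To make the detection setting concrete, here is a minimal Python sketch, assuming Euclidean data small enough for a full distance matrix and a hypothetical `oracle(i)` callable that returns the label of example `i`. It is a simplified density-change heuristic motivated by the smoothness and compactness assumptions, not the chapter's own algorithm: a compact minority class shows up as points that are much denser than their sparsest neighbor, so those points are queried first. The names `detect_rare_category`, `radius`, `budget`, and `majority_label` are illustrative.

```python
import numpy as np


def detect_rare_category(X, oracle, radius, budget, majority_label=0):
    """Hypothetical sketch: query points where local density rises sharply.

    X: (n, d) array; oracle: callable index -> label (assumed interface).
    Returns (queried_indices, labels); stops at the first non-majority label.
    """
    # Full pairwise Euclidean distance matrix (O(n^2) memory).
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    # Local density: number of points within `radius` (including the point itself).
    counts = (dists <= radius).sum(axis=1)
    scores = np.empty(len(X))
    for i in range(len(X)):
        # Wider neighborhood (assumed to be 2 * radius), includes i itself.
        wider = dists[i] <= 2 * radius
        # Density spike: how much denser x_i is than its sparsest wider neighbor.
        scores[i] = counts[i] - counts[wider].min()
    active = np.ones(len(X), dtype=bool)
    queried, labels = [], []
    for _ in range(budget):
        if not active.any():
            break
        # Pick the highest-scoring point that has not been ruled out yet.
        i = int(np.argmax(np.where(active, scores, -np.inf)))
        label = oracle(i)
        queried.append(i)
        labels.append(label)
        if label != majority_label:
            break  # found an example of a rare category
        # Do not query near a confirmed majority example again.
        active &= dists[i] > radius
    return queried, labels
```

The fixed factor of 2 for the wider neighborhood and the choice of `radius` are assumptions; in practice both would have to be tuned to the data, and the full distance matrix would be replaced by a neighbor index for large data sets.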

Chapter 4. Rare Category Characterization

Abstract

In Chapter 3, we introduced various algorithms for rare category detection, which produce a set of labeled examples. Based on this labeled set, a natural follow-up step is rare category characterization, i.e., characterizing the minority classes in order to identify all the rare examples in the data set.

Jingrui He
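As a rough illustration of characterization, the sketch below assumes the compactness property and a few labeled rare examples from the detection step. It encloses those examples in a hyper-ball and flags every point inside it. The ball-fitting rule (mean as center, largest distance times a hypothetical `slack` factor as radius) is a crude stand-in chosen for brevity; the book's own characterization method is not reproduced here, and `characterize_by_ball` is an illustrative name.

```python
import numpy as np


def characterize_by_ball(X, rare_idx, slack=1.0):
    """Hypothetical sketch: enclose labeled rare examples in a hyper-ball and
    return the indices of all points inside it as predicted rare examples."""
    rare = X[rare_idx]
    # Compactness assumption: the rare class clusters around a single center.
    center = rare.mean(axis=0)
    # Radius just large enough to cover the labeled examples, scaled by `slack`.
    radius = slack * np.linalg.norm(rare - center, axis=1).max()
    inside = np.linalg.norm(X - center, axis=1) <= radius
    return np.flatnonzero(inside), center, radius
```

Because the majority and minority supports may overlap, a ball fitted this way will generally also capture some majority points; a tighter boundary would need a fit that uses the majority data as well.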

Chapter 5. Unsupervised Rare Category Analysis

Abstract

In this chapter, we focus on unsupervised rare category analysis, where no label information is available during learning, and address two problems: (1) rare category selection, i.e., selecting a set of examples that are likely to come from the minority class; and (2) feature selection, i.e., selecting the features that are relevant for identifying the minority class.

Jingrui He
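For the unsupervised setting, a minimal sketch of both problems together might look as follows, assuming the minority class size `m` is known in advance and the data is small enough for a full distance matrix. It selects the `m` points forming the tightest group (a proxy for compactness) and then ranks features by how much smaller that group's variance is than the global variance. The function name `select_rare_and_features` and its scoring rule are hypothetical illustrations, not the method developed in the chapter.

```python
import numpy as np


def select_rare_and_features(X, m, n_features):
    """Hypothetical sketch: pick the m points forming the tightest group,
    then rank features by how concentrated that group is along each one."""
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    # Column 0 of each sorted row is the point itself (distance 0), so column
    # m - 1 is the distance to its (m - 1)-th nearest other point.
    knn_radius = np.sort(dists, axis=1)[:, m - 1]
    seed = int(np.argmin(knn_radius))  # center of the most compact m-point group
    selected = np.argsort(dists[seed])[:m]
    # Feature score: within-group variance relative to global variance
    # (lower ratio = feature more relevant for isolating the group).
    ratio = X[selected].var(axis=0) / (X.var(axis=0) + 1e-12)
    features = np.argsort(ratio)[:n_features]
    return np.sort(selected), features
```

Requiring `m` up front is the strongest assumption here; without it, one would have to scan several candidate sizes and compare the resulting groups.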

Chapter 6. Conclusion and Future Directions

Abstract

Rare categories are of key importance in many real applications: although such examples occur rarely, their impact is significant. Applications of rare category analysis include financial fraud detection, Medicare fraud detection, network intrusion detection, astronomy, spam image detection, and health care.

Jingrui He

Backmatter

Electronic ISBN: 978-3-642-22813-1
Print ISBN: 978-3-642-22812-4
DOI: https://doi.org/10.1007/978-3-642-22813-1