
About this book

Based on ideas from Support Vector Machines (SVMs), Learning To Classify Text Using Support Vector Machines presents a new approach to generating text classifiers from examples. The approach combines high performance and efficiency with theoretical understanding and improved robustness. In particular, it is highly effective without greedy heuristic components. The SVM approach is computationally efficient in training and classification, and it comes with a learning theory that can guide real-world applications.

Learning To Classify Text Using Support Vector Machines gives a complete and detailed description of the SVM approach to learning text classifiers, including training algorithms, transductive text classification, efficient performance estimation, and a statistical learning model of text classification. In addition, it includes an overview of the field of text classification, making it self-contained even for newcomers to the field. This book gives a concise introduction to SVMs for pattern recognition, and it includes a detailed description of how to formulate text-classification tasks for machine learning.

Table of contents

Frontmatter

Introduction

Chapter 1. Introduction

Abstract
With the rapid growth of the World Wide Web, the task of classifying natural language documents into a predefined set of semantic categories has become one of the key methods for organizing online information. This task is commonly referred to as text classification. It is a basic building block in a wide range of applications. For example, directories like Yahoo! categorize Web pages by topic, online newspapers customize themselves to a particular user’s reading preferences, and routing agents at service hotlines forward incoming email to the appropriate expert by content. While it was possible in the past to have human indexers do the category assignments manually, the exponential growth of the number of online documents and the increased pace with which information needs to be distributed has created the need for automatic document classification.
Thorsten Joachims

Text Classification

Chapter 2. Text Classification

Abstract
This chapter reviews the state of the art in learning text classifiers from examples. First, it gives formal definitions of the most common scenarios in text classification: binary, multi-class, and multi-label classification. Furthermore, it gives an overview of different representations of text, feature selection methods, and criteria for evaluating predictive performance. The chapter ends with a description of the experimental setup used throughout this book.
Thorsten Joachims

Support Vector Machines

Chapter 3. Support Vector Machines

Abstract
This chapter gives a short introduction to support vector machines, the basic learning method used, extended, and analyzed for text classification throughout this work. Support vector machines [Cortes and Vapnik, 1995][Vapnik, 1998] were developed by Vapnik et al. based on the Structural Risk Minimization principle [Vapnik, 1982] from statistical learning theory. The idea of structural risk minimization is to find a hypothesis h from a hypothesis space H for which one can guarantee the lowest probability of error Err(h) for a given training sample S
$$(\vec x_1, y_1), \ldots, (\vec x_n, y_n), \qquad \vec x_i \in \Re^N, \quad y_i \in \{-1, +1\} \tag{3.1}$$
of n examples. The following upper bound connects the true error of a hypothesis h with the error Err_train(h) of h on the training set and the complexity of h [Vapnik, 1998] (see also Section 1.1).
Thorsten Joachims
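
To make the notation of the Chapter 3 abstract concrete, the following minimal sketch computes the training error Err_train(h) of a linear hypothesis on a sample of the form (3.1). It is an illustration only, not code from the book: the names `predict` and `training_error`, the weight vector `w`, the offset `b`, and the four-example toy sample are all assumptions made for this example.

```python
from typing import List, Sequence, Tuple

# One training example: (feature vector x_i, label y_i in {-1, +1}).
Example = Tuple[Sequence[float], int]


def predict(w: Sequence[float], b: float, x: Sequence[float]) -> int:
    """Linear hypothesis h(x) = sign(w . x + b), with ties mapped to +1."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score >= 0 else -1


def training_error(w: Sequence[float], b: float, sample: List[Example]) -> float:
    """Fraction of training examples that h misclassifies."""
    mistakes = sum(1 for x, y in sample if predict(w, b, x) != y)
    return mistakes / len(sample)


# Hypothetical toy sample with N = 2 features and n = 4 examples.
sample: List[Example] = [
    ([1.0, 0.0], +1),
    ([0.9, 0.2], +1),
    ([0.0, 1.0], -1),
    ([0.7, 0.6], -1),  # lies on the wrong side of the hyperplane below
]

# Hypothetical hypothesis: the hyperplane x_1 - x_2 = 0.
w, b = [1.0, -1.0], 0.0
err = training_error(w, b, sample)
print(err)
```

Here one of the four examples is misclassified, so the training error is 0.25. The bound mentioned in the abstract relates this empirical quantity to the true error Err(h), which cannot be computed from the sample alone.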

Part Theory

Chapter 4. A Statistical Learning Model of Text Classification for SVMs

Abstract
There are at least two ways to motivate why a particular learning method is suitable for a particular learning task. Since ultimately one is interested in the performance of the method, one way is through comparative studies. Chapters 6 and 7 present such studies and show that SVMs deliver state-of-the-art classification performance. However, success on benchmarks is a brittle justification for a learning algorithm and gives only limited insight. Therefore, this dissertation takes a different approach. It introduces support vector machines for learning text classifiers from a theoretical perspective.
Thorsten Joachims

Chapter 5. Efficient Performance Estimators for SVMs

Abstract
Predicting the generalization performance of a learner is one of the central goals of learning theory. The previous chapter approached this question based on an intensional description of the learning task. However, such a model is necessarily coarse, since it operates on a high level of abstraction. Training data can give more details about a learning task than an intensional model with only a few parameters. This chapter explores the problem of predicting the generalization performance of an SVM after training data becomes available.
Thorsten Joachims

Part Methods

Chapter 6. Inductive Text Classification

Abstract
After giving the theoretical motivation and justification for the maximum-margin approach to text classification in the previous two chapters, this chapter evaluates its empirical performance. It also addresses practical issues related to selecting a good representation and an appropriate parameter setting.
Thorsten Joachims

Chapter 7. Transductive Text Classification

Abstract
For many practical uses of text classification, it is crucial that the learner be able to generalize well using little training data. A news-filtering service, for example, requiring a hundred days’ worth of training data is unlikely to please even the most patient users. The work presented in the following tackles the problem of learning from small training samples by taking a transductive [Vapnik, 1998] instead of an inductive approach. In the inductive setting, the learner tries to induce a decision function that has a low error rate on the whole distribution of examples for the particular learning task. Often, this setting is unnecessarily complex. In many situations we do not care about the particular decision function, but rather that we classify a given set of examples (i.e., a test set) with as few errors as possible. This is the goal of transductive inference.
Thorsten Joachims

Part Algorithms

Chapter 8. Training Inductive Support Vector Machines

Abstract
Training a support vector machine (SVM) leads to a quadratic optimization problem with bound constraints and one linear equality constraint. Although this type of problem is well understood in principle, there are many issues to be considered in designing an SVM learner. In particular, for large learning tasks with many training examples, off-the-shelf optimization techniques for general quadratic programs, such as Newton and quasi-Newton methods, quickly become intractable in their memory and time requirements.
Thorsten Joachims
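
For orientation, the quadratic program referred to in the Chapter 8 abstract is usually written as the standard soft-margin SVM dual shown below. The abstract does not fix a notation, so the symbols are assumptions taken from the common textbook formulation: the dual variables \alpha_i, the regularization constant C, and the kernel K.

$$\min_{\vec \alpha} \; W(\vec \alpha) = -\sum_{i=1}^{n} \alpha_i + \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} y_i y_j \alpha_i \alpha_j K(\vec x_i, \vec x_j)$$
$$\text{subject to} \quad \sum_{i=1}^{n} y_i \alpha_i = 0, \qquad 0 \le \alpha_i \le C \quad (i = 1, \ldots, n)$$

The box constraints 0 \le \alpha_i \le C are the bound constraints, and \sum_i y_i \alpha_i = 0 is the single linear equality constraint. The quadratic term involves an n-by-n matrix with entries y_i y_j K(\vec x_i, \vec x_j), which is why memory and time requirements grow quickly with the number of training examples n.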

Chapter 9. Training Transductive Support Vector Machines

Abstract
Chapter 7 shows that a transductive approach to text classification can lead to improved predictive performance. Especially when the number of labeled training examples is small and the test set is large, a transductive SVM (TSVM) can offer a substantial benefit over an inductive SVM. However, the problem of computational efficiency in training transductive SVMs has not been considered yet.
Thorsten Joachims

Chapter 10. Conclusions

Abstract
From the viewpoint of the application domain, this dissertation presents a new machine-learning approach to the problem of learning text classifiers from examples. It is not primarily about methods, nor primarily about theory, nor primarily about algorithms. Rather, it addresses all relevant aspects of this particular class of learning problems. It is the first approach to learning text classifiers from examples
  • that is computationally efficient,
  • for which there is a justified learning theory that describes its mechanics with respect to text classification, and
  • that performs well and robustly in practice.
Thorsten Joachims

Backmatter
