main-content

## Über dieses Buch

This book is the outgrowth of both a research program and a graduate course at the University of California, Irvine (UCI) since 1966, as well as a graduate course at the California State Polytechnic University, Pomona (Cal Poly Pomona). The research program, part of the UCI Pattern Recogni­ tion Project, was concerned with the design of trainable classifiers; the graduate courses were broader in scope, including subjects such as feature selection, cluster analysis, choice of data set, and estimates of probability densities. In the interest of minimizing overlap with other books on pattern recogni­ tion or classifier theory, we have selected a few topics of special interest for this book, and treated them in some depth. Some of this material has not been previously published. The book is intended for use as a guide to the designer of pattern classifiers, or as a text in a graduate course in an engi­ neering or computer science curriculum. Although this book is directed primarily to engineers and computer scientists, it may also be of interest to psychologists, biologists, medical scientists, and social scientists.

## Inhaltsverzeichnis

### Chapter 1. Introduction and Overview

Abstract
Just as the successful invention of the airplane stimulated the study of aerodynamics, the modern digital computer has stimulated the study of intelligence and learning.
Jack Sklansky, Gustav N. Wassel

### Chapter 2. Linearly Separable Classes

Abstract
Much of the analysis of trainable classifiers is simplified if it is known that every pair of class regions encountered by the classifier can be separated by a linear hypersurface. In augmented feature space such a surface is determined by an equation of the form
$$\text{v}^T \text{y}\;\text{=}\;\text{0}\text{.}$$
(2.1)
Jack Sklansky, Gustav N. Wassel

### Chapter 3. Nonlinear Classifiers

Abstract
The classifier design techniques we have discussed so far have been, for the most part, restricted to linear decision surfaces (hyperplanes). These techniques are useful only in cases where the error probability obtained by the best linear decision surface—a hyperplane—is acceptably small. In many practical situations, no such hyperplane exists, so that a nonlinear decision surface is necessary. An example of such a situation is illustrated in Figure 3.1 for a two-dimensional feature space. Here the smallest possible error probability obtainable by any hyperplane (straight line in this illustration) is 0.25. On the other hand, the pair of hyperplanes shown in the figure achieves zero error.
Jack Sklansky, Gustav N. Wassel

### Chapter 4. Loss Functions and Stochastic Approximation

Abstract
Gradient descent as a technique for finding the minimum of a loss function J(v) was introduced in Section 2.10. Recall that the technique consists of finding the gradient ∇ J(v) and then adjusting the parameter vector v so that the change in v is in the direction of the negative of the gradient.
Jack Sklansky, Gustav N. Wassel

### Chapter 5. Linear Classifiers for Nonseparable Classes

Abstract
In this chapter we describe several training procedures based on the gradient and stochastic approximation techniques of the preceding chapter. Although our discussion in this chapter is restricted to two-class cases, the techniques may be extended to multiple-class cases by using the concepts of Section 2.6. The training procedures of this chapter apply to pairs of classes that are linearly nonseparable, i.e., that are not linearly separable, as well as those that are linearly separable. Linearly nonseparable classes include pairs of classes that are separable but not by a hyperplane, as well as pairs of classes that overlap.
Jack Sklansky, Gustav N. Wassel

### Chapter 6. Markov Chain Training Models for Nonseparable Classes

Abstract
In Chapter 2, we derived the convergence under separability property of the proportional increment training procedure. As interesting as this property is, convergence is usually of less practical importance than a good estimate of the performance of the classifier as a function of training length. Indeed, it is sometimes possible to obtain a noncovergent training procedure that performs better over a finite training period than a convergent training procedure on the same input data. This situation has motivated much of the research on the learning dynamics of trainable classifiers.
Jack Sklansky, Gustav N. Wassel

### Chapter 7. Continuous-State Models

Abstract
In Chapter 6 we showed how the transient and steady-state behavior of single-feature training procedures can be analyzed by Markov chains. However in most training procedures the state space is multidimensional and the density of states is indefinitely large, thereby making finite Markov chain models inappropriate. Consider, for example, the proportional increment training procedure described by Equations (2.13) and (2.14) in Chapter 2. (In this chapter, we replace r(n) by the random variable R (n).) The augmented weight vector V(n) takes the role of a state, in the sense that the future statistical behavior of V(n + M), m ≥ 0, is completely determined from a knowledge of the present weight vector (i.e., V(n)), the constituent densities, and the training procedure. The motion of V(n) is similar to Brownian motion, as illustrated in Figure 7.1, so that the set of possible values of V(n) is infinitely dense. Thus V(n) cannot be a Markov chain in this case. Yet V(n) is a Markov process, i.e.,
$$P[\text{V}(n + 1)\left| {\text{V}(0), \ldots ,\text{V}(n)] = P[\text{V}(n + 1)} \right|\text{V}(n)],$$
since (a) the proportional increment training procedure generates V(n + 1) from ω(n), R(n), and V(n), (b) the statistics of R(n) depend on V(n) and ω(n), and (c) the random variables {ω(n)} are statistically independent.
Jack Sklansky, Gustav N. Wassel

### Backmatter

Weitere Informationen