
2005 | Book

Support Vector Machines: Theory and Applications


About this Book

The support vector machine (SVM) has become one of the standard tools for machine learning and data mining. This carefully edited volume presents the state of the art of the mathematical foundation of SVM in statistical learning theory, as well as novel algorithms and applications. Support Vector Machines provides a selection of real-world applications, such as bioinformatics, text categorization, pattern recognition, and object detection, written by leading experts in their respective fields.

Table of Contents

Frontmatter
Support Vector Machines – An Introduction
Abstract
This is a book about learning from empirical data (i.e., examples, samples, measurements, records, patterns or observations) by applying support vector machines (SVMs), a.k.a. kernel machines. The basic aim of this introduction is to give, as far as possible, a condensed (but systematic) presentation of the novel learning paradigm embodied in SVMs. Our focus will be on the constructive learning algorithms for both classification (pattern recognition) and regression (function approximation) problems. Consequently, we will not go into all the subtleties and details of statistical learning theory (SLT) and structural risk minimization (SRM), which are the theoretical foundations of the learning algorithms presented below. Instead, quadratic-programming-based learning leading to parsimonious SVMs will be presented in a gentle way: starting with linearly separable problems, moving through classification tasks with overlapping classes but still a linear separation boundary, going beyond the linearity assumptions to nonlinear separation boundaries, and finally reaching linear and nonlinear regression problems. The adjective “parsimonious” denotes an SVM with a small number of support vectors. The sparsity of the model results from a sophisticated learning procedure that matches model capacity to data complexity, ensuring good performance on future, previously unseen, data.
V. Kecman
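The margin idea behind the constructive algorithms described above can be illustrated in a few lines. The following is a minimal numpy sketch, not the chapter's QP formulation: it trains a soft-margin linear SVM by subgradient descent on the primal hinge-loss objective 0.5*||w||^2 + C*sum_i max(0, 1 - y_i(w.x_i + b)); all names, data, and parameter values are illustrative assumptions.

```python
import numpy as np

def train_linear_svm(X, y, C=1.0, lr=0.01, epochs=200):
    """Soft-margin linear SVM via per-sample subgradient descent on the
    hinge loss (a simple stand-in for the QP-based training in the text)."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        for i in np.random.permutation(n):
            margin = y[i] * (X[i] @ w + b)
            if margin < 1:                       # point violates the margin
                w -= lr * (w - C * y[i] * X[i])  # regularizer + hinge subgradient
                b += lr * C * y[i]
            else:
                w -= lr * w                      # only the regularizer acts
    return w, b

# Two well-separated Gaussian clusters as a toy linearly separable problem.
np.random.seed(0)
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 0.5, (20, 2)), rng.normal(2, 0.5, (20, 2))])
y = np.array([-1] * 20 + [1] * 20)
w, b = train_linear_svm(X, y)
pred = np.sign(X @ w + b)
print((pred == y).mean())
```

On separable data like this, the learned hyperplane classifies the training set correctly; the QP formulation in the chapter additionally yields the sparse support-vector representation.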
Multiple Model Estimation for Nonlinear Classification
Abstract
This chapter describes a new method for nonlinear classification using a collection of several simple (linear) classifiers. The approach is based on a new formulation of the learning problem called Multiple Model Estimation. Whereas standard supervised learning formulations (such as regression and classification) seek to describe a given (training) data set using a single (albeit complex) model, under the multiple model formulation the goal is to describe the data using several models, where each (simple) component model describes a subset of the data. We describe a practical implementation of the multiple model estimation approach for classification. Several empirical comparisons indicate that the proposed multiple model classification (MMC) method (using linear component models) may yield prediction accuracy comparable to (or better than) standard nonlinear SVM classifiers. In addition, the proposed approach has better interpretability and is more robust, since it avoids the problem of SVM kernel selection.
Y. Ma, V. Cherkassky
Componentwise Least Squares Support Vector Machines
Abstract
This chapter describes componentwise Least Squares Support Vector Machines (LS-SVMs) for the estimation of additive models consisting of a sum of nonlinear components. The primal-dual derivations characterizing LS-SVMs for the estimation of the additive model result in a single set of linear equations whose size grows with the number of data points. The derivation is elaborated for the classification as well as the regression case. Furthermore, different techniques are proposed to discover structure in the data by looking for sparse components in the model, based on dedicated regularization schemes on the one hand and fusion of the componentwise LS-SVM training with a validation criterion on the other hand.
K. Pelckmans, I. Goethals, J.D. Brabanter, J.A.K. Suykens, B.D. Moor
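The computational point above, that LS-SVM training reduces to a single linear system rather than a QP, can be sketched directly. A minimal numpy illustration of the standard LS-SVM classifier with an RBF kernel; the toy data and hyperparameter values are assumptions for illustration.

```python
import numpy as np

def rbf(A, B, sigma=1.0):
    """Gaussian (RBF) kernel matrix between row sets A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def lssvm_train(X, y, gamma=10.0, sigma=1.0):
    """LS-SVM classification: solve the linear system
    [[0, y^T], [y, Omega + I/gamma]] [b; alpha] = [0; 1],
    with Omega_ij = y_i y_j K(x_i, x_j)."""
    n = len(y)
    Omega = np.outer(y, y) * rbf(X, X, sigma)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = y
    A[1:, 0] = y
    A[1:, 1:] = Omega + np.eye(n) / gamma
    rhs = np.concatenate([[0.0], np.ones(n)])
    sol = np.linalg.solve(A, rhs)
    return sol[0], sol[1:]          # bias b, dual variables alpha

def lssvm_predict(Xtest, X, y, b, alpha, sigma=1.0):
    return np.sign(rbf(Xtest, X, sigma) @ (alpha * y) + b)

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-1, 0.3, (15, 2)), rng.normal(1, 0.3, (15, 2))])
y = np.array([-1.0] * 15 + [1.0] * 15)
b, alpha = lssvm_train(X, y)
print((lssvm_predict(X, X, y, b, alpha) == y).mean())
```

Note that, unlike the standard SVM QP, every alpha here is typically nonzero; the chapter's dedicated regularization schemes are what recover sparsity at the component level.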
Active Support Vector Learning with Statistical Queries
Abstract
The article describes an active learning strategy to solve the large quadratic programming (QP) problem of support vector machine (SVM) design in data mining applications. The learning strategy is motivated by the statistical query model. While most existing methods of active SVM learning query for points based on their proximity to the current separating hyperplane, the proposed method queries for a set of points according to a distribution determined by the current separating hyperplane and a newly defined concept of an adaptive confidence factor. This enables the algorithm to have more robust and efficient learning capabilities. The confidence factor is estimated from local information using the k-nearest-neighbor principle. Effectiveness of the method is demonstrated on real-life data sets, both in terms of generalization performance and training time.
P. Mitra, C.A. Murthy, S.K. Pal
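The querying idea above, sampling points from a distribution shaped by both the current hyperplane and a k-NN-based confidence factor, can be sketched schematically. The helper names below and the exact forms of the confidence estimate and query distribution are assumptions for illustration, not the chapter's precise definitions.

```python
import numpy as np

def knn_confidence(X, y, k=5):
    """Confidence per point, estimated here (as an assumption) as the
    fraction of its k nearest neighbors sharing its label."""
    n = len(y)
    d2 = ((X[:, None] - X[None, :]) ** 2).sum(-1)
    conf = np.empty(n)
    for i in range(n):
        nn = np.argsort(d2[i])[1:k + 1]        # skip the point itself
        conf[i] = (y[nn] == y[i]).mean()
    return conf

def query_distribution(margins, conf, beta=5.0):
    """Query probabilities favoring small-margin points; low confidence
    flattens the distribution toward uniform exploration."""
    sharp = np.exp(-beta * np.abs(margins))     # peaked near the hyperplane
    p = conf * sharp / sharp.sum() + (1 - conf) / len(margins)
    return p / p.sum()

rng = np.random.default_rng(4)
X = np.vstack([rng.normal(-1, 0.4, (10, 2)), rng.normal(1, 0.4, (10, 2))])
y = np.array([-1] * 10 + [1] * 10)
margins = X @ np.array([1.0, 1.0])   # signed margins w.r.t. a current hyperplane
conf = knn_confidence(X, y)
p = query_distribution(margins, conf)
print(p.sum())
```

The contrast with pure closest-to-hyperplane querying is that the confidence term lets the learner keep exploring when its current boundary is unreliable.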
Local Learning vs. Global Learning: An Introduction to Maxi-Min Margin Machine
Abstract
We present a unifying theory of the Maxi-Min Margin Machine (M4) that subsumes the Support Vector Machine (SVM), the Minimax Probability Machine (MPM), and Linear Discriminant Analysis (LDA). As a unified approach, M4 combines merits of all three models. While LDA and MPM build the decision plane using global information and the SVM constructs the decision plane in a local manner, M4 incorporates these two seemingly different yet complementary characteristics in an integrative framework that achieves good classification accuracy. We give some historical perspectives on the three models leading up to the development of M4. We then outline the M4 framework and investigate various aspects, including the mathematical definition, the geometrical interpretation, the time complexity, and its relationship with other existing models.
K. Huang, H. Yang, I. King, M.R. Lyu
Active-Set Methods for Support Vector Machines
Abstract
This chapter describes an active-set algorithm for quadratic programming problems that arise from the computation of support vector machines (SVMs). Currently, most SVM optimizers implement working-set (decomposition) techniques because of their ability to handle large data sets. Although these show good results in general, active-set methods are a reasonable alternative, in particular if the data set is not too large, if the problem is ill-conditioned, or if high precision is needed. Algorithms are derived for classification and regression with both fixed and variable bias term. The material is completed by acceleration and approximation techniques, as well as a comparison with other optimization methods in application examples.
M. Vogt, V. Kecman
Theoretical and Practical Model Selection Methods for Support Vector Classifiers
Abstract
In this chapter, we review several methods for SVM model selection, derived from different approaches: some build on practical lines of reasoning but are not fully justified from a theoretical point of view; others rely on rigorous theoretical work but are of little help when applied to real-world problems, because the underlying hypotheses cannot be verified or the result of their application is uninformative. Our objective is to shed some light on these issues by carefully analyzing the best-known methods and testing some of them on standard benchmarks to evaluate their effectiveness.
D. Anguita, A. Boni, S. Ridella, F. Rivieccio, D. Sterpi
Adaptive Discriminant and Quasiconformal Kernel Nearest Neighbor Classification
Abstract
Nearest neighbor classification assumes locally constant class conditional probabilities. This assumption becomes invalid in high dimensions due to the curse of dimensionality. Severe bias can be introduced under these conditions when using the nearest neighbor rule. We propose locally adaptive nearest neighbor classification methods to try to minimize this bias. We use locally linear support vector machines as well as quasiconformally transformed kernels to estimate an effective metric that produces neighborhoods elongated along less discriminant feature dimensions and constricted along the most discriminant ones. As a result, the class conditional probabilities can be expected to be approximately constant in the modified neighborhoods, whereby better classification performance can be achieved. The efficacy of our method is validated and compared against other competing techniques using a variety of data sets.
J. Peng, D.R. Heisterkamp, H.K. Dai
Improving the Performance of the Support Vector Machine: Two Geometrical Scaling Methods
Abstract
In this chapter, we discuss two possible ways of improving the performance of the SVM, using geometric methods. The first adapts the kernel by magnifying the Riemannian metric in the neighborhood of the boundary, thereby increasing separation between the classes. The second method is concerned with optimal location of the separating boundary, given that the distributions of data on either side may have different scales.
P. Williams, S. Wu, J. Feng
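The first method's conformal rescaling of the kernel can be sketched directly: replacing K(x, x') by c(x) c(x') K(x, x') magnifies the induced Riemannian metric where c is large. The magnification factor c(x) = exp(-kappa f(x)^2), peaking where a first-pass decision function f is near zero (i.e., near the boundary), is one common choice from the conformal-kernel literature and is assumed here for illustration; the demo checks that the rescaled matrix remains a valid (positive semidefinite) kernel.

```python
import numpy as np

def conformal_kernel(K, c):
    """K~(x_i, x_j) = c(x_i) c(x_j) K(x_i, x_j): equivalent to D K D with
    D = diag(c), so positive semidefiniteness is preserved."""
    return np.outer(c, c) * K

def magnification_factor(f_vals, kappa=1.0):
    """Conformal factor c(x) = exp(-kappa * f(x)^2), largest near the
    class boundary where the decision function f is close to zero."""
    return np.exp(-kappa * f_vals ** 2)

rng = np.random.default_rng(3)
X = rng.normal(size=(20, 2))
d2 = ((X[:, None] - X[None, :]) ** 2).sum(-1)
K = np.exp(-d2)                        # RBF kernel matrix (positive definite)
f_vals = X[:, 0]                       # stand-in for a first-pass decision function
Kt = conformal_kernel(K, magnification_factor(f_vals))
print(np.linalg.eigvalsh(Kt).min() >= -1e-9)   # still a valid PSD kernel
```

A second-pass SVM trained with the rescaled kernel then enjoys the increased class separation near the boundary that the chapter describes.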
An Accelerated Robust Support Vector Machine Algorithm
Abstract
This chapter proposes an accelerated decomposition algorithm for the robust support vector machine (SVM). The robust SVM aims at solving the overfitting problem when there are outliers in the training data set; it makes the decision surface less distorted and yields sparser support vectors. Training the robust SVM leads to a quadratic optimization problem with bound and linear constraints. Osuna proved that the standard SVM's quadratic programming (QP) problem can be broken down into a series of smaller QP sub-problems. This chapter derives the Kuhn-Tucker conditions and the decomposition algorithm for the robust SVM. Furthermore, a pre-selection technique is incorporated into the algorithm to speed up the calculation. Experiments on standard data sets show that the accelerated decomposition algorithm makes the training process more efficient.
Q. Song, W. Hu, X. Yang
Fuzzy Support Vector Machines with Automatic Membership Setting
Abstract
Support vector machines, like other classification approaches, aim to learn the decision surface from the input points for classification or regression problems. In many applications, each input point may be associated with a different weighting to reflect its relative strength to conform to the decision surface. In our previous research, we applied a fuzzy membership to each input point and reformulated support vector machines as fuzzy support vector machines (FSVMs), such that different input points can make different contributions to the learning of the decision surface.
C.-fu Lin, S.-de Wang
Iterative Single Data Algorithm for Training Kernel Machines from Huge Data Sets: Theory and Performance
Abstract
The chapter introduces the latest developments and results of the Iterative Single Data Algorithm (ISDA) for solving large-scale support vector machine (SVM) problems. First, the equivalence of the Kernel AdaTron (KA) method (originating from a gradient ascent learning approach) and the Sequential Minimal Optimization (SMO) learning algorithm (based on an analytic quadratic programming step for a model without bias term b) in designing SVMs with positive definite kernels is shown for both nonlinear classification and nonlinear regression tasks. The chapter also introduces the classic Gauss-Seidel procedure and its derivative, known as the successive over-relaxation algorithm, as viable (and usually faster) training algorithms. A convergence theorem for these related iterative algorithms is proven. The second part of the chapter presents the effects and methods of incorporating an explicit bias term b into the ISDA. The algorithms shown here implement a single-training-datum iteration routine (a.k.a. per-pattern learning), which makes the proposed ISDAs remarkably quick. The final solution in the dual domain is not an approximate one; it is the optimal set of dual variables that would have been obtained by any of the existing, proven QP solvers, if only they could deal with huge data sets.
V. Kecman, T.-M. Huang, M. Vogt
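The per-pattern update at the heart of the ISDA/Kernel AdaTron equivalence is compact enough to sketch. A minimal numpy version for the no-bias classification case is shown below; the step size 1/K_ii (which coincides with SMO's analytic step for the model without bias), the toy data, and the fixed epoch count are illustrative assumptions.

```python
import numpy as np

def kernel_adatron(K, y, C=10.0, epochs=100):
    """Iterative single-data training of an SVM without bias term:
    visit one point at a time and update its dual variable as
    alpha_i <- clip(alpha_i + (1 - y_i * f(x_i)) / K_ii, 0, C),
    where f(x_i) = sum_j alpha_j y_j K_ij."""
    n = len(y)
    alpha = np.zeros(n)
    for _ in range(epochs):
        for i in range(n):
            f_i = (alpha * y) @ K[:, i]
            alpha[i] = np.clip(alpha[i] + (1.0 - y[i] * f_i) / K[i, i], 0.0, C)
    return alpha

# Toy two-cluster problem with a Gaussian kernel.
rng = np.random.default_rng(2)
X = np.vstack([rng.normal(-1, 0.3, (15, 2)), rng.normal(1, 0.3, (15, 2))])
y = np.array([-1.0] * 15 + [1.0] * 15)
d2 = ((X[:, None] - X[None, :]) ** 2).sum(-1)
K = np.exp(-d2)
alpha = kernel_adatron(K, y)
pred = np.sign(K @ (alpha * y))
print((pred == y).mean())
```

Because each step touches only one dual variable and one kernel column, the routine needs no QP solver and streams naturally over huge data sets, which is the point of ISDA.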
Kernel Discriminant Learning with Application to Face Recognition
Abstract
When applied to high-dimensional pattern classification tasks such as face recognition, traditional kernel discriminant analysis methods often suffer from two problems: (1) small training sample size compared to the dimensionality of the sample (or mapped kernel feature) space, and (2) high computational complexity. In this chapter, we introduce a new kernel discriminant learning method, which attempts to deal with the two problems by using regularization and subspace decomposition techniques. The proposed method is tested by extensive experiments performed on real face databases. The obtained results indicate that the method outperforms, in terms of classification accuracy, existing kernel methods, such as kernel Principal Component Analysis and kernel Linear Discriminant Analysis, at a significantly reduced computational cost.
J. Lu, K.N. Plataniotis, A.N. Venetsanopoulos
Fast Color Texture-Based Object Detection in Images: Application to License Plate Localization
Abstract
This chapter presents a color texture-based method for object detection in images. A support vector machine (SVM) is used to classify each pixel in the image into object of interest or background based on localized color texture patterns. The main problem with this approach is the high run-time complexity of SVMs. To alleviate this problem, two methods are proposed. First, an artificial neural network (ANN) is adopted to make the problem linearly separable: the ANN is trained on the given problem to a low training error, and its layers up to the last hidden layer then replace the kernel map of the nonlinear SVM, which is a major computational burden. Second, the resulting color texture analyzer is embedded in the continuously adaptive mean shift algorithm (CAMShift), which automatically identifies regions of interest in a coarse-to-fine manner. Consequently, the combination of CAMShift and SVMs produces robust and efficient object detection, as time-consuming color texture analysis is avoided for less relevant pixels, leaving only a small part of the input image to be analyzed. To demonstrate the validity of the proposed technique, a vehicle license plate (LP) localization system is developed and experiments are conducted with a variety of images.
K.I. Kim, K. Jung, H.J. Kim
Support Vector Machines for Signal Processing
Abstract
This chapter deals with the use of the support vector machine (SVM) algorithm as a possible design method in signal processing applications. It critically discusses the main difficulties related to its application to such a general set of problems. Moreover, the problem of digital channel equalization is discussed in detail, since it is an important example of the use of the SVM algorithm in signal processing.
D. Mattera
Cancer Diagnosis and Protein Secondary Structure Prediction Using Support Vector Machines
Abstract
In this chapter, we use support vector machines (SVMs) to deal with two bioinformatics problems, i.e., cancer diagnosis based on gene expression data and protein secondary structure prediction (PSSP). For the problem of cancer diagnosis, the SVMs that we used achieved highly accurate results with fewer genes compared to previously proposed approaches. For the problem of PSSP, the SVMs achieved results comparable to those obtained by other methods.
F. Chu, G. Jin, L. Wang
Gas Sensing Using Support Vector Machines
Abstract
In this chapter we deal with the use of Support Vector Machines in gas sensing. After a brief introduction to the inner workings of multisensor systems, the potential benefits of SVMs in this type of instrument are discussed. Examples of how SVMs are being evaluated in the gas sensor community are described in detail, including studies of their generalisation ability, their role as a valid variable selection technique, and their regression performance. These studies have been carried out measuring different blends of coffee, different types of vapours (CO, O2, acetone, hexanal, etc.) and even discriminating between different types of nerve agents.
J. Brezmes, E. Llobet, S. Al-Khalifa, S. Maldonado, J.W. Gardner
Application of Support Vector Machines in Inverse Problems in Ocean Color Remote Sensing
Abstract
Neural networks are widely used as transfer functions in inverse problems in remote sensing. However, this method still suffers from problems such as the danger of over-fitting and of being trapped in a local minimum. This chapter investigates the possibility of using a new universal approximator, the support vector machine (SVM), as the nonlinear transfer function in an inverse problem in ocean color remote sensing. A field data set is used to evaluate the performance of the proposed approach. Experimental results show that the SVM performs as well as the optimal multi-layer perceptron (MLP) and can be a promising alternative to conventional MLPs for the retrieval of oceanic chlorophyll concentration from marine reflectance.
H. Zhan
Application of Support Vector Machine to the Detection of Delayed Gastric Emptying from Electrogastrograms
Abstract
Radioscintigraphy is currently the gold standard for gastric emptying testing, but it involves radiation exposure and considerable expense. Recent studies reported neural network approaches for the non-invasive diagnosis of delayed gastric emptying from cutaneous electrogastrograms (EGGs). Using support vector machines, we show that this relatively new technique can be used for the detection of delayed gastric emptying and is in fact able to improve on the performance of conventional neural networks.
H. Liang
Tachycardia Discrimination in Implantable Cardioverter Defibrillators Using Support Vector Machines and Bootstrap Resampling
Abstract
Accurate automatic discrimination between supraventricular (SV) and ventricular (V) tachycardia (T) in implantable cardioverter defibrillators (ICDs) is still a challenging problem today. Support Vector Machine (SVM) classifiers are an interesting approach to this issue, but their application in this scenario exhibits limitations of both a technical (only reduced data sets are available) and a clinical (the solution is a hard-to-interpret black box) nature. We first show that bootstrap resampling can be helpful for training SVMs with few available observations. Then, we perform a principal component analysis of the support vectors that leads to simple discrimination rules as effective as the black-box SVM. Therefore, a method with low computational burden can be stated for discriminating between SVT and VT in ICDs.
J.L. Rojo-Álvarez, A. García-Alberola, A. Artés-Rodríguez, Á. Arenal-Maíz
Metadata
Title
Support Vector Machines: Theory and Applications
Edited by
Professor Lipo Wang
Copyright Year
2005
Publisher
Springer Berlin Heidelberg
Electronic ISBN
978-3-540-32384-6
Print ISBN
978-3-540-24388-5
DOI
https://doi.org/10.1007/b95439