Multiclass support vector classification via coding and regression
Introduction
Over the last decade, there has been great interest in, and successful development of, support vector machines (SVMs) for classification [7], [5], [11]. SVMs were originally designed for binary classification. There are two common multiclass extensions of SVMs. One is the composition-type methods built upon a series of binary classifiers, e.g., one-against-one, one-against-rest and error-correcting output codes [12], [1], [8], [10], [18]; the other is the single-machine-type methods, often huge and solved in one optimization formulation [37], [4], [9], [24], [28]. Comparison studies and discussions on compositions of binary classifiers versus single-machine approaches can be found in [22], [33]. Based on their findings, there is no universally dominant classification rule for multiclass problems, and different methods have their own merits and advantages. This leaves room for the exploration of alternative approaches.
In this article, we propose an alternative approach based on the regression concept. The time-honored Fisher linear discriminant analysis (FLDA) separates data classes by projecting input attributes onto the eigen-space of the between-class covariance (scaled by the within-class covariance). In the binary classification case, FLDA can be solved via a multiresponse regression [14], [30], [2] by encoding the binary class labels into numeric responses n/n1 and -n/n2, where n1 and n2 are the class sizes and n = n1 + n2. Hastie et al. [20] further extended FLDA to nonlinear and multiclass classification via a penalized regression setting, which they named "flexible discriminant analysis (FDA)". FDA encodes the class labels into response scores and then fits the scores by a nonparametric regression technique, such as MARS (multivariate adaptive regression splines) or neural networks. A particular encoding scheme, "optimal scoring", is proposed in FDA. Later, [31], [34] adopted the same FDA approach but replaced the conventional nonparametric regression technique with a kernel trick; their approach is named kernel discriminant analysis (KDA). The KDA of [34] is solved by an EM algorithm.
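The binary FLDA-via-regression connection can be checked numerically. The following sketch (on hypothetical toy data, with illustrative variable names) fits ordinary least squares to the coded responses n/n1 and -n/n2 and verifies that the resulting coefficient vector points in the same direction as the classical Fisher discriminant:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two Gaussian classes with shared covariance (hypothetical toy data).
n1, n2 = 60, 40
X = np.vstack([rng.normal([0, 0], 1.0, (n1, 2)),
               rng.normal([2, 1], 1.0, (n2, 2))])
n = n1 + n2

# Encode the binary labels as numeric responses n/n1 and -n/n2.
y = np.concatenate([np.full(n1, n / n1), np.full(n2, -n / n2)])

# Ordinary least squares fit (with intercept) of the coded responses.
A = np.hstack([np.ones((n, 1)), X])
beta = np.linalg.lstsq(A, y, rcond=None)[0]
w_reg = beta[1:]                          # regression direction

# Classical Fisher direction from the pooled within-class covariance.
mu1, mu2 = X[:n1].mean(0), X[n1:].mean(0)
Xc = np.vstack([X[:n1] - mu1, X[n1:] - mu2])
Sw = Xc.T @ Xc / n
w_fisher = np.linalg.solve(Sw, mu1 - mu2)

# The two directions coincide up to a positive scalar.
cosine = w_reg @ w_fisher / (np.linalg.norm(w_reg) * np.linalg.norm(w_fisher))
print(round(float(cosine), 4))
```

The agreement is exact in-sample: with any two-valued coding of the responses, the normal equations reduce to a total-covariance system whose solution is proportional to the Fisher direction.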
Inspired by the above-mentioned regression approaches and the successful development of SVMs, we combine the two ideas and adopt multiresponse support vector regression (mSVR) for the multiclass classification problem. Some preceding works on support vector classification can also be interpreted as ridge regression applied to classification problems, e.g., the proximal SVM [16], [17] and the regularized least squares SVM [36]. Our mSVR for classification consists of three major steps. The first is to encode the class labels into multiresponse scores. Next, a regression fit of the scores on kernel-transformed feature inputs is carried out to extract a low-dimensional discriminant feature subspace. The final step is a classification rule that transforms (i.e., decodes) the mapped values in this low-dimensional discriminant subspace back to class labels. Standard kernel Fisher discriminant analysis, and also the SVM, solve classification problems in a high-dimensional feature space. Through the mSVR, we extract the information in the attributes into a low-dimensional feature subspace (of dimension at most J - 1, where J is the number of classes), which accelerates classification training and reduces the influence of noise. We will give a unified view of different coding schemes. The low-dimensional discriminant feature subspaces generated by different coding schemes with long enough code length are identical, which introduces the notion of equivalence of codes. We prove this equivalence theoretically and also confirm it in our numerical experiments. The regression step can be viewed as feature extraction that makes the final classification (decoding) step computationally light and easy. The nonlinear structure among different classes of data patterns is embedded in the extracted features through the kernel transformation. Thus, we can apply any feasible linear learning algorithm to the data images in this low-dimensional discriminant feature subspace.
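The three steps above (encode, regress, decode) can be sketched as follows. This is a minimal illustration on hypothetical toy data, using a plain ridge-regularized least-squares fit in place of the paper's mSVR solver and a simple nearest-class-mean rule as the linear decoder; all names are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy 3-class data (hypothetical): Gaussian blobs in 2-D.
means = np.array([[0, 0], [3, 0], [0, 3]])
X = np.vstack([rng.normal(m, 0.5, (30, 2)) for m in means])
y = np.repeat(np.arange(3), 30)

def rbf_kernel(A, B, gamma=0.5):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

# Step 1: encode class labels into multiresponse scores (one-of-J coding).
Y = np.eye(3)[y]

# Step 2: regularized least-squares regression of scores on kernel inputs.
K = rbf_kernel(X, X)
lam = 1e-2
C = np.linalg.solve(K + lam * np.eye(len(X)), Y)   # coefficient variates

# Mapped values in the low-dimensional discriminant subspace.
Z = K @ C

# Step 3: decode with a simple linear rule, here nearest class mean in Z-space.
centers = np.array([Z[y == j].mean(0) for j in range(3)])

def predict(Xnew):
    Zn = rbf_kernel(Xnew, X) @ C
    d2 = ((Zn[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    return d2.argmin(1)

acc = (predict(X) == y).mean()
print(acc)
```

Because the kernel transformation has already absorbed the nonlinear class structure into Z, any linear classifier (nearest mean, LDA, a linear SVM) can serve as the decoder in step 3.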
Numerical tests and comparisons show that our framework, a regression setup combined with a linear discriminant algorithm, is an easy yet competent alternative for the multiclass classification problem.
The rest of the article is organized as follows. In Section 2, we introduce the framework of mSVR for classification. We describe the major steps of our regression approach, including kernelization, encoding of the class labels, a regularized least-squares-based mSVR algorithm and the principle for decoding. In Section 3, we develop some notions and properties of equivalent codes and scores in the discriminant analysis context. In Section 4, implementation issues including model selection and the choice of base classifiers are discussed. Experimental results are provided to demonstrate the efficiency of our proposal and to illustrate numerical properties of different coding schemes. Concluding remarks are in Section 5. All proofs are in the Appendix.
Section snippets
Classification by multiresponse regression
Consider the problem of multiclass classification with J classes based on d measurements of input attributes x ∈ R^d. Denote the membership set by Y = {1, ..., J} and each individual membership by y ∈ Y. Suppose we have training data {(x_i, y_i), i = 1, ..., n}. Our goal is to construct a classification rule which, given a new input, can correctly predict the associated class label of this new input. Aside from various support vector approaches mentioned in Section 1 originating from the machine learning
Encoding and equivalence class of codes
In this section we introduce several existing coding schemes to encode the class labels. We also unify them under the notions of equivalence of codes and scores in the context of discriminant analysis. We refer the reader to [21] for general theory of discrete codes and to [8], [10] for continuous codes.
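A small numerical illustration of the equivalence notion (the paper's general definition is broader): the one-of-J ("one-hot") code and the one-vs-rest +/-1 code are affine transforms of one another, so after centering their score columns span the same subspace, and a regression fit of either code extracts the same discriminant subspace. The helper name below is illustrative:

```python
import numpy as np

J = 4
# One-of-J ("one-hot") code: row j is the score vector for class j.
S_onehot = np.eye(J)
# One-vs-rest +/-1 code: an affine transform of the one-hot code.
S_ovr = 2 * np.eye(J) - 1

def centered_colspace_proj(S):
    # Orthogonal projection onto the span of the centered score columns.
    Sc = S - S.mean(0)
    U, s, _ = np.linalg.svd(Sc)
    r = int((s > 1e-10).sum())
    U = U[:, :r]
    return U @ U.T

P1 = centered_colspace_proj(S_onehot)
P2 = centered_colspace_proj(S_ovr)
same = bool(np.allclose(P1, P2))
print(same)
```

Here the centered one-vs-rest code is exactly twice the centered one-hot code, so the two projection matrices agree; both codes lead to the same (J - 1)-dimensional discriminant subspace.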
Experimental study
Data sets and experimental setting. The following data sets are used for the experimental study: ionosphere, Iris, wine, glass, segment, image, dna, satimage and pendigits, where ionosphere, Iris, wine, glass, image and pendigits are from the UCI Repository of machine learning databases [3], and segment, dna and satimage are from the UCI statlog collection. Data structures are characterized in Table 1. For data sets without a given training/testing split, we divide them into 10 folds for
Concluding remarks
In this article the mSVR is proposed for the multiclass classification problem. The class labels are encoded into multiresponse scores and then the regression of scores on kernel inputs is used to extract a low-dimensional discriminant feature subspace, which is spanned by the regression coefficient variates. The discriminant feature subspace generated by different coding schemes with long enough code length will be identical, which introduces the notion of equivalence of codes. Data are then
Acknowledgments
The authors thank Chuhsing Kate Hsiao and Chii-Ruey Hwang for helpful comments.
Pei-Chun Chen received her B.S. degree from National Dong-Hwa University in 2001, and the M.S. and Ph.D. degrees from Graduate Institute of Epidemiology, National Taiwan University in 2003 and 2008, respectively. Currently, she is a Post-doctoral Fellow in Bioinformatics and Biostatistics Core, Research Center for Medical Excellence, National Taiwan University. Her research interests are in biostatistics, Bayesian statistics, bioinformatics and machine learning.
References (37)
- et al., Model selection for support vector machines via uniform design, Comput. Stat. Data Anal. (2007)
- et al., A novel and quick SVM-based multi-class classifier, Pattern Recognition (2006)
- et al., Reducing multiclass to binary: a unifying approach for margin classifiers, J. Mach. Learn. Res. (2001)
- An Introduction to Multivariate Statistical Analysis (2003)
- A. Asuncion, D.J. Newman, UCI Machine Learning Repository, University of California, Irvine, School of Information and...
- et al., Multicategory classification by support vector machines, Comput. Optim. Appl. (1999)
- A tutorial on support vector machines for pattern recognition, Data Min. Knowl. Disc. (1998)
- C.C. Chang, C.J. Lin, LIBSVM: a library for support vector machines, 2001,...
- et al., Support vector networks, Mach. Learn. (1995)
- K. Crammer, Y. Singer, Improved output coding for classification using continuous relaxation, in: Proceedings of the...
- On the algorithmic implementation of multiclass kernel-based vector machines, J. Mach. Learn. Res.
- On the learnability and design of output codes for multiclass problems, Mach. Learn.
- An Introduction to Support Vector Machines
- Solving multiclass learning problems via error-correcting output codes, J. Artif. Intell. Res.
- Number-Theoretic Methods in Statistics
- The use of multiple measurements in taxonomic problems, Ann. Eugen.
- Multivariate adaptive regression splines (with discussion), Ann. Stat.
Kuang-Yao Lee received the B.S. and M.S. degrees in the Mathematics Department of National Taiwan University, in 2002 and 2005. Currently a Ph.D. student in the Department of Statistics in Pennsylvania State University. Research interests include linear and nonlinear dimensionality reduction methods and machine learning.
Tsung-Ju Lee received a B.S. degree in Mathematics from TungHai University, Taiwan in 2000 and the M.S. degree in Applied Mathematics from National Chiao Tung University, Taiwan in 2002. Currently, he is working towards the Ph.D. degree in the Department of Computer Science, National Chiao Tung University, Taiwan. His current research interests include machine learning, data mining and various applications, especially in network security, e-learning and computational biology.
Yuh-Jye Lee received his Master degree in Applied Mathematics from the National Tsing Hua University, Taiwan in 1992 and Ph.D. degree in computer sciences from the University of Wisconsin, Madison in 2001. He is an associate professor in the Department of Computer Science and Information Engineering, National Taiwan University of Science and Technology. His research interests are in machine learning, data mining, optimization, information security and operations research. Dr. Lee developed new algorithms for large data mining problems such as classification and regression problem, abnormal detection and dimension reduction. Using the methodologies such as support vector machines, reduced kernel method, chunking and smoothing techniques allow us to get a very robust solution (prediction) for a large dataset. These methods have been applied to solve many real world problems such as intrusion detection system (IDS), face detection, microarray gene expression analysis and breast cancer diagnosis and prognosis.
Su-Yun Huang received the B.S. and M.S. degrees from Department of Mathematics, National Taiwan University, in 1983 and 1985, respectively, and the Ph.D. degree from Department of Statistics, Purdue University in 1990. She is currently a Research Fellow in the Institute of Statistical Science, Academia Sinica, Taiwan. Her research interests are mainly on mathematical statistics.