Neurocomputing

Volume 73, Issues 7–9, March 2010, Pages 1501-1512

Multiclass support vector classification via coding and regression

https://doi.org/10.1016/j.neucom.2009.11.005

Abstract

The multiclass classification problem is considered and resolved through coding and regression. There are various coding schemes for transforming class labels into response scores. An equivalence notion of coding schemes is developed, and the regression approach is adopted for extracting a low-dimensional discriminant feature subspace. This feature subspace can be a linear subspace of the column span of the original input data or of kernel-mapped feature data. Classification training and prediction are carried out in this feature subspace using a linear classifier, which leads to a simple, computationally light, yet powerful toolkit for classification. Experimental results, including prediction ability and CPU time comparisons with LIBSVM, show that the regression-based approach is a competent alternative for the multiclass problem.

Introduction

Over the last decade, there has been great interest in, and successful development of, support vector machines (SVMs) for classification [7], [5], [11]. SVMs were originally designed for binary classification. There are two common types of multiclass extensions for SVMs. One is the composition-type methods built upon a series of binary classifiers, e.g., one-against-one, one-against-rest and error-correcting output codes [12], [1], [8], [10], [18]; the other is the single-machine-type methods, often large and solved in a single optimization formulation [37], [4], [9], [24], [28]. Comparison studies and discussions on compositions of binary classifiers and single-machine approaches can be found in [22], [33]. Based on their findings, there is no universally dominant classification rule for multiclass problems, and different methods have their own merits and advantages. This leaves room for the exploration of alternative approaches.

In this article, we propose an alternative approach based on the regression concept. The time-honored Fisher linear discriminant analysis (FLDA) separates data classes by projecting input attributes onto the eigenspace of the between-class covariance (scaled by the within-class covariance). In the binary classification case, FLDA can be solved via a multiresponse regression [14], [30], [2] by encoding the binary class labels into numeric responses $\{n_1/n, -n_2/n\}$, where $n_1$ and $n_2$ are the class sizes and $n = n_1 + n_2$. Hastie et al. [20] further extended FLDA to nonlinear and multiclass classification via a penalized regression setting, naming it "flexible discriminant analysis" (FDA). FDA encodes the class labels into response scores and then fits these scores with a nonparametric regression technique such as MARS (multivariate adaptive regression splines) or neural networks. A particular encoding scheme, "optimal scoring," is proposed in FDA. Later, [31], [34] adopted the same FDA approach but replaced the conventional nonparametric regression technique with the kernel trick; their approach is named kernel discriminant analysis (KDA). The KDA of [34] is solved by an EM algorithm.
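As a concrete illustration of this classical connection, the following minimal sketch (ours, not from the paper) fits a least-squares regression on the coded binary labels and checks numerically that the coefficient vector points in the Fisher direction $S_W^{-1}(m_1 - m_2)$; any two distinct response scores yield the same direction up to scale.

```python
import numpy as np

rng = np.random.default_rng(0)
cov = [[1.0, 0.5], [0.5, 1.0]]
X1 = rng.multivariate_normal([0, 0], cov, size=40)   # class 1 samples
X2 = rng.multivariate_normal([2, 1], cov, size=60)   # class 2 samples
X = np.vstack([X1, X2])
n1, n2 = len(X1), len(X2)
n = n1 + n2

# Encode the binary labels into the numeric responses quoted in the text.
y = np.concatenate([np.full(n1, n1 / n), np.full(n2, -n2 / n)])

# Least-squares regression of the coded responses on the inputs (with intercept).
A = np.hstack([np.ones((n, 1)), X])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
w_ls = coef[1:]

# Fisher direction: inverse within-class scatter times the mean difference.
m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
S_W = (X1 - m1).T @ (X1 - m1) + (X2 - m2).T @ (X2 - m2)
w_fisher = np.linalg.solve(S_W, m1 - m2)

# Cosine of the angle between the two directions: ~1, i.e., the same direction.
cos = w_ls @ w_fisher / (np.linalg.norm(w_ls) * np.linalg.norm(w_fisher))
print(abs(cos))
```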

Inspired by the above-mentioned regression approaches and the successful development of SVMs, we take both ideas and adopt the multiresponse support vector regression (mSVR) for the multiclass classification problem. Some preceding works on support vector classification can also be interpreted as ridge regression applied to classification problems, e.g., the proximal SVM [16], [17] and the regularized least squares SVM [36]. Our mSVR for classification consists of three major steps. The first is to encode the class labels into multiresponse scores. Next, a regression fit of the scores on kernel-transformed feature inputs is carried out to extract a low-dimensional discriminant feature subspace. The final step is a classification rule that transforms (i.e., decodes) the mapped values in this low-dimensional discriminant subspace back to class labels. Standard kernel Fisher discriminant analysis, and also the SVM, solve the classification problem in a high-dimensional feature space. Through the mSVR, we extract the information in the attributes into a $(J-1)$-dimensional feature subspace ($J$ is the number of classes), which accelerates classification training and reduces the influence of noise. We will give a unified view of different coding schemes. The low-dimensional discriminant feature subspaces generated by different coding schemes with long enough code length are identical, which motivates the notion of equivalence of codes. We prove this equivalence theoretically and also confirm it in our numerical experiments. The regression step can be viewed as feature extraction that makes the final classification (decoding) step computationally light and easy. The nonlinear structure between different classes of data patterns is embedded in the extracted features because of the kernel transformation. Thus, we can apply any feasible linear learning algorithm to the data images in this low-dimensional discriminant feature subspace. Numerical tests and comparisons show that our framework, a regression setup combined with a linear discriminant algorithm, is an easy yet competent alternative for the multiclass classification problem.
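To make the three-step pipeline concrete, here is a minimal sketch assuming, for illustration only, a Gaussian kernel, one-hot response scores, a kernel ridge (regularized least-squares) fit, and nearest-centroid decoding. The helper names rbf_kernel, fit_msvr and predict_msvr are ours; the paper's actual coding schemes and the $(J-1)$-dimensional construction are developed in Sections 2 and 3.

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    """Gaussian kernel matrix between the rows of A and the rows of B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * d2)

def fit_msvr(X, g, J, lam=1e-2, gamma=1.0):
    """Steps 1-2: encode labels as one-hot scores, then fit a regularized
    least-squares (kernel ridge) multiresponse regression on kernel inputs."""
    n = len(X)
    Y = np.eye(J)[g]                                  # n x J one-hot score matrix
    K = rbf_kernel(X, X, gamma)                       # n x n kernel matrix
    C = np.linalg.solve(K + lam * np.eye(n), Y)       # dual regression coefficients
    Z = K @ C                                         # extracted score features
    centroids = np.vstack([Z[g == j].mean(axis=0) for j in range(J)])
    return C, centroids

def predict_msvr(X_new, X, C, centroids, gamma=1.0):
    """Step 3: map new inputs into the score space and decode each one
    to the label of the nearest class centroid."""
    Z_new = rbf_kernel(X_new, X, gamma) @ C
    d2 = ((Z_new[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    return d2.argmin(axis=1)
```

Given training inputs X with labels g in {0, ..., J-1}, fit_msvr returns the regression coefficients and class centroids, and predict_msvr classifies new points; any other linear classifier could replace the nearest-centroid rule in the extracted subspace.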

The rest of the article is organized as follows. In Section 2, we introduce the framework of mSVR for classification. We describe the major steps of our regression approach, including kernelization, encoding of the class labels, a regularized least-squares-based mSVR algorithm, and the principle for decoding. In Section 3, we develop notions and properties of equivalent codes and scores in the discriminant analysis context. In Section 4, implementation issues, including model selection and the choice of base classifiers, are discussed. Experimental results are provided to demonstrate the efficiency of our proposal and to illustrate numerical properties of different coding schemes. Concluding remarks are in Section 5. All proofs are in the Appendix.

Section snippets

Classification by multiresponse regression

Consider the problem of multiclass classification with $J$ classes based on $d$ measurements of input attributes $x \in \mathbb{R}^{d \times 1}$. Denote the membership set by $\mathcal{J} = \{1, 2, \ldots, J\}$ and each individual membership by $g \in \mathcal{J}$. Suppose we have training data $\{(x_i, g_i) \in \mathbb{R}^{d \times 1} \times \mathcal{J}\}_{i=1}^{n}$. Our goal is to construct a classification rule which, given a new input, can correctly predict the associated class label of this new input. Aside from various support vector approaches mentioned in Section 1 originating from the machine learning

Encoding and equivalence class of codes

In this section we introduce several existing coding schemes for encoding the class labels. We also unify them under the notions of equivalence of codes and scores in the context of discriminant analysis. We refer the reader to [21] for the general theory of discrete codes and to [8], [10] for continuous codes.
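The equivalence developed in this section can be checked numerically in a few lines. The sketch below is ours, not the paper's proof: it uses a random positive-definite matrix as a stand-in kernel and a regularized least-squares fit, and shows that recoding one-hot scores by any full-rank matrix B leaves the span of the extracted features, hence the discriminant feature subspace, unchanged.

```python
import numpy as np

rng = np.random.default_rng(1)
n, J, lam = 30, 3, 1e-2
g = rng.integers(0, J, size=n)                      # class labels
M = rng.normal(size=(n, n))
K = M @ M.T + n * np.eye(n)                         # stand-in positive-definite "kernel" matrix

Y1 = np.eye(J)[g]                                   # coding scheme 1: one-hot scores
B = rng.normal(size=(J, J))                         # full rank with probability one
Y2 = Y1 @ B                                         # coding scheme 2: an equivalent recoding

# Regularized least-squares fits under the two codings.
Z1 = K @ np.linalg.solve(K + lam * np.eye(n), Y1)   # extracted features, code 1
Z2 = K @ np.linalg.solve(K + lam * np.eye(n), Y2)   # extracted features, code 2

# The column spans coincide: the orthogonal projectors onto the spans agree.
Q1, _ = np.linalg.qr(Z1)
Q2, _ = np.linalg.qr(Z2)
print(np.linalg.norm(Q1 @ Q1.T - Q2 @ Q2.T))        # ~0 up to floating point
```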

Experimental study

Data sets and experimental setting. The following data sets are used in the experimental study: ionosphere, Iris, wine, glass, segment, image, dna, satimage and pendigits, where ionosphere, Iris, wine, glass, image and pendigits are from the UCI Machine Learning Repository [3], and segment, dna and satimage are from the UCI Statlog collection. Data structures are characterized in Table 1. For data sets without a given training/testing split, we divide them into 10 folds for
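For reference, here is a plausible sketch of forming 10 class-stratified folds; this is an assumption on our part (the snippet's exact splitting protocol is truncated above), and the helper name stratified_folds is hypothetical.

```python
import numpy as np

def stratified_folds(g, k=10, seed=0):
    """Assign each sample a fold index in {0, ..., k-1}, balancing classes.

    Hypothetical helper: spreads every class's indices evenly across the k folds.
    """
    rng = np.random.default_rng(seed)
    folds = np.empty(len(g), dtype=int)
    for label in np.unique(g):
        idx = rng.permutation(np.where(g == label)[0])
        folds[idx] = np.arange(len(idx)) % k
    return folds

# Example: hold out fold 0 for testing, train on the remaining nine folds.
g = np.array([0, 0, 1, 1, 2, 2] * 10)
folds = stratified_folds(g)
train_mask, test_mask = folds != 0, folds == 0
```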

Concluding remarks

In this article, the mSVR is proposed for the multiclass classification problem. The class labels are encoded into multiresponse scores, and then the regression of the scores on kernel inputs is used to extract a low-dimensional discriminant feature subspace, which is spanned by the regression coefficient variates. The discriminant feature subspaces generated by different coding schemes with long enough code length are identical, which motivates the notion of equivalence of codes. Data are then

Acknowledgments

The authors thank Chuhsing Kate Hsiao and Chii-Ruey Hwang for helpful comments.


References (37)

  • C.M. Huang et al., Model selection for support vector machines via uniform design, Comput. Stat. Data Anal. (2007)
  • Y. Liu et al., A novel and quick SVM-based multi-class classifier, Pattern Recognition (2006)
  • E.L. Allwein et al., Reducing multiclass to binary: a unifying approach for margin classifiers, J. Mach. Learn. Res. (2001)
  • T.W. Anderson, An Introduction to Multivariate Statistical Analysis (2003)
  • A. Asuncion, D.J. Newman, UCI Machine Learning Repository, University of California, Irvine, School of Information and...
  • E.J. Bredensteiner et al., Multicategory classification by support vector machines, Comput. Optim. Appl. (1999)
  • C.J.C. Burges, A tutorial on support vector machines for pattern recognition, Data Min. Knowl. Disc. (1998)
  • C.C. Chang, C.J. Lin, LIBSVM: a library for support vector machines, 2001,...
  • C. Cortes et al., Support vector networks, Mach. Learn. (1995)
  • K. Crammer, Y. Singer, Improved output coding for classification using continuous relaxation, in: Proceedings of the...
  • K. Crammer et al., On the algorithmic implementation of multiclass kernel-based vector machines, J. Mach. Learn. Res. (2001)
  • K. Crammer et al., On the learnability and design of output codes for multiclass problems, Mach. Learn. (2002)
  • N. Cristianini et al., An Introduction to Support Vector Machines (2000)
  • T.G. Dietterich et al., Solving multiclass learning problems via error-correcting output codes, J. Artif. Intell. Res. (1995)
  • K.T. Fang et al., Number-Theoretic Methods in Statistics (1994)
  • R.A. Fisher, The use of multiple measurements in taxonomic problems, Ann. Eugen. (1936)
  • J.H. Friedman, Multivariate adaptive regression splines (with discussion), Ann. Stat. (1991)
  • G.M. Fung, O.L. Mangasarian, Proximal support vector machine classifiers, in: Proceedings KDD-2001: Knowledge Discovery...

    Pei-Chun Chen received her B.S. degree from National Dong-Hwa University in 2001, and the M.S. and Ph.D. degrees from Graduate Institute of Epidemiology, National Taiwan University in 2003 and 2008, respectively. Currently, she is a Post-doctoral Fellow in Bioinformatics and Biostatistics Core, Research Center for Medical Excellence, National Taiwan University. Her research interests are in biostatistics, Bayesian statistics, bioinformatics and machine learning.

Kuang-Yao Lee received the B.S. and M.S. degrees from the Mathematics Department of National Taiwan University, in 2002 and 2005. He is currently a Ph.D. student in the Department of Statistics at Pennsylvania State University. His research interests include linear and nonlinear dimensionality reduction methods and machine learning.

    Tsung-Ju Lee received a B.S. degree in Mathematics from TungHai University, Taiwan in 2000 and the M.S. degree in Applied Mathematics from National Chiao Tung University, Taiwan in 2002. Currently, he is working towards the Ph.D. degree in the Department of Computer Science, National Chiao Tung University, Taiwan. His current research interests include machine learning, data mining and various applications, especially in network security, e-learning and computational biology.

Yuh-Jye Lee received his Master degree in Applied Mathematics from the National Tsing Hua University, Taiwan in 1992 and the Ph.D. degree in computer sciences from the University of Wisconsin, Madison in 2001. He is an associate professor in the Department of Computer Science and Information Engineering, National Taiwan University of Science and Technology. His research interests are in machine learning, data mining, optimization, information security and operations research. Dr. Lee has developed new algorithms for large data mining problems such as classification and regression, anomaly detection and dimension reduction. Methodologies such as support vector machines, the reduced kernel method, and chunking and smoothing techniques yield very robust solutions (predictions) for large datasets. These methods have been applied to many real-world problems such as intrusion detection systems (IDS), face detection, microarray gene expression analysis, and breast cancer diagnosis and prognosis.

    Su-Yun Huang received the B.S. and M.S. degrees from Department of Mathematics, National Taiwan University, in 1983 and 1985, respectively, and the Ph.D. degree from Department of Statistics, Purdue University in 1990. She is currently a Research Fellow in the Institute of Statistical Science, Academia Sinica, Taiwan. Her research interests are mainly on mathematical statistics.
