Neural Networks

Volume 51, March 2014, Pages 67-79

Lagrangian support vector regression via unconstrained convex minimization

https://doi.org/10.1016/j.neunet.2013.12.003

Abstract

In this paper, a simple reformulation of the Lagrangian dual of the 2-norm support vector regression (SVR) is proposed as an unconstrained minimization problem. This formulation has the advantage that its objective function is strongly convex and has only m variables, where m is the number of input data points. The proposed unconstrained Lagrangian SVR (ULSVR) is solvable by computing the zeros of its gradient. However, since its objective function contains the non-smooth ‘plus’ function, two approaches are followed to solve the proposed optimization problem: (i) introduce a smooth approximation, generate a slightly modified unconstrained minimization problem and solve it; (ii) solve the problem directly by applying the generalized derivative. Computational results obtained on a number of synthetic and real-world benchmark datasets show generalization performance similar to that of the conventional SVR with much faster learning speed, and training time very close to that of least squares SVR, clearly indicating the superiority of ULSVR solved by the smooth and generalized derivative approaches.

Introduction

Support Vector Machines (SVMs) are state-of-the-art machine learning techniques based on the structural risk minimization principle of Vapnik and Chervonenkis (Vapnik, 2000). They are computationally powerful tools that show better generalization ability than other machine learning methods on a wide variety of real-world problems such as face detection (Osuna, Freund, & Girosi, 1997), gene prediction (Guyon, Weston, Barnhill, & Vapnik, 2002), text categorization (Joachims, Ndellec, & Rouveriol, 1998) and image segmentation (Chen & Wang, 2005). It is well known that the SVM formulation (Cristianini & Shawe-Taylor, 2000; Vapnik, 2000) can be posed as a quadratic programming problem (QPP) with linear inequality constraints, which is a convex programming problem having a unique optimal solution. Although SVM was initially proposed for classification, it was later extended to regression (Vapnik, 2000).

The function approximation by regression is of great importance in many fields of research such as bioinformatics, control theory, economics, information science and signal processing. The main challenge in developing a useful regression model is to capture accurately the underlying functional relationship between the given inputs and their output values. Once the resulting model is obtained it can be used as a tool for analysis, simulation and prediction.

For a given set of data points, support vector regression (SVR) aims at determining a linear regressor, where the input data are either considered in the original pattern space itself or mapped into a higher-dimensional feature space. With the introduction of the ε-insensitive error loss function proposed by Vapnik (2000), SVR has emerged as a powerful paradigm of choice for regression because of its improved generalization performance in comparison to other popular machine learning methods such as artificial neural networks.

Considering the minimization of the square of the 2-norm of the slack variables instead of the usual 1-norm, and maximizing the margin with respect to both the orientation and the relative location to the origin of the bounding parallel planes, Mangasarian and Musicant (2001a, 2001b) studied “equivalent” SVM formulations for classification problems, resulting in positive-definite dual problems having only non-negative constraints. Since the formulation of SVM as an unconstrained optimization problem has an objective function that is not twice differentiable, a smoothing technique was adopted to derive a new SVM formulation called Smooth SVM (SSVM) in Lee and Mangasarian (2001). For its extension to ε-insensitive error loss based SVR, termed Smooth SVR (SSVR), we refer the reader to Lee, Hsieh, and Huang (2005). Also, for a finite Newton method of solution for SVM classification with a generalized Hessian approach, see Fung and Mangasarian (2003); for its extension to SVR, the interested reader is referred to Balasundaram and Kapil (2011). In addition, for the extension of the Active set SVM (ASVM), proposed for classification (Mangasarian & Musicant, 2001b), to SVR, we refer the reader to Musicant and Feinberg (2004). For a review of optimization techniques used for training SVMs, see Shawe-Taylor and Sun (2011). Finally, for interesting work on multitask learning and multiview learning methods as extensions of kernel-based methods, we refer the reader to Ji and Sun (2013), Sun (2011) and Xie and Sun (2012).

It is well known that the conventional ε-insensitive SVR is formulated as a convex quadratic minimization problem having 2m nonnegative variables and 2m linear inequality constraints, whereas 2-norm SVR introduces only 2m linear inequality constraints, where m is the number of training points. In either case, more variables and constraints enlarge the problem size and therefore increase the computational cost of solving the regression problem. Recently, a new nonparallel plane regressor termed twin SVR (TSVR) was proposed in Peng (2010a), wherein a pair of QPPs, each having m constraints, is solved instead of a single QPP having 2m constraints as in the conventional SVR; this makes TSVR work faster than SVR.

In our approach, the 2-norm SVR in dual form is reformulated as a single, unconstrained minimization problem having only m variables. We term this reformulation the unconstrained Lagrangian SVR (ULSVR). The proposed ULSVR model can be solved using gradient based methods. However, since the objective function contains a term involving the non-smooth ‘plus’ function, two approaches are taken to solve the minimization problem: (i) two smooth approximation functions, studied in Lee and Mangasarian (2001) and Peng (2010b), are introduced and their slightly modified unconstrained minimization problems are solved; (ii) the minimization problem is solved directly by applying the generalized derivative. In both approaches, the zeros of the gradient are computed by simple iterative methods.
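For concreteness, the smooth approximation of the plus function studied in Lee and Mangasarian (2001) is (in our notation; the paper may present it slightly differently)

$$p(x, \alpha) = x + \frac{1}{\alpha} \log\left(1 + e^{-\alpha x}\right), \qquad \alpha > 0,$$

a smooth convex function that approaches $x_+$ as $\alpha \to \infty$; the generalized derivative approach instead works with $x_+$ directly, using the step function $x_*$ defined in the notation below as a generalized derivative of the plus function.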

Finally, for the study of Lagrangian and implicit Lagrangian SVR solved using block matrices, with the advantage of reduced training cost, see the work of Balasundaram and Kapil (2010, 2011).

The effectiveness of the proposed approach in formulating the ULSVR problem and solving it using various iterative methods is demonstrated by performing numerical experiments on a number of synthetic and real-world benchmark datasets and comparing the results with those of the conventional SVR and least squares SVR. The comparable generalization performance of ULSVR, obtained with less training time, clearly indicates its suitability for the problems of interest.

The paper is organized as follows. In Section 2, formulations of the conventional and 2-norm SVR are introduced. Section 3 briefly dwells on least squares SVR. The proposed ULSVR formulation, having only m variables, and the various algorithms used for solving it are described in Section 4. In Section 5, numerical experiments are performed on a number of synthetic and real-world datasets and the results obtained by ULSVR are compared with those of SVR and least squares SVR. Finally, we conclude our work in Section 6.

In this work, all vectors are considered as column vectors. For any two vectors $x, y \in \mathbb{R}^n$, their inner product will be denoted by $x^t y$, where $x^t$ is the transpose of $x$. When $x$ is orthogonal to $y$ we write $x \perp y$. The 2-norm of a vector $x$ will be denoted by $\|x\|$. For $x = (x_1, \ldots, x_n)^t \in \mathbb{R}^n$, the plus function $x_+$ is defined as $(x_+)_i = \max\{0, x_i\}$, where $i = 1, \ldots, n$. Further, we define the step function $x_*$ as $(x_*)_i = 1$ for $x_i > 0$, $(x_*)_i = 0$ if $x_i < 0$ and $(x_*)_i = 0.5$ when $x_i = 0$. The identity matrix of appropriate size is denoted by $I$, and the diagonal matrix of order $n$ whose diagonal elements are the components of the vector $x \in \mathbb{R}^n$ is denoted by $\mathrm{diag}(x)$. The column vector of ones of dimension $m$ is denoted by $e$. If $f$ is a real-valued function of the variable $x = (x_1, \ldots, x_n)^t \in \mathbb{R}^n$, then its gradient vector and Hessian matrix are defined by $\nabla f = (\partial f/\partial x_1, \ldots, \partial f/\partial x_n)^t$ and $\nabla^2 f = (\partial^2 f/\partial x_i \partial x_j)_{i,j = 1, \ldots, n}$, respectively.
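As an illustrative aside (the paper's experiments were carried out in MATLAB; this NumPy sketch is ours and only mirrors the definitions above):

```python
import numpy as np

def plus(x):
    """Componentwise plus function: (x_+)_i = max(0, x_i)."""
    return np.maximum(x, 0.0)

def step(x):
    """Componentwise step function: 1 if x_i > 0, 0 if x_i < 0, 0.5 if x_i == 0."""
    return np.where(x > 0, 1.0, np.where(x < 0, 0.0, 0.5))
```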

Section snippets

Support vector regression (SVR)

In this section, we briefly describe the conventional SVR and 2-norm SVR formulations.

Assume that a set of input samples $\{(x_i, y_i)\}_{i = 1, 2, \ldots, m}$ is given, where each training example $x_i \in \mathbb{R}^n$ has a corresponding observed value $y_i$. Further, let the training examples be represented by a matrix $A \in \mathbb{R}^{m \times n}$ whose $i$th row is the row vector $x_i^t$, and let the vector of observed values be denoted by $y = (y_1, \ldots, y_m)^t$.

The primary goal of SVR is to seek a nonlinear regression function by mapping the
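For reference, the conventional ε-insensitive SVR primal of Vapnik (2000), presumably the problem referred to as formulation (1) below, has the standard form (our rendering; the paper's notation may differ slightly):

$$\min_{w, b, \xi, \xi^*} \ \frac{1}{2} w^t w + C \sum_{i=1}^{m} (\xi_i + \xi_i^*)$$

subject to

$$y_i - (w^t \varphi(x_i) + b) \le \varepsilon + \xi_i, \qquad (w^t \varphi(x_i) + b) - y_i \le \varepsilon + \xi_i^*, \qquad \xi_i, \xi_i^* \ge 0, \quad i = 1, \ldots, m.$$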

Least squares support vector regression (LS-SVR)

In this section, we give a brief introduction to the least squares SVR (LS-SVR). Unlike in the SVR formulation (1), where inequality constraints are considered, equality constraints are used to determine the nonlinear regressor in the LS-SVR formulation

$$\min_{w, b, \xi} \ \frac{1}{2} w^t w + \frac{C}{2} \sum_{i=1}^{m} \xi_i^2$$

subject to

$$y_i = w^t \varphi(x_i) + b + \xi_i, \quad i = 1, 2, \ldots, m,$$

where the vector $w$ and the scalar $b$ are the unknowns; $\xi = (\xi_1, \ldots, \xi_m)^t$ is the residual vector; $C > 0$ is the regularization parameter and $\varphi(\cdot)$ is a nonlinear feature mapping.
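As a hedged illustration only (not the authors' code), the standard LS-SVR dual reduces to a single linear system; the sketch below assumes a Gaussian kernel and arbitrary parameter values:

```python
import numpy as np

def rbf_kernel(A, B, sigma=1.0):
    # Gaussian kernel matrix: K_ij = exp(-||a_i - b_j||^2 / (2 sigma^2))
    sq = (np.sum(A**2, axis=1)[:, None]
          + np.sum(B**2, axis=1)[None, :]
          - 2.0 * A @ B.T)
    return np.exp(-sq / (2.0 * sigma**2))

def lssvr_fit(A, y, C=100.0, sigma=1.0):
    """Solve the standard LS-SVR dual system [[0, e^t], [e, K + I/C]] [b; alpha] = [0; y]."""
    m = A.shape[0]
    K = rbf_kernel(A, A, sigma)
    M = np.zeros((m + 1, m + 1))
    M[0, 1:] = 1.0
    M[1:, 0] = 1.0
    M[1:, 1:] = K + np.eye(m) / C
    rhs = np.concatenate(([0.0], y))
    sol = np.linalg.solve(M, rhs)
    return sol[1:], sol[0]  # alpha, b

def lssvr_predict(A_train, alpha, b, X_new, sigma=1.0):
    # Regressor: f(x) = sum_i alpha_i k(x, x_i) + b
    return rbf_kernel(X_new, A_train, sigma) @ alpha + b

# Toy usage: fit a noisy sine curve.
rng = np.random.default_rng(0)
A = np.linspace(-3, 3, 60).reshape(-1, 1)
y = np.sin(A).ravel() + 0.1 * rng.standard_normal(60)
alpha, b = lssvr_fit(A, y, C=100.0, sigma=0.7)
print(lssvr_predict(A, alpha, b, np.array([[0.0]]), sigma=0.7))
```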

The primal LS-SVR

Lagrangian dual of 2-norm SVR as unconstrained minimization problem

In this section, we reformulate the dual problem (5) as an unconstrained minimization problem having only m unknown variables, and obtain its solution by applying iterative methods.

Since $u_1^t u_2 = 0$ holds at optimality (Musicant & Feinberg, 2004), adding $(-2 u_1^t u_2)$ to the second term of the objective function in (5), problem (5) can be rewritten as

$$\min_{u_1, u_2 \in \mathbb{R}^m} \ \frac{1}{2}\left[(u_1 - u_2)^t K(G, G^t)(u_1 - u_2)\right] + \frac{1}{2C}\left(u_1^t u_1 + u_2^t u_2 - 2 u_1^t u_2\right) - y^t(u_1 - u_2) + \varepsilon e^t(u_1 + u_2)$$

subject to $u_1 \ge 0$, $u_2 \ge 0$. Now, by defining $u = u_1 - u_2$ or equivalently
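To see informally how defining $u = u_1 - u_2$ yields a problem in only $m$ variables (a sketch of our own; the paper's exact problem (11) may be presented differently): since $u_1, u_2 \ge 0$ and $u_1^t u_2 = 0$, we have $u_1 = (u)_+$, $u_2 = (-u)_+$ and hence $u_1 + u_2 = (u)_+ + (-u)_+$. Substituting into the objective above gives the unconstrained problem

$$\min_{u \in \mathbb{R}^m} \ \frac{1}{2} u^t K(G, G^t) u + \frac{1}{2C} u^t u - y^t u + \varepsilon e^t \left( (u)_+ + (-u)_+ \right),$$

whose only non-smoothness comes from the plus function; using the step function $x_*$ as a generalized derivative of $x_+$, a zero of

$$\left( K(G, G^t) + \frac{I}{C} \right) u - y + \varepsilon \left( u_* - (-u)_* \right)$$

can be sought by simple iterative methods.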

Experimental results

In this section, we investigate the performance of the proposed ULSVR, defined by (11) and solved by the gradient-based iterative algorithms SULSVR1, SULSVR2, SULSVR3, NULSVR and GULSVR, by comparing their results in terms of accuracy and learning time with those of the conventional SVR and LS-SVR on a number of synthetic and publicly available benchmark real-world datasets.

All experiments were carried out in the MATLAB R2010a environment on a PC running Windows XP (32-bit) with a 2.27 GHz Intel(R)

Conclusions

By a simple reformulation of the Lagrangian dual of SVR, a novel SVR formulation is proposed in this work as an unconstrained minimization problem, with the key advantage that the number of unknown variables equals the number of input data points. It is further proposed to solve the unconstrained minimization problem using iterative algorithms derived by considering smoothing and generalized derivative approaches. Unlike solving a quadratic programming problem of SVR, all the iterative

Acknowledgments

The authors are extremely thankful to the learned referees for their critical and constructive comments that greatly improved the earlier version of the paper.

References (32)

  • G.E.P. Box et al. Time series analysis: Forecasting and control (1976)
  • N. Cristianini et al. An introduction to support vector machines and other kernel-based learning methods (2000)
  • J. Demsar. Statistical comparisons of classifiers over multiple data sets. Journal of Machine Learning Research (2006)
  • G. Fung et al. A feature selection Newton method for support vector machine classification. Computational Optimization and Applications (2004)
  • S. Garcia et al. An extension on “Statistical comparisons of classifiers over multiple data sets” for all pairwise comparisons. Journal of Machine Learning Research (2008)
  • G.H. Golub et al. Matrix computations (1996)