Neural Networks

Volume 51, March 2014, Pages 67-79

Lagrangian support vector regression via unconstrained convex minimization

https://doi.org/10.1016/j.neunet.2013.12.003

Abstract

In this paper, a simple reformulation of the Lagrangian dual of the 2-norm support vector regression (SVR) is proposed as an unconstrained minimization problem. This formulation has the advantage that its objective function is strongly convex and has only m variables, where m is the number of input data points. The proposed unconstrained Lagrangian SVR (ULSVR) is solvable by computing the zeros of its gradient. However, since its objective function contains the non-smooth ‘plus’ function, two approaches are followed to solve the proposed optimization problem: (i) introduce a smooth approximation, generate a slightly modified unconstrained minimization problem and solve it; (ii) solve the problem directly by applying the generalized derivative. Computational results obtained on a number of synthetic and real-world benchmark datasets show generalization performance similar to that of the conventional SVR with much faster learning speed, and training time very close to that of least squares SVR, clearly indicating the superiority of ULSVR solved by the smooth and generalized derivative approaches.

Introduction

Support Vector Machines (SVMs) are state-of-the-art machine learning techniques based on the structural risk minimization principle of Vapnik and Chervonenkis (Vapnik, 2000). They are computationally powerful tools that show better generalization ability than other machine learning methods on a wide variety of real-world problems such as face detection (Osuna, Freund, & Girosi, 1997), gene prediction (Guyon, Weston, Barnhill, & Vapnik, 2002), text categorization (Joachims, Ndellec, & Rouveriol, 1998) and image segmentation (Chen & Wang, 2005). It is well known that the SVM formulation (Cristianini & Shawe-Taylor, 2000; Vapnik, 2000) can be posed as a quadratic programming problem (QPP) with linear inequality constraints, which is a convex programming problem having a unique optimal solution. Although SVM was initially proposed for classification, it was later extended to regression (Vapnik, 2000).

The function approximation by regression is of great importance in many fields of research such as bioinformatics, control theory, economics, information science and signal processing. The main challenge in developing a useful regression model is to capture accurately the underlying functional relationship between the given inputs and their output values. Once the resulting model is obtained it can be used as a tool for analysis, simulation and prediction.

For a given set of data points, support vector regression (SVR) aims at determining a linear regressor, where the input data are either considered in the original pattern space itself or mapped into a higher-dimensional feature space. With the introduction of the ε-insensitive error loss function proposed by Vapnik (2000), SVR has emerged as a powerful paradigm of choice for regression because of its improved generalization performance in comparison to other popular machine learning methods such as artificial neural networks.

Considering the minimization of the square of the 2-norm of the slack variables instead of the usual 1-norm, and maximizing the margin with respect to both the orientation and the relative location to the origin of the bounding parallel planes, Mangasarian and Musicant (2001a, 2001b) studied “equivalent” SVM formulations for classification problems, resulting in positive-definite dual problems having only non-negative constraints. Since the formulation of SVM as an unconstrained optimization problem has an objective function that is not twice differentiable, a smoothing technique was adopted to derive a new SVM formulation called Smooth SVM (SSVM) in Lee and Mangasarian (2001). For its extension to ε-insensitive error loss based SVR, termed Smooth SVR (SSVR), we refer the reader to Lee, Hsieh, and Huang (2005). Also, for a finite Newton method of solution for SVM classification with a generalized Hessian approach, see Fung and Mangasarian (2003); for its extension to SVR, the interested reader is referred to Balasundaram and Kapil (2011). In addition, for the extension of the Active set SVM (ASVM), proposed for classification (Mangasarian & Musicant, 2001b), to SVR, we refer the reader to Musicant and Feinberg (2004). For a review of optimization techniques used for training SVMs, see Shawe-Taylor and Sun (2011). Finally, for interesting work on multitask learning and multiview learning methods as extensions of kernel-based methods, we refer the reader to Ji and Sun (2013), Sun (2011) and Xie and Sun (2012).

It is well known that the conventional ε-insensitive SVR is formulated as a convex quadratic minimization problem having 2m nonnegative variables and 2m linear inequality constraints, whereas 2-norm SVR introduces only 2m linear inequality constraints, where m is the number of training points. In either case, more variables and constraints enlarge the problem size and therefore increase the computational cost of solving the regression problem. Recently, a new nonparallel plane regressor termed twin SVR (TSVR) was proposed in Peng (2010a), wherein a pair of QPPs, each having m constraints, is solved instead of a single QPP having 2m constraints as in the conventional SVR; this makes TSVR work faster than SVR.

In our approach, the 2-norm SVR in dual form is reformulated as a single, unconstrained minimization problem having only m variables. We term this reformulation the unconstrained Lagrangian SVR (ULSVR). The proposed ULSVR model can be solved using gradient based methods. However, since the objective function contains a term involving the non-smooth ‘plus’ function, two approaches are taken to solve the minimization problem: (i) two smooth approximation functions, studied in Lee and Mangasarian (2001) and Peng (2010b), are introduced and their slightly modified unconstrained minimization problems are solved; (ii) the minimization problem is solved directly by applying the generalized derivative. In both approaches, the zeros of the gradient are computed by simple iterative methods.
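For concreteness, the smooth approximation of the plus function studied in Lee and Mangasarian (2001) is (in our notation; the paper may present it slightly differently)

$$p(x, \alpha) = x + \frac{1}{\alpha} \log\left(1 + e^{-\alpha x}\right), \qquad \alpha > 0,$$

a smooth convex function that approaches $x_+$ as $\alpha \to \infty$; the generalized derivative approach instead works with $x_+$ directly, using the step function $x_*$ defined in the notation below as a generalized derivative of the plus function.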

Finally, for the study of Lagrangian and implicit Lagrangian SVR solved using block matrices, with the advantage of reduced training cost, see the work of Balasundaram and Kapil (2010, 2011).

The effectiveness of the proposed approach in formulating the ULSVR problem and solving it using various iterative methods is demonstrated by performing numerical experiments on a number of synthetic and real-world benchmark datasets and comparing the results with those of the conventional SVR and least squares SVR. The comparable generalization performance of ULSVR, obtained with less training time, clearly indicates its suitability for the problems of interest.

The paper is organized as follows. In Section 2, formulations of the conventional and 2-norm SVR are introduced. Section 3 briefly dwells on least squares SVR. The proposed ULSVR formulation, having only m variables, and the various algorithms used for solving it are described in Section 4. In Section 5, numerical experiments are performed on a number of synthetic and real-world datasets and the results obtained by ULSVR are compared with those of SVR and least squares SVR. Finally, we conclude our work in Section 6.

In this work, all vectors are considered as column vectors. For any two vectors $x, y \in \mathbb{R}^n$, their inner product will be denoted by $x^t y$, where $x^t$ is the transpose of $x$. When $x$ is orthogonal to $y$ we write $x \perp y$. The 2-norm of a vector $x$ will be denoted by $\|x\|$. For $x = (x_1, \ldots, x_n)^t \in \mathbb{R}^n$, the plus function $x_+$ is defined as $(x_+)_i = \max\{0, x_i\}$, where $i = 1, \ldots, n$. Further, we define the step function $x_*$ as $(x_*)_i = 1$ for $x_i > 0$, $(x_*)_i = 0$ if $x_i < 0$ and $(x_*)_i = 0.5$ when $x_i = 0$. The identity matrix of appropriate size is denoted by $I$, and the diagonal matrix of order $n$ whose diagonal elements are the components of the vector $x \in \mathbb{R}^n$ is denoted by $\mathrm{diag}(x)$. The column vector of ones of dimension $m$ is denoted by $e$. If $f$ is a real-valued function of the variable $x = (x_1, \ldots, x_n)^t \in \mathbb{R}^n$, then its gradient vector and Hessian matrix are defined by $\nabla f = (\partial f/\partial x_1, \ldots, \partial f/\partial x_n)^t$ and $\nabla^2 f = (\partial^2 f/\partial x_i \partial x_j)_{i,j = 1, \ldots, n}$, respectively.
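As an illustrative aside (the paper's experiments were carried out in MATLAB; this NumPy sketch is ours and only mirrors the definitions above):

```python
import numpy as np

def plus(x):
    """Componentwise plus function: (x_+)_i = max(0, x_i)."""
    return np.maximum(x, 0.0)

def step(x):
    """Componentwise step function: 1 if x_i > 0, 0 if x_i < 0, 0.5 if x_i == 0."""
    return np.where(x > 0, 1.0, np.where(x < 0, 0.0, 0.5))
```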

Section snippets

Support vector regression (SVR)

In this section, we briefly describe the conventional SVR and 2-norm SVR formulations.

Assume that a set of input samples $\{(x_i, y_i)\}_{i = 1, 2, \ldots, m}$ is given, where each training example $x_i \in \mathbb{R}^n$ has a corresponding observed value $y_i$. Further, let the training examples be represented by a matrix $A \in \mathbb{R}^{m \times n}$ whose $i$th row is the row vector $x_i^t$, and let the vector of observed values be denoted by $y = (y_1, \ldots, y_m)^t$.

The primary goal of SVR is to seek a nonlinear regression function by mapping the
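For reference, the conventional ε-insensitive SVR primal of Vapnik (2000), presumably the problem referred to as formulation (1) below, has the standard form (our rendering; the paper's notation may differ slightly):

$$\min_{w, b, \xi, \xi^*} \ \frac{1}{2} w^t w + C \sum_{i=1}^{m} (\xi_i + \xi_i^*)$$

subject to

$$y_i - (w^t \varphi(x_i) + b) \le \varepsilon + \xi_i, \qquad (w^t \varphi(x_i) + b) - y_i \le \varepsilon + \xi_i^*, \qquad \xi_i, \xi_i^* \ge 0, \quad i = 1, \ldots, m.$$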

Least squares support vector regression (LS-SVR)

In this section, we give a brief introduction to the least squares SVR (LS-SVR). Unlike in the SVR formulation (1), where inequality constraints are considered, equality constraints are used to determine the nonlinear regressor in the LS-SVR formulation

$$\min_{w, b, \xi} \ \frac{1}{2} w^t w + \frac{C}{2} \sum_{i=1}^{m} \xi_i^2$$

subject to

$$y_i = w^t \varphi(x_i) + b + \xi_i, \quad i = 1, 2, \ldots, m,$$

where the vector $w$ and the scalar $b$ are the unknowns; $\xi = (\xi_1, \ldots, \xi_m)^t$ is the residual vector; $C > 0$ is the regularization parameter and $\varphi(\cdot)$ is a nonlinear feature mapping.
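As a hedged illustration only (not the authors' code), the standard LS-SVR dual reduces to a single linear system; the sketch below assumes a Gaussian kernel and arbitrary parameter values:

```python
import numpy as np

def rbf_kernel(A, B, sigma=1.0):
    # Gaussian kernel matrix: K_ij = exp(-||a_i - b_j||^2 / (2 sigma^2))
    sq = (np.sum(A**2, axis=1)[:, None]
          + np.sum(B**2, axis=1)[None, :]
          - 2.0 * A @ B.T)
    return np.exp(-sq / (2.0 * sigma**2))

def lssvr_fit(A, y, C=100.0, sigma=1.0):
    """Solve the standard LS-SVR dual system [[0, e^t], [e, K + I/C]] [b; alpha] = [0; y]."""
    m = A.shape[0]
    K = rbf_kernel(A, A, sigma)
    M = np.zeros((m + 1, m + 1))
    M[0, 1:] = 1.0
    M[1:, 0] = 1.0
    M[1:, 1:] = K + np.eye(m) / C
    rhs = np.concatenate(([0.0], y))
    sol = np.linalg.solve(M, rhs)
    return sol[1:], sol[0]  # alpha, b

def lssvr_predict(A_train, alpha, b, X_new, sigma=1.0):
    # Regressor: f(x) = sum_i alpha_i k(x, x_i) + b
    return rbf_kernel(X_new, A_train, sigma) @ alpha + b

# Toy usage: fit a noisy sine curve.
rng = np.random.default_rng(0)
A = np.linspace(-3, 3, 60).reshape(-1, 1)
y = np.sin(A).ravel() + 0.1 * rng.standard_normal(60)
alpha, b = lssvr_fit(A, y, C=100.0, sigma=0.7)
print(lssvr_predict(A, alpha, b, np.array([[0.0]]), sigma=0.7))
```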

The primal LS-SVR

Lagrangian dual of 2-norm SVR as unconstrained minimization problem

In this section, we reformulate the dual problem (5) as an unconstrained minimization problem having only m unknown variables, and obtain its solution by applying iterative methods.

Since $u_1^t u_2 = 0$ holds at optimality (Musicant & Feinberg, 2004), adding $(-2 u_1^t u_2)$ to the second term of the objective function in (5), problem (5) can be rewritten as

$$\min_{u_1, u_2 \in \mathbb{R}^m} \ \frac{1}{2}\left[(u_1 - u_2)^t K(G, G^t)(u_1 - u_2)\right] + \frac{1}{2C}\left(u_1^t u_1 + u_2^t u_2 - 2 u_1^t u_2\right) - y^t(u_1 - u_2) + \varepsilon e^t(u_1 + u_2)$$

subject to $u_1 \ge 0$, $u_2 \ge 0$. Now, by defining $u = u_1 - u_2$ or equivalently
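To see informally how defining $u = u_1 - u_2$ yields a problem in only $m$ variables (a sketch of our own; the paper's exact problem (11) may be presented differently): since $u_1, u_2 \ge 0$ and $u_1^t u_2 = 0$, we have $u_1 = (u)_+$, $u_2 = (-u)_+$ and hence $u_1 + u_2 = (u)_+ + (-u)_+$. Substituting into the objective above gives the unconstrained problem

$$\min_{u \in \mathbb{R}^m} \ \frac{1}{2} u^t K(G, G^t) u + \frac{1}{2C} u^t u - y^t u + \varepsilon e^t \left( (u)_+ + (-u)_+ \right),$$

whose only non-smoothness comes from the plus function; using the step function $x_*$ as a generalized derivative of $x_+$, a zero of

$$\left( K(G, G^t) + \frac{I}{C} \right) u - y + \varepsilon \left( u_* - (-u)_* \right)$$

can be sought by simple iterative methods.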

Experimental results

In this section, we investigate the performance of the proposed ULSVR, defined by (11) and solved by the gradient-based iterative algorithms SULSVR1, SULSVR2, SULSVR3, NULSVR and GULSVR, by comparing their results in terms of accuracy and learning time with those of the conventional SVR and LS-SVR on a number of synthetic and publicly available benchmark real-world datasets.

All experiments were carried out in the MATLAB R2010a environment on a PC running Windows XP (32-bit) with a 2.27 GHz Intel(R)

Conclusions

By a simple reformulation of the Lagrangian dual of SVR, a novel SVR formulation is proposed in this work as an unconstrained minimization problem, with the key advantage that the number of unknown variables equals the number of input data points. It is further proposed to solve the unconstrained minimization problem using iterative algorithms derived by considering smoothing and generalized derivative approaches. Unlike solving a quadratic programming problem of SVR, all the iterative

Acknowledgments

The authors are extremely thankful to the learned referees for their critical and constructive comments that greatly improved the earlier version of the paper.

References (32)

  • G.E.P. Box et al. Time series analysis: Forecasting and control (1976)
  • N. Cristianini et al. An introduction to support vector machines and other kernel-based learning methods (2000)
  • J. Demsar. Statistical comparisons of classifiers over multiple data sets. Journal of Machine Learning Research (2006)
  • G. Fung et al. A feature selection Newton method for support vector machine classification. Computational Optimization and Applications (2004)
  • S. Garcia et al. An extension on “Statistical comparisons of classifiers over multiple data sets” for all pairwise comparisons. Journal of Machine Learning Research (2008)
  • G.H. Golub et al. Matrix computations (1996)