Elsevier

Neural Networks

Volume 57, September 2014, Pages 1-11

Noise model based ν-support vector regression with its application to short-term wind speed forecasting

https://doi.org/10.1016/j.neunet.2014.05.003

Abstract

Support vector regression (SVR) techniques aim to discover a linear or nonlinear structure hidden in sample data. Most existing regression techniques assume that the error distribution is Gaussian. However, it has been observed that the noise in some real-world applications, such as wind power forecasting and the direction-of-arrival estimation problem, does not follow a Gaussian distribution but rather a beta distribution, a Laplace distribution, or another model. In these cases the current regression techniques are not optimal. Using the Bayesian approach, we derive a general loss function and develop a unified ν-support vector regression model for the general noise model (N-SVR). The Augmented Lagrange Multiplier method is introduced to solve N-SVR. Numerical experiments on artificial data sets, UCI data and short-term wind speed prediction are conducted, and the results show the effectiveness of the proposed technique.

Introduction

Regression is a long-standing topic in the domain of learning functions from a set of samples (Hastie, Tibshirani, & Friedman, 2009). It provides researchers and engineers with a powerful tool to extract the rules hidden in data: a trained model predicts future events from information about past or present events. Regression analysis is now successfully applied in nearly all fields of science and technology, including the social sciences, economics, finance, and wind power prediction for grid operation. Nevertheless, this domain still attracts much attention from both research and application communities.

Generally speaking, there are three important issues in designing a regression algorithm: model structures, objective functions and optimization strategies. Model structures include linear or nonlinear functions (Park & Lee, 2005), neural networks (Specht, 1990), decision trees (Esposito, Malerba, & Semeraro, 1997), and so on; optimization objectives include the ϵ-insensitive loss (Cortes and Vapnik, 1995, Vapnik, 1995, Vapnik et al., 1996), squared loss (Suykens et al., 2000, Wu, 2010, Wu and Law, 2011), robust Huber loss (Olvi & David, 2000), etc. According to the formulation of the optimization function, a collection of optimization algorithms (Ma, 2010) have been developed. In this work, we focus on the question of which optimization formulation should be considered for different error models.

Suppose we are given a set of training data $D_l=\{(x_1,y_1),(x_2,y_2),\ldots,(x_l,y_l)\}$, where $x_i\in\mathbb{R}^L$, $y_i\in\mathbb{R}$, $i=1,2,\ldots,l$. Take a multivariate linear regression task $f$ as an example. The form is $f(x)=\omega^{T}x+b$, where $\omega\in\mathbb{R}^L$ and $b\in\mathbb{R}$. The task is to learn the parameter vector $\omega$ and the parameter $b$ by minimizing the objective function $$g_{LR}=\sum_{i=1}^{l}\left(y_i-\omega^{T}x_i-b\right)^2.$$ This sum-of-squares objective function is usually used in regression. The trained model is optimal if the samples have been corrupted by independent and identically distributed (i.i.d.) noise following a Gaussian distribution with zero mean and variance $\sigma^2$, i.e., $y_i=f(x_i)+\xi_i$, $i=1,\ldots,l$, $\xi_i\sim N(0,\sigma^2)$.
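To make this objective concrete, the following minimal sketch fits $\omega$ and $b$ by solving the least-squares problem directly; the function name and synthetic data are illustrative only, not from the paper.

```python
import numpy as np

def fit_linear_least_squares(X, y):
    """Minimize g_LR = sum_i (y_i - w^T x_i - b)^2 in closed form.

    X: (l, L) sample matrix, y: (l,) targets. Returns (w, b).
    """
    l = X.shape[0]
    # Append a constant column so b is absorbed into the weight vector.
    Xa = np.hstack([X, np.ones((l, 1))])
    # Solve the normal equations via lstsq (stabler than matrix inversion).
    theta, *_ = np.linalg.lstsq(Xa, y, rcond=None)
    return theta[:-1], theta[-1]

# Example: recover the rule y = 2*x1 - x2 + 0.5 corrupted by Gaussian noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = X @ np.array([2.0, -1.0]) + 0.5 + rng.normal(scale=0.1, size=200)
w, b = fit_linear_least_squares(X, y)
```

Under the Gaussian-noise assumption stated above, this least-squares fit coincides with the maximum-likelihood estimate.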

In recent years, the support vector regressor (SVR) has grown into a popular technique (Cortes and Vapnik, 1995, Cristianini and Shawe, 2000, Smola and Schölkopf, 2004, Vapnik, 1995, Vapnik, 1998, Vapnik, 1999, Vapnik et al., 1996, Wu, 2010, Wu and Law, 2011). It is a universal regression machine based on VC-dimension theory and is developed from the Structural Risk Minimization (SRM) principle, which has shown its effectiveness in applications. The classical SVR is optimized by minimizing Vapnik's ϵ-insensitive loss function of the residuals and has achieved good performance in a variety of practical applications (Bayro-Corrochano and Arana-Daniel, 2010, Duan et al., 2012, Huang et al., 2012, Kwok and Tsang, 2003, Lopez and Dorronsoro, 2012, Yang and Ong, 2011).

In 1995, ϵ-SVR was proposed by Vapnik and his research team (Cortes and Vapnik, 1995, Vapnik, 1995, Vapnik et al., 1996). In 2000, ν-SVR, which computes ϵ automatically, was introduced by Schölkopf, Smola, Williamson, and Bartlett (2000). Suykens et al. (2000) constructed least squares support vector regression with Gaussian noise (LS-SVR). Wu (Wu, 2010, Wu and Law, 2011) and Pontil, Mukherjee, and Girosi (1998) constructed ν-support vector regression with Gaussian noise (GN-SVR). If the noise obeys a Gaussian distribution, the outputs of these models are optimal. However, it has been found that the noise in some real-world applications, such as wind power forecasting and the direction-of-arrival estimation problem, does not follow a Gaussian distribution but rather a beta distribution or a Laplace distribution, respectively. In these cases these regression techniques are not optimal.

The principle of ν-support vector regression (ν-SVR) can be written as (Chalimourda et al., 2004, Chih-Chung and Chih-Jen, 2002, Schölkopf et al., 2000):
$$\min\left\{g_{P_{\nu\text{-SVR}}}=\frac{1}{2}\|\omega\|^2+C\left(\nu\epsilon+\frac{1}{l}\sum_{i=1}^{l}(\xi_i+\xi_i^{*})\right)\right\}$$
$$\text{subject to: }(\omega^{T}x_i+b)-y_i\le\epsilon+\xi_i,\quad y_i-(\omega^{T}x_i+b)\le\epsilon+\xi_i^{*},\quad \xi_i,\xi_i^{*}\ge 0,\ i=1,2,\ldots,l,\ \epsilon\ge 0,$$
where $\xi_i,\xi_i^{*}$ are two slack variables. The constant $C>0$ determines the trade-off between the flatness of $f$ and the amount up to which deviations larger than $\epsilon$ are tolerated. $\nu\in(0,1]$ is a constant that controls the number of support vectors. In ν-SVR the size of $\epsilon$ is not given a priori but is a variable whose value is traded off against model complexity and the slack variables via the constant $\nu$ (Chalimourda et al., 2004). This corresponds to the so-called ϵ-insensitive loss function (Cortes and Vapnik, 1995, Vapnik, 1995) described by
$$c_\epsilon(\xi)=|\xi|_\epsilon=\begin{cases}0, & \text{if } |\xi|\le\epsilon,\\ |\xi|-\epsilon, & \text{otherwise}.\end{cases}$$
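As a quick illustration of this loss, a minimal sketch (names are ours, not the paper's):

```python
import numpy as np

def epsilon_insensitive_loss(residual, eps):
    """Vapnik's eps-insensitive loss c_eps(xi) = max(0, |xi| - eps).

    Residuals inside the eps-tube incur no penalty; outside it, the
    penalty grows linearly. Vectorized over an array of residuals.
    """
    return np.maximum(0.0, np.abs(residual) - eps)

# With eps = 0.1: a residual of 0.05 costs 0, a residual of 0.3 costs 0.2.
print(epsilon_insensitive_loss(np.array([0.05, 0.3]), eps=0.1))
```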

In 2002, ϵ-SVR for a general noise model was proposed in Schölkopf and Smola (2002):
$$\min\left\{g_{\epsilon\text{-SVR}}=\frac{1}{2}\|\omega\|^2+C\sum_{i=1}^{l}\left(\tilde{c}(\xi_i)+\tilde{c}(\xi_i^{*})\right)\right\}$$
$$\text{subject to: }(\omega^{T}x_i+b)-y_i\le\epsilon+\xi_i,\quad y_i-(\omega^{T}x_i+b)\le\epsilon+\xi_i^{*},\quad \xi_i,\xi_i^{*}\ge 0,\ i=1,2,\ldots,l,$$
where $c(x,y,f(x))=\tilde{c}(|y-f(x)|_\epsilon)$ is a general convex loss function at the sample point $(x_i,y_i)$ of $D_l$, and $|y-f(x)|_\epsilon$ in (5) is Vapnik's ϵ-insensitive loss function.

Using Lagrange multiplier techniques (Cortes and Vapnik, 1995, Vapnik, 1995), Problem (4) can be transformed into a convex optimization problem with a global minimum. At the optimum, the regression estimate takes the form
$$f(x)=\sum_{i=1}^{l}(\alpha_i-\alpha_i^{*})(x_i\cdot x)+b,$$
where $(x_i\cdot x)$ is the inner product.
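Assuming the dual multipliers $\alpha_i,\alpha_i^{*}$ and the offset $b$ have already been obtained from the convex dual problem, evaluating this estimate is straightforward; the sketch below uses the linear kernel (plain inner product) only.

```python
import numpy as np

def svr_predict(X_train, alpha, alpha_star, b, x):
    """Evaluate f(x) = sum_i (alpha_i - alpha_i*) <x_i, x> + b.

    X_train: (l, L) training inputs; alpha, alpha_star: (l,) dual
    multipliers (assumed given); x: (L,) query point.
    """
    return (alpha - alpha_star) @ (X_train @ x) + b
```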

In 2002, Bofinger, Luig, and Beyer (2002) found that the output of wind turbine systems is limited between zero and the maximum power, and that the error statistics do not follow a normal distribution. In 2005, Fabbri, Román, Abbad, and Quezada (2005) argued that the normalized produced power $p$ must lie within the interval $[0,1]$ and that the beta function is more appropriate for fitting the error than the standard normal distribution function. Bludszuweit, Antonio, and Llombart (2008) showed the advantages of using the beta probability distribution function (PDF), instead of the Gaussian PDF, for approximating the forecast error distribution. The error $\epsilon$ between the predicted values $x_p$ and the measured values $x_m$ obeys the beta distribution in wind power forecasting, and the PDF of $\epsilon$ is
$$f(\epsilon)=\frac{\epsilon^{m-1}(1-\epsilon)^{n-1}}{h},\quad \epsilon\in(0,1),$$
where the parameters $m$ and $n$ ($m>1$, $n>1$) are often called hyperparameters because they control the distribution of the variable $\epsilon$, $h$ is the normalization factor, and $m$ and $n$ are determined by the values of the mean (which is the predicted power) and the standard deviation (Bishop, 2006, Canavos, 1984). Fig. 1 shows plots of the Gaussian distribution and the beta distribution for different values of the hyperparameters. In 2007, Zhang, Wan, Zhao, and Yang (2007) and Randazzo, Abou-Khousa, Pastorino, and Zoughi (2007) presented estimation results under a Laplacian noise environment for the problem of estimating the direction of arrival of impinging coherent electromagnetic waves. The Laplace distribution is frequently encountered in various machine learning areas, e.g., the over-complete wavelet transform coefficients of images, natural image processing, etc. (Eltoft et al., 2006, Park and Lee, 2005).
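For illustration, the beta error PDF can be compared against a Gaussian with matched mean and standard deviation using SciPy; the hyperparameter values below are illustrative and not taken from the paper's Fig. 1.

```python
import numpy as np
from scipy.stats import beta, norm

# Beta error PDF f(eps) = eps^(m-1) * (1-eps)^(n-1) / h on (0, 1);
# scipy handles the normalization factor h internally.
m, n = 3.0, 5.0                      # illustrative shape parameters (m > 1, n > 1)
eps = np.linspace(0.001, 0.999, 500)
beta_pdf = beta.pdf(eps, m, n)

# Gaussian with the same mean and standard deviation as Beta(m, n):
# mean = m/(m+n), variance = m*n / ((m+n)^2 * (m+n+1)).
mu = m / (m + n)
sigma = np.sqrt(m * n / ((m + n) ** 2 * (m + n + 1)))
gauss_pdf = norm.pdf(eps, loc=mu, scale=sigma)
```

Plotting `beta_pdf` against `gauss_pdf` makes the skewness of the beta error model visible, which is what motivates replacing the Gaussian noise assumption.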

Based on the above analysis, we know that the error distribution does not follow a Gaussian distribution in some real-world applications. We therefore study the optimal loss functions for different error models.

It is not suitable to apply GN-SVR to fit functions from data with non-Gaussian noise. To solve the above problems, we derive a general loss function and construct ν-support vector regression machines for a general noise model.

Finally, we design a technique to find the optimal solution to the corresponding regression tasks. While a large number of SVR implementations have appeared over the past years, we introduce the Augmented Lagrange Multiplier (ALM) method, presented in Section 4. If the task is non-differentiable or discontinuous, the sub-gradient descent method can be used (Ma, 2010); if the sample set is very large, SMO can also be used (Shevade, Keerthi, Bhattacharyya, & Murthy, 2000).

The main contributions of our work are as follows: (1) we derive the optimal loss functions for different error models by using the Bayesian approach and optimization theory; (2) we develop the unified ν-support vector regression model for general noise with inequality constraints (N-SVR); (3) the Augmented Lagrange Multiplier method is applied to solve N-SVR, which guarantees the stability and validity of the solution of N-SVR; (4) we apply N-SVR to short-term wind speed prediction and show the effectiveness of the proposed model in practical applications.

This paper is organized as follows: in Section 2 we derive the optimal loss function corresponding to a noise model by using the Bayesian approach; in Section 3 we describe the proposed ν-support vector regression technique for the general noise model (N-SVR); in Section 4 we give the solution and algorithm design of N-SVR; numerical experiments on artificial data sets, UCI data and short-term wind speed prediction are reported in Sections 5 and 6; finally, we conclude the work in Section 7.


Bayesian approach to the general loss function

Given a set of noisy training samples $D_l$, we need to estimate an unknown function $f(x)$. Following Chu et al., 2004, Girosi, 1991, Klaus-Robert and Sebastian, 2001, Pontil et al., 1998, the general approach is to minimize
$$H[f]=\sum_{i=1}^{l}c(\xi_i)+\lambda\Phi[f],$$
where $c(\xi_i)=c(y_i-f(x_i))$ is a loss function, $\lambda$ is a positive number and $\Phi[f]$ is a smoothness functional.

We assume the noise is additive: $y_i=f(x_i)+\xi_i$, $i=1,2,\ldots,l$, where the $\xi_i$ are random, independent and identically distributed (i.i.d.) with probability distribution $P(\xi_i)$ of

Noise model based ν-support vector regression

Given samples $D_l$, we construct a linear regression function $f(x)=\omega^{T}x+b$. In order to deal with nonlinear functions, the following generalization can be made (Schölkopf and Smola, 2002, Vapnik, 1995, Vapnik, 1998): we map the input vectors $x_i\in\mathbb{R}^L$ into a high-dimensional feature space $H$ (a Hilbert space) through some nonlinear mapping $\Phi:\mathbb{R}^L\to H$ chosen a priori, and then solve the optimization problem (4) in the feature space $H$. In this case, the inner product of the input vectors $(x_i\cdot x)$ is
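In practice the mapping $\Phi$ is never computed explicitly: a kernel function returns the inner product in $H$ directly. Below is a minimal sketch using the common RBF kernel; this particular kernel is our choice for illustration, since the text leaves $K(\cdot,\cdot)$ generic.

```python
import numpy as np

def rbf_kernel(x, z, gamma=1.0):
    """K(x, z) = exp(-gamma * ||x - z||^2).

    Realizes the inner product <Phi(x), Phi(z)> in the feature space H
    without ever computing Phi; gamma is an illustrative bandwidth.
    """
    diff = np.asarray(x, dtype=float) - np.asarray(z, dtype=float)
    return np.exp(-gamma * np.dot(diff, diff))
```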

Solution based on the Augmented Lagrange Multiplier method

Theorems 1 and 2 supply an algorithm to effectively identify the N-SVR model. In this section we obtain the solution based on the Augmented Lagrange Multiplier (ALM) method and give the algorithm design of the ν-support vector regression machine with a noise model (N-SVR).

(1) Let the data set be $D_l=\{(x_1,y_1),(x_2,y_2),\ldots,(x_l,y_l)\}$, where $x_i\in\mathbb{R}^L$, $y_i\in\mathbb{R}$, $i=1,\ldots,l$.

(2) Use a 10-fold cross-validation strategy to search for the optimal parameters $C,\nu,m,n$, and select a kernel function $K(\cdot,\cdot)$ (a sketch of this step follows the list).

(3) Construct and solve the
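As a sketch of step (2) above, a 10-fold cross-validation grid search might look as follows; `fit` and `predict` are placeholders standing in for an N-SVR trainer and predictor, which are not shown here.

```python
import numpy as np
from itertools import product
from sklearn.model_selection import KFold

def cv_parameter_search(X, y, fit, predict, grid, n_splits=10):
    """10-fold CV grid search over a parameter grid (e.g. C, nu, m, n).

    grid maps parameter names to lists of candidate values. Returns the
    combination with the lowest mean RMSE across folds.
    """
    kf = KFold(n_splits=n_splits, shuffle=True, random_state=0)
    best_params, best_rmse = None, np.inf
    for values in product(*grid.values()):
        params = dict(zip(grid.keys(), values))
        fold_rmse = []
        for tr, te in kf.split(X):
            model = fit(X[tr], y[tr], **params)          # train on 9 folds
            err = y[te] - predict(model, X[te])          # validate on 1 fold
            fold_rmse.append(np.sqrt(np.mean(err ** 2)))
        if np.mean(fold_rmse) < best_rmse:
            best_params, best_rmse = params, np.mean(fold_rmse)
    return best_params
```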

Experimental analysis

In this section, we present some experiments to test the proposed model on several regression tasks with artificial data and UCI data.

The following criteria, including the mean absolute error (MAE), mean absolute percentage error (MAPE), root mean square error (RMSE), and standard error of prediction (SEP), are introduced to evaluate the performance of the ν-SVR, GN-SVR and BN-SVR models:
$$\mathrm{MAE}=\frac{1}{l}\sum_{i=1}^{l}|y_i-x_i|,\quad \mathrm{MAPE}=\frac{1}{l}\sum_{i=1}^{l}\frac{|y_i-x_i|}{x_i},\quad \mathrm{RMSE}=\sqrt{\frac{1}{l}\sum_{i=1}^{l}(y_i-x_i)^2},\quad \mathrm{SEP}=\frac{\mathrm{RMSE}}{\bar{x}},$$
where $l$ is the size of the
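A minimal sketch of these four criteria, assuming (as the formulas suggest) that $y_i$ denotes a prediction, $x_i$ a nonzero measured value, and $\bar{x}$ the mean of the measured values:

```python
import numpy as np

def evaluation_metrics(x_measured, y_predicted):
    """Compute MAE, MAPE, RMSE and SEP as defined above.

    x_measured must be nonzero (required by MAPE); SEP normalizes
    RMSE by the mean of the measured values.
    """
    err = np.asarray(y_predicted) - np.asarray(x_measured)
    mae = np.mean(np.abs(err))
    mape = np.mean(np.abs(err) / np.abs(x_measured))
    rmse = np.sqrt(np.mean(err ** 2))
    sep = rmse / np.mean(x_measured)
    return mae, mape, rmse, sep
```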

Short-term wind speed prediction with the proposed algorithm

The N-SVR forecasting model is applied to a real-world sample set of wind speeds from Heilongjiang Province. The data record more than a year of wind speeds; the average wind speed over each 10-minute interval is stored. In total, 62,466 samples with 4 attributes (mean, variance, minimum, maximum) are given.

We analyze a one-month time series of wind speeds and investigate the error distribution using the persistence method (Bludszuweit et al., 2008). The result shows that the error $\epsilon$ of wind speed with the
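The persistence method simply propagates the last observed value as the forecast. A minimal sketch for extracting the resulting error series (variable names are ours):

```python
import numpy as np

def persistence_forecast(series, horizon=1):
    """Persistence baseline: the forecast at t + horizon is the value at t.

    series: 1-D array of 10-min mean wind speeds. Returns predictions and
    errors aligned with the observed tail of the series, so the error
    distribution can be inspected as in the text.
    """
    series = np.asarray(series, dtype=float)
    pred = series[:-horizon]   # the last observed value persists
    obs = series[horizon:]     # what was actually measured next
    return pred, obs - pred
```

A histogram of the returned errors is what is compared against the beta and Gaussian PDFs in this kind of analysis.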

Conclusions and future work

Most existing regression techniques assume that the error model is Gaussian. However, it has been found that the noise in some real-world applications, such as wind speed forecasting, does not follow a Gaussian distribution but rather a beta distribution. In this case, these regression techniques are not optimal. In this work, we describe the main results of our work: (1) we derive the optimal loss functions for different error models; (2) we develop the unified ν-support vector regression for

Acknowledgments

This work was supported by the National Program on Key Basic Research Project (973 Program) under Grant 2012CB215201, and by the National Natural Science Foundation of China (NSFC) under Grants 61222210, 61170107, 61105054, and 61300121.

References (50)

  • E.J. Bayro-Corrochano et al. Clifford support vector machines for classification, regression, and recurrence. IEEE Transactions on Neural Networks (2010)
  • C.M. Bishop. Pattern recognition and machine learning (2006)
  • H. Bludszuweit et al. Statistical analysis of wind power forecast error. IEEE Transactions on Power Systems (2008)
  • Bofinger, S., Luig, A., & Beyer, H. G. (2002). Qualification of wind power forecasts. In Proc. global wind power conf.,...
  • L. Bottou. Large-scale machine learning with stochastic gradient descent
  • S. Boyd et al. Convex optimization (2004)
  • G.C. Canavos. Applied probability and statistical methods (1984)
  • A. Chalimourda et al. Experimentally optimal ν in support vector regression for different noise models and parameter settings. Neural Networks (2004)
  • D.S. Chen et al. A robust backpropagation learning algorithm for function approximation. IEEE Transactions on Neural Networks (1994)
  • V. Cherkassky et al. Practical selection of SVM parameters and noise estimation for SVM regression. Neural Networks (2004)
  • C. Chih-Chung et al. Training ν-support vector regression: theory and algorithms. Neural Computation (2002)
  • W. Chu et al. Bayesian support vector regression using a unified loss function. IEEE Transactions on Neural Networks (2004)
  • C. Cortes et al. Support vector networks. Machine Learning (1995)
  • N. Cristianini et al. An introduction to support vector machines (2000)
  • L. Duan et al. Domain adaptation from multiple sources: a domain-dependent regularization approach. IEEE Transactions on Neural Networks and Learning Systems (2012)
  • T. Eltoft et al. On the multivariate Laplace distribution. IEEE Signal Processing Letters (2006)
  • F. Esposito et al. A comparative analysis of methods for pruning decision trees. IEEE Transactions on Pattern Analysis and Machine Intelligence (1997)
  • A. Fabbri et al. Assessment of the cost associated with wind generation prediction errors in a liberalized electricity market. IEEE Transactions on Power Systems (2005)
  • S. Fan et al. Forecasting the wind generation using a two-stage network based on meteorological information. IEEE Transactions on Energy Conversion (2009)
  • A.V. Fiacco et al. Nonlinear programming: sequential unconstrained minimization techniques (1990)
  • Girosi, F. (1991). Models of noise and robust estimates. A.I. memo 1287, Artificial Intelligence Laboratory,...
  • Z.H. Guo et al. A corrected hybrid approach for wind speed prediction in Hexi Corridor of China. Energy (2011)
  • T. Hastie et al. The elements of statistical learning: data mining, inference, and prediction (2009)
  • G. Huang et al. Robust support vector regression for uncertain input and output data. IEEE Transactions on Neural Networks and Learning Systems (2012)
  • P.J. Huber. Robust estimation of a location parameter. The Annals of Mathematical Statistics (1964)