Neurocomputing

Volume 230, 22 March 2017, Pages 345-358

Robust regularized extreme learning machine for regression using iteratively reweighted least squares

https://doi.org/10.1016/j.neucom.2016.12.029

Abstract

Extreme learning machine (ELM) for regression has been used in many fields because of its easy implementation, fast training speed, and good generalization performance. However, the basic ELM with the ℓ2-norm loss function is sensitive to outliers. Recently, the ℓ1-norm loss function and the Huber loss function have been used in ELM to enhance robustness. However, these two loss functions can still be affected by outliers because they grow linearly with the errors. Moreover, existing robust ELM methods use only ℓ2-norm regularization or have no regularization term at all. In this study, we propose a unified model for robust regularized ELM regression using iteratively reweighted least squares (IRLS), which we call RELM-IRLS. We perform a comprehensive study of the robust loss function and the regularization term for robust ELM regression. Four loss functions (i.e., ℓ1-norm, Huber, Bisquare, and Welsch) are used to enhance robustness, and two types of regularization (ℓ2-norm and ℓ1-norm) are used to avoid overfitting. Experiments show that the proposed RELM-IRLS with ℓ2-norm and ℓ1-norm regularization is stable and accurate for data with 0–40% outlier levels, and that RELM-IRLS with ℓ1-norm regularization obtains a compact network because of the highly sparse output weights of the network.

Introduction

The extreme learning machine (ELM) [1] was proposed for training single-hidden-layer feedforward networks (SLFNs). It directly approximates the nonlinear mapping of the input data by randomly generating the hidden-node parameters without tuning, and the model has been proven to possess the universal approximation capability [2]. ELM has the following merits: (1) easy implementation, (2) extremely fast training speed, and (3) good generalization performance. Because of these merits, ELM has recently gained increasing interest in regression problems such as stock market forecasting [3], electricity price forecasting [4], wind power forecasting [5], and affective analogical reasoning [6].

The performance of ELM regression relies crucially on the given labels of the training data. The basic ELM with the ℓ2-norm loss function assumes that the training labels follow a normal error distribution. However, training samples for real tasks cannot be guaranteed to satisfy this assumption: many factors, such as instrument errors, sampling errors, and modeling errors, can corrupt the training samples with outliers. The performance of basic ELM regression then deteriorates heavily because the ℓ2-norm loss is easily affected by the large deviations of the outliers.

To solve this problem, Deng et al. [7] proposed a regularized ELM with weighted least squares to enhance robustness; their algorithm consists of two stages of reweighted ELM. Zhang et al. [8] proposed the outlier-robust ELM with the ℓ1-norm loss function and the ℓ2-norm regularization term; they used the augmented Lagrange multiplier algorithm to solve the objective function and effectively reduced the influence of outliers. Horata et al. [9] adopted the Huber function to enhance robustness and used the iteratively reweighted least squares (IRLS) algorithm to solve the Huber loss function without a regularization term. A model without regularization, however, is prone to overfitting.

However, the loss functions of existing robust ELM regression methods, namely the ℓ1-norm and the Huber function, can still be affected by outliers with large deviations because both grow linearly with the deviations, as the sketch below illustrates. Moreover, existing robust ELM methods use only ℓ2-norm regularization or have no regularization term. When the number of hidden nodes is large, ℓ2-norm regularization trains a large ELM model because the output weights of the network are all non-zero. In short, a study that considers different loss functions and regularization terms simultaneously is still lacking.
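
To make this contrast concrete, the following NumPy sketch writes out the four M-estimation losses discussed in this paper. The tuning constants are the conventional 95%-efficiency values from robust statistics, not values taken from the paper, so treat them as assumptions.

```python
import numpy as np

# Standard M-estimation losses rho(e); the tuning constants k and c are the
# usual 95%-efficiency choices from robust statistics (assumed, not from the paper).
def rho_l1(e):
    return np.abs(e)                                      # grows linearly, unbounded

def rho_huber(e, k=1.345):
    a = np.abs(e)
    return np.where(a <= k, 0.5 * e**2, k * a - 0.5 * k**2)  # quadratic core, linear tail

def rho_bisquare(e, k=4.685):
    u = np.clip(np.abs(e) / k, 0.0, 1.0)
    return (k**2 / 6.0) * (1.0 - (1.0 - u**2) ** 3)       # bounded: saturates at k^2/6

def rho_welsch(e, c=2.985):
    return (c**2 / 2.0) * (1.0 - np.exp(-(e / c) ** 2))   # bounded: saturates at c^2/2
```

The ℓ1 and Huber losses keep growing with the deviation, so a single gross outlier can still dominate the fit, whereas the Bisquare and Welsch losses saturate and bound each outlier's contribution.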

Thus, in this work we conduct a comprehensive study of the loss function and the regularization term for robust ELM regression. We propose a unified model for robust regularized ELM regression using IRLS (RELM-IRLS). Four loss functions (i.e., ℓ1-norm, Huber, Bisquare, and Welsch) are used to enhance robustness, and two types of regularization (ℓ2-norm and ℓ1-norm) are used to avoid overfitting. These loss functions, also known as M-estimation functions, have been widely used in robust statistics [10]. IRLS is used to optimize the objective function with a robust loss function and a regularization term: each IRLS iteration is equivalent to solving a weighted least-squares ELM regression, as sketched below. Our RELM-IRLS algorithm can therefore be trained efficiently, owing to the fast training speed of ELM. Experimental results on synthetic and real data sets show that the proposed RELM-IRLS is stable and accurate at 0–40% outlier levels.
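
As an illustration of that equivalence, here is a minimal NumPy sketch of a robust ℓ2-regularized ELM solve driven by Welsch weights; the function name, the MAD-based scale estimate, and the defaults are assumptions for illustration, not the authors' exact implementation.

```python
import numpy as np

def relm_irls_l2(H, y, C=1.0, c=2.985, n_iter=30, tol=1e-6):
    """Illustrative IRLS loop for robust ELM with l2-norm regularization.
    H is the N x L hidden-layer output matrix."""
    L = H.shape[1]
    beta = np.linalg.solve(H.T @ H + np.eye(L) / C, H.T @ y)      # plain ridge init
    for _ in range(n_iter):
        r = y - H @ beta                                          # residuals
        s = 1.4826 * np.median(np.abs(r - np.median(r))) + 1e-12  # robust MAD scale
        w = np.exp(-(r / (c * s)) ** 2)                           # Welsch IRLS weights
        Hw = H * w[:, None]                                       # row-weighted H, i.e. WH
        # weighted ridge solve: (H'WH + I/C) beta = H'Wy
        beta_new = np.linalg.solve(H.T @ Hw + np.eye(L) / C, Hw.T @ y)
        if np.linalg.norm(beta_new - beta) <= tol * (np.linalg.norm(beta) + 1e-12):
            beta = beta_new
            break
        beta = beta_new
    return beta
```

Each pass downweights samples with large residuals and re-solves the same kind of regularized least-squares problem the basic ELM already solves, which is why one IRLS iteration costs roughly one weighted ELM training.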

Compared to existing ELM methods for robust regression, the main contributions of this paper are highlighted as follows:

  • (1) A unified model is proposed for robust regularized ELM regression. Different kinds of robust loss functions and regularization terms can be used in this model.

  • (2) RELM-IRLS with ℓ2-norm regularization is proposed to achieve better generalization.

  • (3) RELM-IRLS with ℓ1-norm regularization is proposed to realize better generalization performance and a more compact network architecture.

The rest of this paper is organized as follows. Basic ELM and its robust variants are reviewed in Section 2. In Section 3, we present the unified model and RELM-IRLS with ℓ2-norm and ℓ1-norm regularization. Section 4 reports the experimental results of the proposed algorithms, and Section 5 presents our conclusions.

ELM for regression

For a given set of training samples $S=\{(\mathbf{x}^{(i)}, y^{(i)})\mid i=1,\ldots,N\}\subset\mathbb{R}^d\times\mathbb{R}$ for a regression problem, ELM is a unified SLFN whose output with $L$ hidden nodes can be represented as
$$f_L(\mathbf{x})=\sum_{i=1}^{L}h_i(\mathbf{x})\,\beta_i=\mathbf{h}(\mathbf{x})\,\boldsymbol{\beta},\qquad \mathbf{x}\in\mathbb{R}^d,$$
where $\mathbf{h}(\mathbf{x})=[h_1(\mathbf{x}),h_2(\mathbf{x}),\ldots,h_L(\mathbf{x})]$ and $\boldsymbol{\beta}=[\beta_1,\beta_2,\ldots,\beta_L]^T$. Here $h_i(\mathbf{x})$ is the hidden-layer function $g(\mathbf{a}_i,b_i,\mathbf{x})$ between the input layer and the $i$th hidden node, $\mathbf{a}_i$ and $b_i$ are randomly generated independently of the training data, and $\beta_i$ is the output weight between the $i$th hidden node and the output node.
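
To ground the notation, here is a minimal NumPy sketch of basic ELM training under assumed illustrative choices (a sigmoid activation g and uniform random sampling of a_i and b_i); the function names are hypothetical.

```python
import numpy as np

def elm_features(X, A, b):
    """Hidden-layer output h(x) for all rows of X, using sigmoid nodes g(a_i, b_i, x)."""
    return 1.0 / (1.0 + np.exp(-(X @ A + b)))

def elm_fit(X, y, L=100, seed=0):
    """Basic ELM: draw random hidden parameters, then solve least squares for beta."""
    rng = np.random.default_rng(seed)
    A = rng.uniform(-1.0, 1.0, size=(X.shape[1], L))   # random input weights a_i
    b = rng.uniform(-1.0, 1.0, size=L)                 # random biases b_i
    H = elm_features(X, A, b)                          # N x L hidden output matrix
    beta, *_ = np.linalg.lstsq(H, y, rcond=None)       # beta = pinv(H) y
    return A, b, beta

# usage sketch:
# A, b, beta = elm_fit(X_train, y_train)
# y_pred = elm_features(X_test, A, b) @ beta
```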

Proposed method

In this section, we explain the proposed RELM-IRLS algorithm. First, we provide a unified model for robust regularized regression. Then, we present RELM-IRLS with ℓ2-norm and with ℓ1-norm regularization, respectively. Finally, we discuss the advantages and disadvantages of the proposed method.
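
The snippet above only outlines the section, so as a hedged illustration of how the ℓ1-regularized variant's inner problem could be handled, here is a proximal-gradient (ISTA) sketch for one weighted lasso subproblem inside IRLS. The solver choice, function name, and defaults are assumptions; the authors' actual inner solver may differ.

```python
import numpy as np

def weighted_lasso_ista(H, y, w, lam, n_iter=200):
    """Illustrative ISTA solver for one IRLS subproblem with l1 regularization:
    min_beta  sum_i w_i (y_i - h_i beta)^2 + lam * ||beta||_1."""
    Hw = H * w[:, None]                                  # row-weighted H, i.e. WH
    # step size from the Lipschitz constant 2*lambda_max(H'WH) of the smooth part
    step = 1.0 / (2.0 * np.linalg.norm(H.T @ Hw, 2) + 1e-12)
    beta = np.zeros(H.shape[1])
    for _ in range(n_iter):
        grad = 2.0 * Hw.T @ (H @ beta - y)               # gradient of the weighted LS term
        z = beta - step * grad
        beta = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)  # soft threshold
    return beta
```

The soft-thresholding step zeroes out small output weights, which is what would yield the sparse, compact network reported for the ℓ1-regularized variant.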

Experiment

MATLAB code for our algorithm is available at: https://github.com/KaenChan/robust-elm-irls.

Conclusion

In this paper, we propose a robust regularized ELM for regression problems. A unified model is formulated for robust regularized ELM regression using the IRLS optimization method. Four robust loss functions (i.e., ℓ1-norm, Huber, Bisquare, and Welsch) are used in our algorithms to enhance the robustness of basic ELM. Moreover, two types of regularization terms (ℓ2-norm and ℓ1-norm) are used to account for structural risk minimization. We propose robust ELM with ℓ2-norm regularization, which can achieve better generalization performance, and robust ELM with ℓ1-norm regularization, which additionally yields a compact network with highly sparse output weights.

Acknowledgement

This work was partially supported by the Natural Science Foundation of China (61125201, 61303070, U1435219).

Kai Chen received his B.S. degree in computer science and technology from Northwestern Polytechnical University in 2010 and his M.S. degree in computer science and technology from the National University of Defense Technology in 2012, where he is now a Ph.D. candidate. His research interests include high-performance computer architecture, machine learning, and constraint satisfaction.

Qi Lv received his B.S. degree in computer science and technology from Tsinghua University, Beijing, in 2009 and his M.S. degree in computer science and technology from the National University of Defense Technology in 2011, where he is now a Ph.D. candidate. His research interests include high-performance computer architecture, machine learning, and remote sensing image processing.

Yao Lu received his B.S. degree in computer science and technology from Shihezi University in 2010 and his M.S. degree in computer science and technology from the National University of Defense Technology in 2012, where he is now an assistant engineer. His research interests include high-performance computer architecture, parallel computing, and machine learning.

Yong Dou is a professor, Ph.D. supervisor, and senior member of the China Computer Federation. He received his B.S., M.S., and Ph.D. degrees in computer science and technology from the National University of Defense Technology. His research interests include high-performance computer architecture, high-performance embedded microprocessors, reconfigurable computing, bioinformatics, and machine learning. He is a member of the IEEE and the ACM.