Brief papers
Orthogonal extreme learning machine for image classification
Introduction
Extreme learning machine (ELM) [1] has proved to be an efficient and effective method for training single hidden layer feedforward neural networks (SLFNs), providing a unified framework for both multi-class classification and regression tasks. The basic ELM model can be viewed as a random feature mapping followed by least squares regression. The main contribution of ELM to general SLFNs is that the parameters of the hidden units, including the input weights between the input layer and the hidden layer as well as the biases of the hidden units, can be randomly generated, which allows the output weights between the hidden layer and the output layer to be determined analytically. This greatly alleviates the burden of weight tuning incurred by the widely used back-propagation algorithms and thus gives ELM its fast learning speed. Although the mathematical formulation of ELM is simple, it retains the universal approximation capability of SLFNs [2], [3]. Furthermore, the rationality of the randomly generated input weights and biases was analyzed in some recently published studies [4], [5]. ELM fills gaps among many types of SLFNs, such as feedforward networks (e.g., sigmoid networks), RBF networks, SVM (considered as a special type of SLFN) and polynomial networks, and shows that different learning algorithms are not needed for different SLFNs when universal approximation and classification capabilities are considered [2], [6], [7]. Further, ELM theories and philosophy show that some earlier learning theories, such as ridge regression theory, Bartlett's neural network generalization performance theory and SVM's maximal margin, are actually consistent in machine learning [8], [9]. Inspired by deep learning yet different from it, hierarchical models using ELM as a building block do not require intensive tuning of hidden layers and hidden units and still obtain impressive performance [10], [11].
Due to the success of ELM in diverse applications, ELM research has become a hotspot in the machine learning community, with many studies conducted on theoretical investigation [4], [5], model improvements [11], [12] and applications [13], [14]. Some recent progress is briefly reviewed in [15], [16].
From the hidden layer to the output layer, ELM essentially learns the output weight matrix via least squares regression. Accordingly, many approaches have been proposed to perform discriminant analysis based on least squares regression. The central task is to find a proper transformation matrix that minimizes the sum-of-squares error function, which is then used for dimensionality reduction or classification. Xiang et al. proposed a framework of discriminative least squares regression for multiclass classification, whose idea is to utilize ε-dragging to enlarge the distance between samples from different classes [17]. Similar work by Zhang et al. aims to directly learn regression targets from data that evaluate the classification error better than conventional predefined regression targets [18]. In most cases, ELM is viewed as a classifier in which the hidden layer data representation (the ELM feature space) is projected to the output layer (the label space), and we expect to learn a proper output weight matrix (transformation matrix) that makes ELM more effective for classification. To this end, many efforts have been made to impose different properties on the output weight matrix. Peng et al. proposed to enhance the label consistency property of ELM and formulated the graph regularized extreme learning machine, which shows excellent performance in face recognition [19] and EEG-based emotion recognition [20]. Shi et al. introduced elastic net regularization into ELM, which simultaneously brings sparsity to the output weight matrix and avoids the singularity problem [21]. Among the existing strategies, the orthogonal constraint on the transformation matrix has been widely employed in both subspace learning and least squares-based classification, with excellent performance in both settings. Cai et al.
proposed the orthogonal locality preserving projection (OLPP) method, which produces orthogonal basis functions and has more locality preserving power than LPP [22]. Since the locality preserving power has been shown to be directly related to the discriminating power, OLPP obtains better performance than LPP [23]. In [24], Nie et al. showed that orthogonal least squares discriminant analysis is better than the basic counterpart without the orthogonal constraint. Similar work performed feature extraction based on orthogonal least squares regression [25]. Motivated by these studies, in this paper we propose to learn an orthogonal output weight matrix from the hidden layer to the output layer, with the expectation that the transformation matrix under the orthogonal constraint can preserve more structural information between these two layers and thus have more discriminating power for classification.
The remainder of this paper is organized as follows. Section 2 gives a brief description of the basic ELM model. The model formulation, optimization method, convergence and computational complexity of the proposed OELM are detailed in Section 3. Experimental studies are conducted in Section 4 to show the effectiveness of OELM. Section 5 concludes the paper.
Section snippets
Extreme learning machine
Suppose we have n labeled training samples $\{(\mathbf{x}_i, \mathbf{y}_i)\}_{i=1}^{n}$, where each sample $\mathbf{x}_i \in \mathbb{R}^{d}$ and its corresponding label vector $\mathbf{y}_i \in \mathbb{R}^{c}$ (c is the number of classes). If $\mathbf{x}_i$ is labeled as class p, then the pth element of $\mathbf{y}_i$ is 1 and the other elements of $\mathbf{y}_i$ are 0. Consider an SLFN with input weight matrix $\mathbf{W} \in \mathbb{R}^{d \times k}$, hidden bias vector $\mathbf{b} \in \mathbb{R}^{k}$ and output weight matrix $\boldsymbol{\beta} \in \mathbb{R}^{k \times c}$, where k is the number of hidden units. For an input vector $\mathbf{x}$, the output of this SLFN can be represented as $f(\mathbf{x}) = \boldsymbol{\beta}^{\top} g(\mathbf{W}^{\top}\mathbf{x} + \mathbf{b})$, where $g(\cdot)$ is the activation function applied element-wise.
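The basic ELM pipeline described above can be sketched in a few lines of numpy. This is a minimal illustration, not the authors' implementation: the function names are ours, and the ridge term `lam` stands in for the ℓ2 regularization used in the experiments.

```python
import numpy as np

def elm_train(X, Y, k, lam=1e-3, seed=None):
    """Basic ELM: random hidden layer, then regularized least-squares output weights."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W = rng.standard_normal((d, k))          # random input weights (never tuned)
    b = rng.standard_normal(k)               # random hidden biases
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))   # sigmoid hidden output, shape (n, k)
    # Closed-form solution: beta = (H^T H + lam*I)^{-1} H^T Y
    beta = np.linalg.solve(H.T @ H + lam * np.eye(k), H.T @ Y)
    return W, b, beta

def elm_predict(X, W, b, beta):
    """Predict class indices by taking the largest output-layer response."""
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    return np.argmax(H @ beta, axis=1)
```

Note that only `beta` is learned; `W` and `b` stay at their random initial values, which is what makes the training a single linear solve.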
Model formulation and optimization
By introducing the orthogonal constraint, we formulate the objective of orthogonal ELM as $\min_{\boldsymbol{\beta}} \|\mathbf{H}\boldsymbol{\beta} - \mathbf{Y}\|_F^2 \ \text{s.t.}\ \boldsymbol{\beta}^{\top}\boldsymbol{\beta} = \mathbf{I}$, where $\mathbf{H} \in \mathbb{R}^{n \times k}$ is the hidden layer output matrix and $\mathbf{Y} \in \mathbb{R}^{n \times c}$ is the label matrix. Under the orthogonal constraint, the data will be projected onto an orthogonal subspace in which the data metric structure can be preserved. Some properties of OELM are discussed in detail in Section 3.3. This section focuses on the model formulation and the optimization method.
Since k > c, objective (7) is an unbalanced orthogonal Procrustes problem
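The balanced case (square, fully orthogonal transform) has the classical SVD solution, and one standard way to handle the unbalanced case is an expansion-balance iteration that pads the target matrix with auxiliary columns and repeatedly solves a balanced subproblem. The sketch below illustrates that generic scheme; it is not taken from the paper, and the exact iteration used for OELM may differ.

```python
import numpy as np

def balanced_procrustes(A, B):
    """Solve min_Q ||A Q - B||_F s.t. Q orthogonal: Q = U V^T from SVD of A^T B."""
    U, _, Vt = np.linalg.svd(A.T @ B)
    return U @ Vt

def unbalanced_procrustes(H, Y, n_iter=50):
    """Expansion-balance sketch for min ||H B - Y||_F s.t. B^T B = I_c, k > c.

    H is (n, k), Y is (n, c). Y is padded with an auxiliary block Z so a
    balanced k-by-k Procrustes problem can be solved at each step.
    """
    n, k = H.shape
    c = Y.shape[1]
    Z = np.zeros((n, k - c))                           # auxiliary padding columns
    for _ in range(n_iter):
        Q = balanced_procrustes(H, np.hstack([Y, Z]))  # k-by-k orthogonal matrix
        B, B_perp = Q[:, :c], Q[:, c:]                 # keep first c columns as B
        Z = H @ B_perp                                 # refresh padding: zeroes the
    return B                                           # residual on the padded block
```

Setting $Z = H B_\perp$ makes the padded part of the balanced residual vanish, so each iteration cannot increase the original objective, which is the usual monotone-descent argument for this family of methods.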
Experimental settings and datasets
In this section, we conduct a pairwise comparison between OELM and ELM (with ℓ2 regularization) to show the effectiveness of the orthogonal constraint on the output weight matrix. The activation function for ELM is the sigmoid function, and the number of hidden neurons is set to three times the input dimension of the data. The regularization parameter for ELM is searched over a set of candidate values.
The properties of the three image data sets used in our experiments are described as follows:
- •
UMIST.
Conclusion
In this paper, we proposed a new ELM model, termed OELM, in which the output weight matrix is enforced to be orthogonal. The main contributions of this work lie in three aspects: (1) formulating the objective of OELM and analyzing its effectiveness from the perspective of discriminant analysis; (2) presenting an effective iterative procedure to optimize the OELM objective by solving a balanced orthogonal Procrustes problem via singular value decomposition; (3) demonstrating the effectiveness
Acknowledgments
This work was partially supported by the National Natural Science Foundation of China (61602140, 61671193, 61402143), the Science and Technology Program of Zhejiang Province (2017C33049), the Natural Science Foundation of Zhejiang Province (LQ14F020012), the Jiangsu Key Laboratory of Image and Video Understanding for Social Safety, Nanjing University of Science and Technology (30916014107) and the Guangxi High School Key Laboratory of Complex System and Computational Intelligence (2016CSCI04).
References (25)
- et al., Convex incremental extreme learning machine, Neurocomputing (2007)
- et al., Enhanced random search based incremental extreme learning machine, Neurocomputing (2008)
- et al., No-reference image quality assessment using modified extreme learning machine classifier, Appl. Soft Comput. (2009)
- et al., Trends in extreme learning machines: a review, Neural Netw. (2015)
- et al., Discriminative graph regularized extreme learning machine and its application to face recognition, Neurocomputing (2015)
- et al., Discriminative manifold extreme learning machine and applications to image and EEG signal classification, Neurocomputing (2016)
- et al., EEG-based vigilance estimation using extreme learning machines, Neurocomputing (2013)
- et al., Orthogonal vs. uncorrelated least squares discriminant analysis for feature extraction, Pattern Recognit. Lett. (2012)
- et al., Orthogonal least squares regression for feature extraction, Neurocomputing (2016)
- et al., Extreme learning machine: a new learning scheme of feedforward neural networks, Proceedings of the IEEE International Joint Conference on Neural Networks (2004)
- Universal approximation using incremental constructive feedforward networks with random hidden nodes, IEEE Trans. Neural Netw.
- Universal approximation of extreme learning machine with adaptive growth of hidden nodes, IEEE Trans. Neural Netw. Learn. Syst.
Yong Peng received the B.S. degree from Hefei New Star Research Institute of Applied Technology, the M.S. degree from the Graduate University of Chinese Academy of Sciences, and the Ph.D. degree from Shanghai Jiao Tong University, all in computer science, in 2006, 2010, and 2015, respectively. From September 2012 to August 2014, he was a visiting Ph.D. student in the Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor. He joined the School of Computer Science and Technology, Hangzhou Dianzi University as an Assistant Professor in June 2015, where he is currently a Research Associate Professor. He received the President Scholarship of the Chinese Academy of Sciences in 2009 and the National Scholarship for Graduate Students from the Ministry of Education in 2012. His research interests include machine learning, pattern recognition, and brain-computer interface.
Wanzeng Kong received his bachelor's and Ph.D. degrees from the Department of Electrical Engineering, Zhejiang University, Hangzhou, China, in 2003 and 2008, respectively. He is currently a professor and vice dean of the College of Computer Science, Hangzhou Dianzi University, Hangzhou, China. From November 2012 to November 2013, Dr. Kong was a visiting research associate in the Department of Biomedical Engineering, University of Minnesota, Twin Cities, USA. His research interests include cognitive computing, pattern recognition and BCI-based electronic systems. Dr. Kong is also a member of IEEE, ACM, and CCF.
Bing Yang received her Ph.D. degree in Computer Science from Zhejiang University in 2013 and then joined the School of Computer Science and Technology, Hangzhou Dianzi University, where she is now an associate professor. Her main research interests include computer vision and machine learning.