Random optimized geometric ensembles
Introduction
Geometric reasoning is related to the design of learning machines, e.g., in editing rules for the nearest neighbor technique [1] and in analyzing the relationship between large margin classifiers and geometry [2]. Margin maximization is a well-known strategy for designing the optimal separating hyperplane of a linearly separable problem, leading to the support vector machine (SVM). From a geometric reasoning viewpoint, in the linear case the maximal margin of an SVM can be defined by the minimum distance between the closest boundary data points of the two classes. The nonlinear case, however, is much more complex; it is usually handled with the kernel trick, which brings difficulties in the selection of kernel functions [3], [4] and in the interpretability/transparency of metric changes [5].
To define a maximal margin intuitively in nonlinear cases, piecewise linear learning is a good strategy [6]. Pujol and Masip introduced another strategy, "geometry-based ensembles" [7], a novel binary discriminative learning technique that approximates the decision border, by means of a Tikhonov-regularized optimization procedure, with an additive model of characteristic boundary points (CBPs). Such points belong to the optimal boundary under certain definitions of robustness and margin. In geometry-based ensembles, the CBPs are used to create a set of locally optimal and robust hyperplanes, which are linearly assembled into the final λ-smooth decision rule using a Tikhonov regularization framework for the solution of ill-posed problems [8]. Since the Tikhonov framework smooths the classification boundary by controlling the robustness to noise with a parameter λ, it yields an optimized geometric ensemble (OGE) that is less prone to overfitting.
According to the previous discussion, the computation of CBPs is the basis of OGE. However, the number of CBPs for a data set is generally (sometimes much) larger than the size of the data set. Table 1 lists, in the "CN" column, the CBP counts for 12 data sets from the standard UCI database [9], computed by the algorithm described in Section 2. Ten of the data sets have more CBPs than data points; the two exceptions are (bre) and (brc). Moreover, the three data sets (sph), (cvr) and (bas) have far more CBPs (17270, 5258 and 2887) than their respective sizes (267, 435 and 625).
On the other hand, OGE can actually be interpreted as a single-hidden-layer feedforward neural network (SLFN) (see Section 3) that uses CBPs as its hidden nodes. Meanwhile, it has been shown that, with the input weights and bias values chosen randomly, an SLFN with at most M random hidden nodes can learn M distinct observations with arbitrarily small error [10]. Under this interpretation, OGE should be able to successfully assemble a small random subset of CBPs rather than the whole set. This possibility motivates us to simplify OGE into random optimized geometric ensembles (ROGE) and to compare their performances.
The structure of the paper is as follows. In Section 2, we give a fast algorithm to compute CBPs by modifying Algorithm 1 in [7], with a typo corrected. We discuss the relationship of OGE to SLFN in Section 3 and simplify OGE into ROGE in Section 4. Section 5 compares the performance of ROGE with that of OGE, and Section 6 draws conclusions.
Computation of CBPs
Given a binary training set S={(xk, lk): xk∈Rd, lk∈{+1,−1}, 1≤k≤M} with (xi,+1)∈S+ and (xj,−1)∈S−, the midpoint xm=(xi+xj)/2 is called a characteristic boundary point (CBP) if and only if ∀(xk, lk)∈S, (xi,+1) and (xj,−1) fulfill the Gabriel neighboring rule [7]: ‖xi−xj‖² ≤ ‖xi−xk‖² + ‖xj−xk‖².
As shown in Fig. 1, xm could be a CBP if no other point lies in the hypersphere centered on xm with radius ‖xi−xj‖/2. Given the distances among xi, xj and xk, Algorithm 1 in [7] computes the distance
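As a concrete illustration of the hypersphere test above (a brute-force sketch, not the fast Algorithm 1 of [7]; the function name and the numerical tolerance are our own), the CBPs of a small binary data set could be enumerated as follows:

```python
import numpy as np

def characteristic_boundary_points(X_pos, X_neg):
    """Enumerate CBP midpoints between opposite-class pairs that are
    Gabriel neighbors: (xi, xj) qualify iff no point of the data set
    lies strictly inside the hypersphere centered on their midpoint
    xm = (xi + xj)/2 with radius ||xi - xj|| / 2."""
    X_all = np.vstack([X_pos, X_neg])
    cbps = []
    for xi in X_pos:
        for xj in X_neg:
            m = (xi + xj) / 2.0                   # candidate CBP (midpoint)
            r2 = np.sum((xi - xj) ** 2) / 4.0     # squared radius
            d2 = np.sum((X_all - m) ** 2, axis=1)
            # xi and xj themselves lie exactly on the sphere; tolerate them
            if not np.any(d2 < r2 - 1e-12):
                cbps.append((m, xi - xj))         # midpoint and normal direction
    return cbps
```

For instance, with one positive point (0,0) and negative points (2,0) and (4,0), only the pair ((0,0),(2,0)) passes the test, because (2,0) falls inside the sphere generated by the pair ((0,0),(4,0)); the single CBP is the midpoint (1,0).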
Relationship of OGE to SLFN
Using the whole set of CBPs {ck: 1≤k≤N}, the OGE creates a piecewise linear smooth additive model as follows [7]: f(x)=sign(∑k=1N αk hk(x)), where hk(x)=sign(wkTx+bk), wk=xi−xj is the direction of the opposite-class pair generating ck, and bk=−wkTck.
In Eq. (3), sign stands for the signum function; the weight vector α=(α1, α2,…,αN)T is determined using a Tikhonov regularization framework, a conservative approach that controls the robustness to noise with a parameter λ.
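The Tikhonov-regularized estimate of α has the standard ridge-regression closed form. A minimal sketch (our own function name; it assumes the hidden responses hk(x) over the training set have been collected in a matrix H with one column per CBP):

```python
import numpy as np

def tikhonov_weights(H, y, lam):
    """Closed-form minimizer of ||H a - y||^2 + lam * ||a||^2,
    i.e. a = (H^T H + lam * I)^(-1) H^T y; lam > 0 smooths the
    ensemble and guards against noise."""
    n = H.shape[1]
    return np.linalg.solve(H.T @ H + lam * np.eye(n), H.T @ y)
```

With λ=0 this reduces to ordinary least squares; increasing λ shrinks the norm of α, which is how the framework trades training fit for robustness.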
Using SLFN to simplify OGE
A general single-output SLFN with N hidden nodes and activation function g is mathematically modeled in the form f(x)=α0+∑k=1N αk g(wkTx+bk), where αk, α0, bk∈R, wk∈Rd, with α0=0 generally.
This form is also called the random vector version of the functional-link (RVFL) net [11] if the parameters wk, bk of the hidden layer are selected randomly and independently in advance and the parameters αk of the output layer are learned by simple quadratic optimization.
It has been shown that in the RVFL an SLFN
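The random-hidden-node result cited above can be demonstrated in a few lines: with M distinct inputs and M random hidden nodes, the hidden-layer matrix is almost surely of full rank, so the learned output weights interpolate the M targets. A sketch under our own naming, with tanh used in place of the signum so the hidden matrix is invertible:

```python
import numpy as np

rng = np.random.default_rng(0)

def random_slfn_fit(X, y, n_hidden):
    """SLFN whose input weights w_k and biases b_k are drawn at random;
    only the output weights alpha are learned, by least squares."""
    W = rng.normal(size=(X.shape[1], n_hidden))
    b = rng.normal(size=n_hidden)
    H = np.tanh(X @ W + b)                      # M x n_hidden hidden outputs
    alpha = np.linalg.lstsq(H, y, rcond=None)[0]
    return W, b, alpha

def random_slfn_predict(X, W, b, alpha):
    return np.tanh(X @ W + b) @ alpha

# M = 5 distinct observations fitted with M = 5 random hidden nodes
X = np.array([[0., 0.], [1., 0.], [0., 1.], [1., 1.], [0.5, 0.5]])
y = np.array([1., -1., -1., 1., 1.])
W, b, alpha = random_slfn_fit(X, y, n_hidden=5)
```

The training error here is essentially zero, mirroring the theorem; ROGE exploits the same mechanism by drawing the hidden nodes from the CBP set instead of at random from Rd.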
Comparative experiments
The parameters in SLFN (7) can be reliably estimated by the OGE using the whole set of CBPs. To quantitatively validate that this can also be done using a small random subset of CBPs, we compare the performance of OGE with that of ROGE on the 12 data sets listed in Table 1. Each feature of these data sets is linearly scaled to [0,1] before the comparative experiments. This may make the OGE results differ somewhat from those in [7], where all data sets were normalized with respect
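The per-feature scaling used in these experiments is ordinary min-max normalization; a minimal sketch (our own function name, and the guard for constant features is an assumption not stated in the text):

```python
import numpy as np

def scale_unit_interval(X):
    """Linearly scale each feature (column) of X to [0, 1]."""
    lo, hi = X.min(axis=0), X.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # avoid division by zero on constant columns
    return (X - lo) / span
```

Note that in a train/test protocol the minima and maxima would be computed on the training split only and reused on the test split.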
Conclusions
By interpreting the OGE as an SLFN, this paper proposed a random version of OGE, i.e., ROGE, which randomly selects a small subset of CBPs to construct the hidden layer of the final decision model from the viewpoint of neural networks. Although choosing the optimal number of CBPs for a data set is very time-consuming, comparative experiments validate that, once that number is chosen, the classification accuracy of ROGE is on par with that of OGE, with a smaller space requirement and less training time.
Acknowledgments
This work was supported in part by the National Natural Science Foundation of China under Grants nos. 61175004 and 60775010, the Natural Science Foundation of Beijing under Grants nos. 4112009 and 4113067, the Beijing Municipal Education Commission Science and Technology Development Plan key project under Grant no. KZ201210005007, and Beijing University of Technology High-level Personnel Development Project.
The authors would like to thank the anonymous associate editor and reviewers, whose
References (15)
- et al., Improving neural network training solutions using regularisation, Neurocomputing (2001)
- et al., Comparing error minimized extreme learning machines and support vector sequential feed-forward neural networks, Neural Networks (2012)
- B.K. Bhattacharya, K. Mukherjee, G.T. Toussaint, Geometric decision rules for instance-based learning algorithms, in: ...
- W. Zhang, I. King, A study of the relationship between support vector machine and Gabriel graph, in: Neural Networks, ...
- et al., An introduction to kernel-based learning algorithms, IEEE Trans. Neural Networks (2001)
- et al., MultiK-MHKS: a novel multiple kernel learning algorithm, IEEE Trans. Pattern Anal. Mach. Intell. (2008)
- et al., Additive support vector machines for pattern classification, IEEE Trans. Syst. Man Cybern. B: Cybern. (2007)
Yujian Li received the MS degree in mathematics from the Institute of Mathematics, Chinese Academy of Sciences, Beijing, China, in 1993, and the Ph.D. degree in semiconductor devices and microelectronics from the Institute of Semiconductors, Chinese Academy of Sciences, in 1999. He was an Associate Professor at the College of Computer Science and Technology, Beijing University of Technology, Beijing, since June 2001, and has been a Professor since December 2007.
His current research interests include pattern analysis, machine intelligence, especially piecewise linear learning that he has pioneered as a new research direction.
Dongxia Meng is currently pursuing the MS degree at the College of Computer Science and Technology, Beijing University of Technology, Beijing, China. Her current research interests include pattern recognition and machine learning.
Zhiming Gui received the Ph.D. degree from Peking University in 2005. He is now a lecturer at the College of Computer Science and Technology, Beijing University of Technology, Beijing. His current research interests include spatiotemporal data mining, data warehouse and artificial intelligence.