Neurocomputing

Volume 94, 1 October 2012, Pages 159-163

Letters
Random optimized geometric ensembles

https://doi.org/10.1016/j.neucom.2012.03.005

Abstract

We use a single-hidden layer feedforward neural network (SLFN) to interpret the model of optimized geometric ensembles (OGE). Based on the SLFN, we simplify OGE into random optimized geometric ensembles (ROGE), which may contain far fewer hidden nodes than OGE. Furthermore, on 12 UCI data sets we verify that ROGE can achieve the same level of classification performance as OGE while consuming less space and time.

Introduction

Geometric reasoning is related to the design of learning machines, both in editing rules for the nearest neighbor technique [1] and in analyzing the relationship between large margin classifiers and geometry [2]. Margin maximization is a famous strategy for designing the optimal separating hyperplane for a linearly separable problem, leading to the well-known support vector machines (SVMs). From a geometric reasoning viewpoint, in linear cases the maximal margin of an SVM can be defined by the minimum distance between the separating hyperplane and the closest boundary data points. However, the situation becomes much more complex in nonlinear cases, which may be dealt with by the kernel trick, involving some difficulties in the selection of kernel functions [3], [4] and in the interpretability/transparency of metric changes [5].

In order to intuitively define a maximal margin for nonlinear cases, piecewise linear learning would be a good strategy [6]. However, Pujol and Masip introduced another strategy, "geometry-based ensembles" [7], as a novel binary discriminative learning technique, which approximates the decision border by means of a Tikhonov regularized optimization procedure over an additive model of characteristic boundary points (CBPs). Such points belong to the optimal boundary following certain definitions of robustness and margin. In the model of geometry-based ensembles, the CBPs are used to create a set of locally optimal and robust hyperplanes, which are linearly assembled into the final λ-smooth decision rule using a Tikhonov regularization framework for the solution of ill-posed problems [8]. Since the Tikhonov framework smooths the classification boundaries by controlling the robustness to noise with a parameter λ, it can yield an optimized geometric ensemble (OGE) that is less prone to overfitting.

According to the previous discussion, the computation of CBPs is the basis of OGE. However, the number of CBPs for a data set is generally (sometimes even much) larger than the size of the data set. Table 1 lists the CBP numbers for 12 data sets from the standard UCI database [9] in the "CN" column, computed by the algorithm described in Section 2. It is easy to notice that 10 of the data sets have more CBPs than samples, the only exceptions being (bre) and (brc). Moreover, the three data sets (sph), (cvr) and (bas) have far more CBPs (17,270, 5258 and 2887) than their corresponding sizes (267, 435 and 625), respectively.

On the other hand, OGE can actually be interpreted as a single-hidden layer feedforward neural network (SLFN) (see Section 3) that uses CBPs as its hidden nodes. Meanwhile, it has been shown that, with the input weights and bias values chosen randomly, an SLFN with at most M random hidden nodes can learn M distinct observations with arbitrarily small error [10]. Under this interpretation of OGE as an SLFN, it should therefore be possible for OGE to successfully assemble only a small random subset of the CBPs rather than the whole set. With this possibility in mind, we are motivated to simplify OGE into random optimized geometric ensembles (ROGE) and compare their performances.
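For intuition, the following is a minimal numerical sketch (in Python with NumPy; all names and data are illustrative, not part of the original paper) of the result cited from [10]: with random input weights and biases, an SLFN whose number of hidden nodes equals the number of distinct observations can fit them with essentially zero error, because the hidden-layer output matrix is generically invertible.

```python
import numpy as np

rng = np.random.default_rng(0)
M, d = 50, 4                                 # M distinct observations in R^d
X = rng.standard_normal((M, d))              # inputs
t = rng.choice([-1.0, 1.0], size=M)          # arbitrary binary targets

W = rng.standard_normal((d, M))              # random input weights
b = rng.standard_normal(M)                   # random biases
H = np.tanh(X @ W + b)                       # hidden-layer output matrix (M x M)

alpha = np.linalg.solve(H, t)                # output weights from H alpha = t
print(np.max(np.abs(H @ alpha - t)))         # training error is numerically ~0
```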

The structure of the paper is as follows. In Section 2, we give a fast algorithm to compute CBPs by modifying Algorithm 1 in [7], with a typo corrected. We discuss the relationship of OGE to SLFN in Section 3 and simplify OGE into ROGE in Section 4. Section 5 compares the performance of ROGE with that of OGE, and Section 6 draws a few conclusions.


Computation of CBPs

Given a binary training set $S=\{(x_k, l_k): x_k\in\mathbb{R}^d,\ l_k\in\{+1,-1\},\ 1\le k\le M\}$ with $(x_i,+1)\in S^{+}$ and $(x_j,-1)\in S^{-}$, the midpoint $x_{cp_{i,j}}=x_m=\frac{1}{2}(x_i+x_j)\in\mathbb{R}^d$ is called a characteristic boundary point (CBP) if and only if, for all $(x_k, l_k)\in S$, $(x_i,+1)$ and $(x_j,-1)$ fulfill the Gabriel neighboring rule [7]:
$$\|x_i-x_{cp_{i,j}}\| = \|x_j-x_{cp_{i,j}}\| \le \|x_k-x_{cp_{i,j}}\|.$$

As shown in Fig. 1, $x_m$ can be a CBP only if no other points lie in the hypersphere centered on $x_m$ with radius $\|x_i-x_j\|/2$. Given the distances among $x_i$, $x_j$ and $x_k$, Algorithm 1 in [7] computes the distance
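As a concrete illustration, here is a brute-force sketch of the CBP definition above (not the faster Algorithm 1 of [7]); it assumes NumPy, an (M, d) sample array X, and a label array y with values in {+1, −1}. The function name is our own.

```python
import numpy as np

def characteristic_boundary_points(X, y):
    """Enumerate CBPs by directly checking the Gabriel neighboring rule."""
    pos = np.where(y == +1)[0]
    neg = np.where(y == -1)[0]
    cbps = []
    for i in pos:
        for j in neg:
            xm = 0.5 * (X[i] + X[j])                  # candidate midpoint
            r = 0.5 * np.linalg.norm(X[i] - X[j])     # hypersphere radius
            dists = np.linalg.norm(X - xm, axis=1)
            # keep xm only if no other sample lies strictly inside the hypersphere
            others = np.delete(dists, [i, j])
            if np.all(others >= r):
                cbps.append((i, j, xm))
    return cbps
```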

Relationship of OGE to SLFN

Using the whole set of CBPs with $N=CN=|\{x_{cp_{i,j}}\}|$, the OGE creates a piecewise linear smooth additive model as follows [7]:
$$\hat{l}=\operatorname{sign}(f(x)-\alpha_0)=\operatorname{sign}\Big(\sum_{k=1}^{N}\alpha_k\,\operatorname{sign}(\pi_k(x))-\alpha_0\Big)$$
where $f(x)=\sum_{k=1}^{N}\alpha_k\,\operatorname{sign}(\pi_k(x))$, $\pi_{x_{cp_{i,j}}}(x)=n_{x_{cp_{i,j}}}\cdot(x-x_{cp_{i,j}})$, and $n_{x_{cp_{i,j}}}=(x_i-x_j)/\|x_i-x_j\|$.

In Eq. (3), $\operatorname{sign}$ stands for the signum function; the weight vector $\alpha=(\alpha_1, \alpha_2,\ldots,\alpha_N)^{T}$ is determined using a Tikhonov regularization framework, which is a conservative approach to controlling the robustness to noise with a parameter $\lambda$.
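To make the assembly step concrete, here is a minimal NumPy sketch of the model above: each CBP contributes a local hyperplane with unit normal $n_k$ through $x_{cp_k}$, and the weights $\alpha$ are obtained from a ridge-style Tikhonov-regularized least-squares fit with parameter $\lambda$. The function names and the exact solver are our assumptions, not the precise procedure of [7]; $\alpha_0$ is taken as 0.

```python
import numpy as np

def oge_design_matrix(X, normals, centers):
    # H[m, k] = sign( n_k . (x_m - xcp_k) )
    proj = np.einsum('mnd,nd->mn', X[:, None, :] - centers[None, :, :], normals)
    return np.sign(proj)

def fit_oge(X, y, normals, centers, lam=1.0):
    """Tikhonov-regularized weights: solve (H^T H + lam I) alpha = H^T y."""
    H = oge_design_matrix(X, normals, centers)
    N = H.shape[1]
    alpha = np.linalg.solve(H.T @ H + lam * np.eye(N), H.T @ y)
    return alpha

def predict_oge(X, normals, centers, alpha):
    # decision rule sign(f(x)) with alpha_0 = 0
    return np.sign(oge_design_matrix(X, normals, centers) @ alpha)
```

Here `normals` is the (N, d) array of unit normals $(x_i-x_j)/\|x_i-x_j\|$ and `centers` the (N, d) array of CBPs, e.g. as produced by the CBP sketch in Section 2.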

Using SLFN to simplify OGE

A general single-output SLFN with $N$ hidden nodes and activation function $g$ is mathematically modeled in the form
$$f_N(x)=\sum_{k=1}^{N}\alpha_k\, g(w_k\cdot x+b_k)-\alpha_0$$
where $\alpha_k, \alpha_0, b_k\in\mathbb{R}$ and $w_k\in\mathbb{R}^d$, with $\alpha_0=0$ in general.

This form is also called the random vector functional-link (RVFL) net [11] when the parameters $w_k$ and $b_k$ of the hidden layer are selected randomly and independently in advance and the parameters $\alpha_k$ of the output layer are learned by simple quadratic optimization.
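A minimal RVFL-style sketch of this construction follows, assuming NumPy, a tanh activation, and a ridge-type quadratic optimization for the output weights; the names are illustrative and not the exact setup of [11].

```python
import numpy as np

def fit_rvfl(X, y, n_hidden=100, lam=1e-2, seed=0):
    """Random hidden parameters (w_k, b_k); only output weights alpha_k are learned."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W = rng.standard_normal((d, n_hidden))        # random input weights w_k
    b = rng.standard_normal(n_hidden)             # random biases b_k
    H = np.tanh(X @ W + b)                        # hidden outputs g(w_k . x + b_k)
    alpha = np.linalg.solve(H.T @ H + lam * np.eye(n_hidden), H.T @ y)
    return W, b, alpha

def predict_rvfl(X, W, b, alpha):
    return np.sign(np.tanh(X @ W + b) @ alpha)    # alpha_0 = 0 as stated above
```

ROGE follows the same pattern, except that the hidden responses are the CBP-based terms $\operatorname{sign}(\pi_k(x))$ built from a small random subset of the CBPs rather than random projections, with the $\alpha_k$ still learned in the Tikhonov framework of Section 3.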

It has been shown that in the RVFL an SLFN

Comparative experiments

The parameters in SLFN (7) can be reliably estimated by the OGE using the whole set of CBPs. To quantitatively validate that this can also be done using a small random subset of CBPs, we compare the performance of OGE with that of ROGE on the 12 data sets listed in Table 1. Each feature of these data sets is linearly scaled to [0,1] before conducting the comparative experiments. This may cause the OGE results to differ somewhat from those in [7], where all data sets were normalized with respect

Conclusions

By interpreting the OGE as an SLFN, this paper proposed a random version of OGE, i.e., ROGE, which randomly selects a small subset of CBPs to construct the hidden layer of the final decision model from the viewpoint of neural networks. Although choosing the optimal number of CBPs for a data set is very time-consuming, comparative experiments validate that, once this number is determined, the classification accuracy of ROGE is on par with that of OGE while requiring less space and less training time.

Acknowledgments

This work was supported in part by the National Natural Science Foundation of China under Grants nos. 61175004 and 60775010, the Natural Science Foundation of Beijing under Grants nos. 4112009 and 4113067, the Beijing Municipal Education Commission Science and Technology Development Plan key project under Grant no. KZ201210005007, and Beijing University of Technology High-level Personnel Development Project.

The authors would like to thank the anonymous associate editor and reviewers, whose


Yujian Li received the MS degree in mathematics from the Institute of Mathematics, Chinese Academy of Sciences, Beijing, China, in 1993, and the Ph.D. degree in semiconductor devices and microelectronics from the Institute of Semiconductors, Chinese Academy of Sciences, in 1999. He was an Associate Professor at the College of Computer Science and Technology, Beijing University of Technology, Beijing, from June 2001, and has been a Professor there since December 2007.

His current research interests include pattern analysis and machine intelligence, especially piecewise linear learning, which he has pioneered as a new research direction.

Dongxia Meng is currently pursuing the MS degree at the College of Computer Science and Technology, Beijing University of Technology, Beijing, China. Her current research interests include pattern recognition and machine learning.

Zhiming Gui received the Ph.D. degree from Peking University in 2005. He is now a lecturer at the College of Computer Science and Technology, Beijing University of Technology, Beijing. His current research interests include spatiotemporal data mining, data warehousing and artificial intelligence.
