Sample-dependent graph construction with application to dimensionality reduction
Introduction
Graph Laplacian based learning has attracted growing interest, and many corresponding algorithms have recently been presented, covering dimensionality reduction (DR) [1], [2], [3], [4], [5], [6], [7], [8], [9], [10], semi-supervised learning [11], [12], [13], [14], [15], label propagation [16], [17], [18], clustering [19], [20], [21], [22], [23], image matching [18], [24], etc. For these algorithms, graph construction lies at the heart. As claimed in [13], constructing a ‘good’ graph is even more important than choosing a ‘good’ algorithm. It is therefore necessary and valuable to study graph construction and to propose new construction approaches. In fact, some researchers have recently devoted themselves to the study of graph construction [25], [26], [27], [28], [29]. Maier and Luxburg [25] showed that k-nearest-neighbor and ε-neighborhood graphs can have markedly different influences on unsupervised spectral clustering. Jebara et al. [26] presented a b-matching graph as an alternative to the k-nearest-neighbor graph. The authors of [27], [28] focused on how to combine different graphs so as to give heavier weight to better graphs, while the authors of [29] presented an in-depth study of the convergence of graph Laplacians on random neighborhood graphs. These works provide valuable insight into graph construction. However, how to construct a more effective graph that depends on the samples at hand remains untouched; in this paper, we therefore attempt to present an approach to sample-dependent graph construction.
More concretely, the sample-dependent graph construction is motivated by the following observation: the determination of the neighbors of each sample, i.e., the construction of edges for a graph, plays a key role in the graph construction process. In traditional graph constructions, the neighbors of a sample are found by one of two main approaches, ε-neighborhood and k-nearest-neighbor [4], [19]. Both approaches require the same neighbor parameter ε (or k) to be predefined for all samples. However, it is not easy to construct an effective graph for learning tasks with these two approaches, for two main reasons:
- (i)
The parameter is set in advance to the same value for all samples. Clearly, such a setting is not very reasonable because the local structures around different samples usually differ. Thus, the so-constructed graph generally fails to effectively capture and fit the intrinsic structures of the data.
- (ii)
The selection of the neighbor parameter is still intractable. It is well known that, to date, how to select this parameter remains an open problem. Although there are approaches to address it, such as cross-validation, they are not only time-consuming and expensive in a high-dimensional space but also waste the limited training samples. Especially when the available training samples are few, as in the face recognition task with only one sample per person [30], existing approaches tend to fail.
To mitigate these drawbacks of the traditional approaches, in this paper we propose a sample-dependent approach to graph construction and name the so-constructed graph the Sample-dependent Graph (SG), which is our first contribution. Different from the traditional ε-neighborhood and k-nearest-neighbor graphs, instead of setting the same neighbor parameter for all samples in advance, the SG depends on the samples in question to determine the neighbors of each sample and the similarities between sample pairs. More specifically, for a sample xi, if the similarity between it and a sample xj is greater than the mean of the similarities between xi and all samples, then xj becomes its neighbor. In other words, the Mean of Similarities (MS) for xi defines its own similarity neighborhood, i.e., the hypersphere whose center is xi and whose radius is determined by MS. The similarities of the samples within the hypersphere are greater than MS, so those samples automatically become its neighbors. Correspondingly, the entry WijS of the adjacency matrix WS is set to the similarity value. Notice that MS is not the same for all samples because the defined similarity function (see its definition in Subsection 3.1) varies from sample to sample and from dataset to dataset.
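To make the rule concrete, the following sketch builds an SG-style adjacency matrix. The Gaussian kernel used here as the similarity function is our own stand-in for illustration (the paper's actual similarity function is defined in Subsection 3.1), and `sigma` is a hypothetical bandwidth:

```python
import numpy as np

def sample_dependent_graph(X, sigma=1.0):
    """Sketch of SG construction. X: (n, D) sample matrix.

    The Gaussian kernel below is an assumed similarity; the paper's own
    similarity function (Subsection 3.1) would be substituted in practice.
    """
    n = X.shape[0]
    # pairwise squared Euclidean distances
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    S = np.exp(-d2 / (2 * sigma ** 2))          # similarity matrix
    W = np.zeros((n, n))
    for i in range(n):
        ms = S[i].mean()                         # Mean of Similarities for x_i
        for j in range(n):
            # x_j is a neighbor of x_i iff its similarity exceeds x_i's MS
            if j != i and S[i, j] > ms:
                W[i, j] = S[i, j]
    return W
```

Because each row uses its own mean MS as the threshold, xj may be a neighbor of xi while xi is not a neighbor of xj, so W is generally asymmetric, as the paper's Proposition 1 states.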
Compared with the traditional ε-neighborhood and k-nearest-neighbor graphs, the SG has some favorable and attractive properties:
- (i)
The SG depends on the samples in question to determine the neighbors of each sample and the similarities between sample pairs. This strategy avoids the rigid criterion used in the two traditional graphs, i.e., predefining the same neighbor parameter for all samples.
- (ii)
Although the SG is not yet parameter-free, owing to the similarity parameter, it reduces the number of parameters compared to the traditional graphs and is thus convenient and simple to use in learning algorithms. Undoubtedly, this reduces complexity in real applications.
- (iii)
From the construction definition of the SG, it can be demonstrated (see Proposition 1 in Section 3 and the toy experiment in Section 4.1) that its adjacency matrix is generally asymmetric even when xi and xj are neighbors of each other, which is generally more reasonable and intuitive for effectively capturing and fitting the intrinsic structures of the data.
- (iv)
In the SG, a sample in a high-density area generally has more neighbors, whereas a sample in a low-density area has relatively fewer neighbors (this characteristic is illustrated in the toy experiment in Section 4.1), which reflects a certain statistical property of the data.
- (v)
There are some similarities between the SG and the two traditional graphs, but the SG is not their special case (see (B) in Section 3.2).
- (vi)
The SG is very general. It can potentially be adapted to many learning algorithms based on graph Laplacian.
As a graph construction strategy, the proposed approach can potentially be applied to many learning algorithms based on the graph Laplacian; in this paper, however, we focus only on an application of the SG to DR to demonstrate the effectiveness of our strategy, mainly owing to its attractive properties and popularity.1 In fact, many graph-based DR algorithms [1], [2], [4], [10], [20], [22], [23], [31], [32], [33], [34], [35], [36], [37], [38], [39], [40], [41], [42], [43], [44] have recently been proposed, such as locality preserving projection (LPP) [1], neighborhood preserving embedding (NPE) [33], Bayesian tensor analysis (BTA) [38], [39], and local coordinates alignment (LCA) [40]. Their implementations share a common routine: first construct an affinity graph and then embed it into an objective function, which is optimized so as to preserve certain (locally geometrical or structural) properties of the original high-dimensional space; in this way, a projection matrix for dimensionality reduction is obtained. We follow this routine as well. Specifically, we incorporate the SG into the state-of-the-art, off-the-shelf unsupervised LPP to develop the sample-dependent LPP (SLPP), which is our second contribution.
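As a reference point for this common routine, here is a minimal sketch of the standard LPP step: given a data matrix and an affinity matrix W, it solves the generalized eigenproblem X L Xᵀ a = λ X D Xᵀ a for the projection directions, where D is the degree matrix and L = D − W the graph Laplacian. The small ridge term added to the right-hand side is our own addition for numerical stability, not part of the original formulation:

```python
import numpy as np

def lpp(X, W, d):
    """Sketch of LPP. X: (D, n) data matrix (columns are samples),
    W: (n, n) symmetric affinity matrix, d: target dimensionality."""
    Dm = np.diag(W.sum(axis=1))                  # degree matrix
    L = Dm - W                                   # graph Laplacian
    A = X @ L @ X.T
    B = X @ Dm @ X.T
    B += 1e-6 * np.eye(B.shape[0])               # ridge for numerical stability (our addition)
    # generalized eigenproblem A a = lambda B a, via B^{-1} A
    vals, vecs = np.linalg.eig(np.linalg.solve(B, A))
    order = np.argsort(vals.real)                # smallest eigenvalues first
    return vecs[:, order[:d]].real               # (D, d) projection matrix
```

Projecting new samples is then just a matrix product, `Y = P.T @ X_new`, which is why LPP-style methods avoid the "out-of-sample" problem mentioned below.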
SLPP also has some attractive properties:
- (i)
SLPP naturally inherits all merits of the SG, as mentioned in (i)–(vi) above.
- (ii)
It is worth pointing out that although the adjacency matrix of the SG is generally asymmetric, the Laplacian matrix induced from it is still symmetric (see Proposition 2 in Section 3).
- (iii)
SLPP shares some good characteristics of LPP, such as avoidance of the “out-of-sample” problem [21].
- (iv)
The SG can be directly incorporated into the off-the-shelf LPP and hence does not add much computational complexity compared to LPP.
The rest of this paper is organized as follows: Section 2 briefly reviews the traditional approaches to graph construction and takes LPP as an example to introduce unsupervised DR algorithms based on graph embedding. Section 3 first presents the novel SG and demonstrates its properties, and then develops the unsupervised SLPP algorithm. Section 4 presents experiments that evaluate the effectiveness and feasibility of the SG and SLPP. Section 5 gives concluding remarks and discusses future work.
Section snippets
Traditional graph construction
Given a set of n samples X={x1,…, xn}, xi∈RD, a weighted graph G={V, E, W} can be constructed, where V corresponds to the sample points in X, E denotes the set of edges between sample pairs, and the entry Wij of the adjacency matrix W denotes the similarity between xi and xj. The construction of graph G can typically be decomposed into two steps: (1) construction of edges and (2) calculation of a weight value for each edge. Step (1) mainly includes the two categories of ε-neighborhood and k-nearest-neighbor…
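As a concrete illustration of these two traditional constructions, the following sketch builds both graphs with the common heat-kernel weighting; the symmetrization of the kNN graph and the bandwidth `sigma` are conventional choices for illustration, not prescribed by this paper:

```python
import numpy as np

def knn_graph(X, k, sigma=1.0):
    """k-nearest-neighbor graph with heat-kernel weights. X: (n, D)."""
    n = X.shape[0]
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.zeros((n, n))
    for i in range(n):
        idx = np.argsort(d2[i])[1:k + 1]        # nearest k, skipping self at position 0
        W[i, idx] = np.exp(-d2[i, idx] / (2 * sigma ** 2))
    return np.maximum(W, W.T)                   # symmetrize: edge if either is a kNN

def eps_graph(X, eps, sigma=1.0):
    """epsilon-neighborhood graph: connect pairs closer than eps. X: (n, D)."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.where(d2 < eps ** 2, np.exp(-d2 / (2 * sigma ** 2)), 0.0)
    np.fill_diagonal(W, 0.0)                    # no self-loops
    return W
```

Note that both constructions apply the same k (or ε) to every sample, which is exactly the rigidity the SG is designed to avoid.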
The construction of sample-dependent graph (SG) and sample-dependent LPP (SLPP)
As discussed above, the limitations of ε-neighborhood and k-nearest-neighbor preclude the so-constructed graph from effectively serving subsequent learning tasks. To mitigate them to a certain extent, in this section, we firstly present a sample-dependent approach of graph construction, and then incorporate the sample-dependent graph into LPP to develop the sample-dependent LPP (SLPP).
Experiments
To show the properties of the SG, in this section we first design a synthetic dataset and show the similarities and differences between the SG and the ε-neighborhood and k-nearest-neighbor graphs on that dataset. To show the parameter sensitivity of the three graphs, we select the Wine dataset from UCI [47] to compare SLPP with LPP based on the two traditional graphs, and further show the visualization effect of several competitive algorithms on the Wine dataset. To show the effectiveness and feasibility of both SG and SLPP…
Conclusions and future works
In this paper, we present an approach to sample-dependent graph construction for learning algorithms based on the graph Laplacian and name the so-constructed graph the sample-dependent graph (SG). The SG depends on the samples in question to determine the neighbors of each sample and the similarities between sample pairs; thus it reduces the number of parameters compared to the traditional graphs and leads to an asymmetric adjacency matrix, which can often more effectively capture and fit the intrinsic structures of…
Acknowledgements
This work is partly supported by the National Natural Science Foundation of China under Grant no. 60973097 and the Higher Education University Doctor Foundation under Grant no. 200802870003. Thanks to Xiaofei He and Deng Cai for providing the code for LPP and NPE on their homepages.
Bo Yang received the B.Sc. degree in Computer Applications from Jilin Normal University in 2002. She received the M.Sc. degree in Computer Applications at Anhui University of Technology in 2007. She is currently a Ph.D. candidate in Department of Computer Science and Engineering, Nanjing University of Aeronautics & Astronautics (NUAA). Her research interests focus on computer vision, dimensionality reduction, pattern recognition, machine learning and their applications.
References (53)
- et al., Locally adaptive subspace and similarity metric learning for visual data clustering and retrieval, Comput. Vis. Image Understand. (2008)
- et al., Face recognition from a single image per person: A survey, Pattern Recog. (2006)
- X. He, P. Niyogi, Locality preserving projections, in: Advances in Neural Information Processing Systems, ...
- et al., Graph embedding and extension: a general framework for dimensionality reduction, IEEE Trans. Pattern Anal. Mach. Intell. (2007)
- et al., Face recognition using Laplacianfaces, IEEE Trans. Pattern Anal. Mach. Intell. (2005)
- et al., Laplacian eigenmaps for dimensionality reduction and data representation, Neural Comput. (2003)
- et al., A global geometric framework for nonlinear dimensionality reduction, Science (2000)
- et al., Nonlinear dimensionality reduction by locally linear embedding, Science (2000)
- D. Cai, X. He, K. Zhou, J. Han, H. Bao, Locality sensitive discriminant analysis, in: International Joint ...
- et al., Globally maximizing, locally minimizing: unsupervised discriminant projection with applications to face and palm biometrics, IEEE Trans. Pattern Anal. Mach. Intell. (2007)
- Comments on globally maximizing, locally minimizing: unsupervised discriminant projection with applications to face and palm biometrics, IEEE Trans. Pattern Anal. Mach. Intell.
Cited by (56)
- Joint graph optimization and projection learning for dimensionality reduction, 2019, Pattern Recognition
- Heterogeneous domain adaptation network based on autoencoder, 2018, Journal of Parallel and Distributed Computing
- Locality constrained Graph Optimization for Dimensionality Reduction, 2017, Neurocomputing. Citation excerpt: “Among the methods to be compared, NPE [16], LPP [17], NMF [40] and GNMF [41] are classical unsupervised linear or graph based dimensionality reduction algorithms which construct the graph using the k-nearest-neighbor or ε-ball scheme. SGLPP [20] utilizes the sample-dependent strategy for graph construction. SPP [21] is a typical L1 graph based dimensionality reduction algorithm.”
- Attribute-based Decision Graphs: A framework for multiclass data classification, 2017, Neural Networks
Songcan Chen received the B.Sc. degree in Mathematics from Hangzhou University (now merged into Zhejiang University) in 1983. In December 1985, he completed the M.Sc. degree in Computer Applications at Shanghai Jiaotong University and then joined Nanjing University of Aeronautics & Astronautics (NUAA) in January 1986 as an Assistant Lecturer. There he received a Ph.D. degree in Communication and Information Systems in 1997. Since 1998, as a full Professor, he has been with the Department of Computer Science and Engineering at NUAA. His research interests include pattern recognition, machine learning and neural computing. In these fields, he has authored or coauthored over 130 scientific journal papers.