Neurocomputing

Volume 74, Issues 1–3, December 2010, Pages 301-314

Sample-dependent graph construction with application to dimensionality reduction

https://doi.org/10.1016/j.neucom.2010.03.019

Abstract

Graph construction plays a key role in learning algorithms based on graph Laplacian. However, the traditional graph construction approaches of ε-neighborhood and k-nearest-neighbor need to predefine the same neighbor parameter ε (or k) for all samples; they thus suffer from the difficulty of parameter selection and generally fail to effectively fit the intrinsic structures of data. To mitigate these limitations to a certain extent, in this paper we present a novel, sample-dependent approach to graph construction and call the so-constructed graph a Sample-dependent Graph (SG). Specifically, instead of predefining the same neighbor parameter for all samples, the SG depends on the samples in question to determine the neighbors of each sample and the similarities between sample pairs. As a result, it not only avoids the intractability and high expense of neighbor parameter selection but also fits the intrinsic structures of data more effectively. Further, to show the effectiveness of the SG, we apply it to dimensionality reduction based on graph embedding, incorporating it into the state-of-the-art off-the-shelf unsupervised locality preserving projection (LPP) to develop the sample-dependent LPP (SLPP). SLPP naturally inherits the merits of the SG and maintains the attractive properties of the traditional LPP. Experiments on toy and benchmark (UCI, face recognition, object categorization and handwritten digit recognition) datasets show the effectiveness and feasibility of the SG and SLPP with promising results.

Introduction

Graph Laplacian based learning has attracted growing interest, and many corresponding algorithms have recently been presented, covering dimensionality reduction (DR) [1], [2], [3], [4], [5], [6], [7], [8], [9], [10], semi-supervised learning [11], [12], [13], [14], [15], label propagation [16], [17], [18], clustering [19], [20], [21], [22], [23], image matching [18], [24], etc. For all these algorithms, graph construction is at the heart. As claimed in [13], constructing a ‘good’ graph is even more important than choosing a ‘good’ algorithm, so it is undoubtedly necessary and valuable to study graph construction and to propose new construction approaches. In fact, some researchers have recently dedicated themselves to studying graph construction [25], [26], [27], [28], [29]. Maier and Luxburg [25] showed that k-nearest-neighbor and ε-neighborhood graphs can exert remarkably different influences on unsupervised spectral clustering. Jebara et al. [26] presented a b-matching graph as an alternative to the k-nearest-neighbor graph. The authors in [27], [28] focused on how to combine different graphs so as to give heavier weight to a better graph, while the authors in [29] presented an in-depth study of the convergence of graph Laplacians on random neighborhood graphs. These works provide valuable insight into graph construction. However, how to construct a more effective graph depending on the samples at hand remains untouched; in this paper, we therefore attempt to present a sample-dependent approach to graph construction.

More concretely, the sample-dependent graph construction is motivated by the following observations. The determination of the neighbors of each sample, i.e., the construction of the edges of a graph, plays a key role in the graph construction process. In traditional graph constructions, the neighbors of a sample are found by the two main approaches of ε-neighborhood and k-nearest-neighbor [4], [19]. Both approaches need to predefine the same neighbor parameter ε (or k) for all samples. However, it is not easy to construct an effective graph for learning tasks with these two approaches, for two main reasons:

  • (i)

    The parameter is set in advance to the same value for all samples. Such a setting is hardly reasonable because the local structure around each sample usually differs; thus the so-constructed graph generally fails to effectively capture and fit the intrinsic structures of the data.

  • (ii)

    Selecting the neighbor parameter remains intractable; to date, it is still an open problem. Although there are some approaches to address it, such as cross-validation, they are not only time-consuming and expensive in a high-dimensional space but also waste the limited training samples. They tend to fail especially when the available training samples are scarce, as in the face recognition task with only one sample per person [30].

To mitigate these drawbacks of the traditional approaches, in this paper we propose a sample-dependent approach to graph construction and call the so-constructed graph a Sample-dependent Graph (SG), which is our first contribution. Unlike the traditional ε-neighborhood and k-nearest-neighbor graphs, which set the same neighbor parameter in advance for all samples, the SG depends on the samples in question to determine the neighbors of each sample and the similarities between sample pairs. More specifically, for sample xi, if the similarity between it and sample xj is greater than the mean of the similarities between xi and all samples, then xj becomes a neighbor of xi. In other words, the Mean of Similarities (MS) of xi defines its own similarity neighborhood, i.e., the hypersphere centered at xi with radius MS; the samples within this hypersphere have similarities greater than MS and automatically become its neighbors. Correspondingly, the entry W^S_ij of the adjacency matrix W^S is set to the similarity value. Notice that W^S_ji is not necessarily set to the same value, because the defined similarity function (see its definition in Section 3.1) varies from sample to sample and from dataset to dataset.
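To make the construction rule concrete, below is a minimal Python sketch, assuming a Gaussian-kernel similarity as a stand-in for the paper's sample-dependent similarity function (defined in its Section 3.1 and not reproduced here); the function name and the choice of kernel are ours, not the paper's:

    import numpy as np

    def sample_dependent_graph(X, sigma=1.0):
        # Sketch of SG construction for data X of shape (n, D).
        # The Gaussian kernel is only an illustrative similarity; the
        # paper's own similarity function is itself sample-dependent.
        sq = np.sum(X ** 2, axis=1)
        d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
        S = np.exp(-d2 / (2.0 * sigma ** 2))   # pairwise similarities
        np.fill_diagonal(S, 0.0)               # exclude self-similarity (assumption)
        n = X.shape[0]
        W = np.zeros((n, n))
        for i in range(n):
            ms = S[i].mean()                   # Mean of Similarities (MS) of xi
            nbrs = S[i] > ms                   # xj is a neighbor iff similarity > MS
            W[i, nbrs] = S[i, nbrs]            # row i is set independently of row j
        return W                               # generally asymmetric

Note that each row is thresholded by its own mean, so xj may be a neighbor of xi while xi is not a neighbor of xj; this is one source of the asymmetry noted in property (iii) below.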

Compared with the traditional ε-neighborhood and k-nearest-neighbor graphs, the SG has some favorable and attractive properties:

  • (i)

    The SG depends on the samples in question to determine the neighbors of each sample and the similarities between sample pairs. Such a strategy avoids the rigid criterion used in the two traditional graphs, namely predefining the same neighbor parameter for all samples.

  • (ii)

    Although the SG is not yet parameter-free, owing to the similarity parameter, it reduces the number of parameters compared to the traditional graphs and is thus convenient and simple to use in learning algorithms, which reduces complexity in real applications.

  • (iii)

    From the construction definition of the SG, it can be demonstrated (see Proposition 1 in Section 3 and the toy experiment in Section 4.1) that its adjacency matrix is generally asymmetric even when xi and xj are neighbors of each other, which is often more reasonable and intuitive for effectively capturing and fitting the intrinsic structures of data.

  • (iv)

    In the SG, a sample in a high-density area generally has more neighbors, while a sample in a low-density area has relatively fewer neighbors (this characteristic is illustrated in the toy experiment in Section 4.1), which reflects a statistical property of the data distribution.

  • (v)

    There are some similarities between the SG and the two traditional graphs, but the SG is not a special case of either (see (B) in Section 3.2).

  • (vi)

    The SG is very general. It can potentially be adapted to many learning algorithms based on graph Laplacian.

As a graph construction strategy, the proposed approach can potentially be applied to many learning algorithms based on graph Laplacian; in this paper, however, we concentrate on an application of the SG to DR to demonstrate the effectiveness of our strategy, mainly owing to DR's attractive properties and popularity. In fact, many graph-based DR algorithms [1], [2], [4], [10], [20], [22], [23], [31], [32], [33], [34], [35], [36], [37], [38], [39], [40], [41], [42], [43], [44] have recently been proposed, typified by locality preserving projection (LPP) [1], neighborhood preserving embedding (NPE) [33], Bayesian tensor analysis (BTA) [38], [39] and local coordinates alignment (LCA) [40]. Their implementations share a common routine: first construct an affinity graph, then embed it into an objective function whose optimization preserves certain (locally geometrical or structural) properties of the original high-dimensional space; in this way, a projection matrix for dimensionality reduction is obtained. We follow this routine as well. Specifically, we incorporate the SG into the state-of-the-art off-the-shelf unsupervised LPP to develop the sample-dependent LPP (SLPP), which is our second contribution.
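Below is a minimal Python sketch of this routine with the SG adjacency, assuming the rows of X are samples. How the paper obtains a symmetric Laplacian from the asymmetric W^S is the subject of its Proposition 2, which is not reproduced here; symmetrizing via (W + W')/2 and adding a small ridge for numerical stability are assumptions of this sketch rather than the paper's prescription:

    import numpy as np
    from scipy.linalg import eigh

    def slpp_projection(X, W, dim=2):
        # LPP-style embedding step using the SG adjacency W (n x n).
        Ws = 0.5 * (W + W.T)               # symmetrized adjacency (assumption)
        D = np.diag(Ws.sum(axis=1))        # degree matrix
        L = D - Ws                         # graph Laplacian (symmetric)
        A = X.T @ L @ X                    # numerator of the LPP objective
        B = X.T @ D @ X                    # constraint term a' X' D X a = 1
        B = B + 1e-8 * np.eye(B.shape[0])  # ridge for stability (assumption)
        vals, vecs = eigh(A, B)            # generalized eigenproblem, ascending order
        return vecs[:, :dim]               # smallest eigenvalues preserve locality

    # Usage: P = slpp_projection(X, sample_dependent_graph(X)); Y = X @ P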

SLPP also has some attractive properties:

  • (i)

    SLPP naturally inherits all merits of the SG, as mentioned in (i)–(vi) above.

  • (ii)

    It is worth pointing out that although the adjacency matrix of the SG is generally asymmetric, the Laplacian matrix induced from it is still symmetric (see Proposition 2 in Section 3).

  • (iii)

    SLPP shares some good characteristics of LPP, such as avoidance of the “out-of-sample” problem [21].

  • (iv)

    The SG can be directly incorporated into the off-the-shelf LPP and hence adds little computational complexity compared with LPP.

The rest of this paper is organized as follows. Section 2 briefly reviews the traditional approaches of graph construction and takes LPP as an example of unsupervised DR algorithms based on graph embedding. Section 3 first presents the novel SG and demonstrates its properties, and then develops the unsupervised SLPP algorithm. Section 4 reports experiments evaluating the effectiveness and feasibility of the SG and SLPP. Section 5 gives concluding remarks and discusses future work.


Traditional graph construction

Given a set of n samples X={x1, …, xn}, xi ∈ R^D, a weighted graph G={V, E, W} can be constructed, where V corresponds to the sample points in X, E denotes the set of edges between sample pairs, and entry Wij of the adjacency matrix W denotes the similarity between xi and xj. The construction of graph G can typically be decomposed into two steps: (1) construction of the edges and (2) calculation of the weight of each edge. Step (1) mainly includes the two categories of ε-neighborhood and k-nearest-neighbor.
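For reference, a minimal Python sketch of the two traditional constructions follows; heat-kernel weights are one common choice for step (2) (simple 0/1 weights are another), and the parameter names are ours:

    import numpy as np

    def _sq_dists(X):
        # Pairwise squared Euclidean distances, clipped at zero.
        sq = np.sum(X ** 2, axis=1)
        return np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)

    def knn_graph(X, k, sigma=1.0):
        # k-nearest-neighbor graph: the same k is used for every sample.
        d2 = _sq_dists(X)
        np.fill_diagonal(d2, np.inf)          # exclude self from neighbors
        n = X.shape[0]
        W = np.zeros((n, n))
        for i in range(n):
            nbrs = np.argsort(d2[i])[:k]      # indices of the k nearest samples
            W[i, nbrs] = np.exp(-d2[i, nbrs] / (2.0 * sigma ** 2))
        return np.maximum(W, W.T)             # symmetrize: keep edge if either end selects it

    def eps_graph(X, eps, sigma=1.0):
        # epsilon-neighborhood graph: connect pairs closer than eps.
        d2 = _sq_dists(X)
        np.fill_diagonal(d2, np.inf)
        W = np.exp(-d2 / (2.0 * sigma ** 2))
        W[d2 >= eps ** 2] = 0.0
        return W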

The construction of sample-dependent graph (SG) and sample-dependent LPP (SLPP)

As discussed above, the limitations of ε-neighborhood and k-nearest-neighbor preclude the so-constructed graph from effectively serving subsequent learning tasks. To mitigate them to a certain extent, in this section we first present a sample-dependent approach to graph construction, and then incorporate the sample-dependent graph into LPP to develop the sample-dependent LPP (SLPP).

Experiments

To show the properties of the SG, in this section we first design a synthetic dataset and show the similarities and differences between the SG and the ε-neighborhood and k-nearest-neighbor graphs on it. To show the parameter sensitivity of the three graphs, we select the Wine dataset from UCI [47] to compare SLPP against LPP based on the two traditional graphs, and further show the visualization results of several competitive algorithms on the Wine dataset. To show the effectiveness and feasibility of both the SG and SLPP, we conduct experiments on benchmark datasets covering UCI, face recognition, object categorization and handwritten digit recognition.

Conclusions and future works

In this paper, we present an approach of sample-dependent graph construction for learning algorithms based on graph Laplacian and call the so-constructed graph a sample-dependent graph (SG). The SG depends on the samples in question to determine the neighbors of each sample and the similarities between sample pairs; thus it reduces the number of parameters compared to the traditional graphs and leads to an asymmetric adjacency matrix, which often can more effectively capture and fit the intrinsic structures of data.

Acknowledgements

This work is partly supported by the National Natural Science Foundation of China under Grant no. 60973097 and the Higher Education University Doctor Foundation under Grant no. 200802870003. Thanks to Xiaofei He and Deng Cai for providing the codes of LPP and NPE on their homepages.


References (53)

  • Y. Fu et al., Locally adaptive subspace and similarity metric learning for visual data clustering and retrieval, Comput. Vis. Image Understand. (2008)
  • X. Tan et al., Face recognition from a single image per person: a survey, Pattern Recog. (2006)
  • X. He, P. Niyogi, Locality preserving projections, in: Advances in Neural Information Processing Systems, ...
  • S. Yan et al., Graph embedding and extensions: a general framework for dimensionality reduction, IEEE Trans. Pattern Anal. Mach. Intell. (2007)
  • X. He et al., Face recognition using Laplacianfaces, IEEE Trans. Pattern Anal. Mach. Intell. (2005)
  • M. Belkin et al., Laplacian eigenmaps for dimensionality reduction and data representation, Neural Comput. (2003)
  • J. Tenenbaum et al., A global geometric framework for nonlinear dimensionality reduction, Science (2000)
  • S. Roweis et al., Nonlinear dimensionality reduction by locally linear embedding, Science (2000)
  • D. Cai, X. He, K. Zhou, J. Han, H. Bao, Locality sensitive discriminant analysis, in: International Joint ...
  • J. Yang et al., Globally maximizing, locally minimizing: unsupervised discriminant projection with applications to face and palm biometrics, IEEE Trans. Pattern Anal. Mach. Intell. (2007)
  • W. Deng et al., Comments on globally maximizing, locally minimizing: unsupervised discriminant projection with applications to face and palm biometrics, IEEE Trans. Pattern Anal. Mach. Intell. (2008)
  • Y. Fu, T.S. Huang, Locally linear embedded eigenspace analysis, IFP-TR, UIUC, ...
  • D. Cai, X. He, J. Han, Semi-supervised discriminant analysis, in: IEEE International Conference on Computer Vision, Rio ...
  • X. Yang, H. Fu, H. Zha, J.L. Barlow, Semi-supervised nonlinear dimensionality reduction, in: International Conference ...
  • X. Zhu, Semi-supervised learning literature survey, Computer Sciences Technical Report 1530, University of ...
  • D. Zhou, O. Bousquet, T.N. Lal, J. Weston, B. Schölkopf, Learning with local and global consistency, in: Advances in ...
  • W. Tong, R. Jin, Semi-supervised learning by mixed label propagation, in: The Twenty-Second AAAI Conference on ...
  • Y. Zhang, Z. Zhou, Non-metric label propagation, in: International Joint Conference on Artificial Intelligence, ...
  • F. Wang, C. Zhang, Label propagation through linear neighborhoods, in: International Conference on Machine Learning, ...
  • X. Zhu, Z. Ghahramani, J. Lafferty, Semi-supervised learning using Gaussian fields and harmonic functions, in: ...
  • M. Belkin, P. Niyogi, Laplacian eigenmaps and spectral techniques for embedding and clustering, in: Advances in Neural ...
  • D. Zhou, C. Burges, Spectral clustering and transductive learning with multiple views, in: International Conference on ...
  • Y. Bengio, J. Paiement, P. Vincent, Out-of-sample extensions for LLE, Isomap, MDS, eigenmaps, and spectral clustering, ...
  • Y. Fu, T.S. Huang, Unsupervised locally embedded clustering for automatic high-dimensional data labeling, in: IEEE ...
  • C. Schellewald, C. Schnorr, Probabilistic subgraph matching approach based on convex relaxation, in: Energy ...
  • M. Maier, U. Luxburg, Influence of graph construction on graph-based clustering measures, in: Advances in Neural ...

    Bo Yang received the B.Sc. degree in Computer Applications from Jilin Normal University in 2002. She received the M.Sc. degree in Computer Applications at Anhui University of Technology in 2007. She is currently a Ph.D. candidate in Department of Computer Science and Engineering, Nanjing University of Aeronautics & Astronautics (NUAA). Her research interests focus on computer vision, dimensionality reduction, pattern recognition, machine learning and their applications.

    Songcan Chen received the B.Sc. degree in Mathematics from Hangzhou University (now merged into Zhejiang University) in 1983. In December 1985, he completed the M.Sc. degree in Computer Applications at Shanghai Jiaotong University and then joined Nanjing University of Aeronautics & Astronautics (NUAA) in January 1986 as an Assistant Lecturer. There he received a Ph.D. degree in Communication and Information Systems in 1997. Since 1998, he has been a full Professor with the Department of Computer Science and Engineering at NUAA. His research interests include pattern recognition, machine learning and neural computing. In these fields, he has authored or coauthored over 130 scientific journal papers.
