Sample-dependent graph construction with application to dimensionality reduction
Introduction
Graph Laplacian based learning has attracted growing interest, and many corresponding algorithms have recently been presented, covering dimensionality reduction (DR) [1], [2], [3], [4], [5], [6], [7], [8], [9], [10], semi-supervised learning [11], [12], [13], [14], [15], label propagation [16], [17], [18], clustering [19], [20], [21], [22], [23], image matching [18], [24], etc. For these algorithms, graph construction lies at the heart. As claimed in [13], constructing a ‘good’ graph is even more important than choosing a ‘good’ algorithm. It is therefore necessary and valuable to study graph construction and to propose new construction approaches. In fact, some researchers have recently devoted themselves to the study of graph construction [25], [26], [27], [28], [29]. Maier and Luxburg [25] showed that k-nearest-neighbor and ε-neighborhood graphs can have markedly different influences on unsupervised spectral clustering. Jebara et al. [26] presented a b-matching graph as an alternative to the k-nearest-neighbor graph. The authors of [27], [28] focused on how to combine different graphs so as to give heavier weight to better graphs, while the authors of [29] presented an in-depth study of the convergence of graph Laplacians on random neighborhood graphs. These works provide valuable insight into graph construction. However, how to construct a more effective graph that depends on the samples at hand remains untouched; in this paper, we therefore attempt to present an approach to sample-dependent graph construction.
More concretely, the sample-dependent graph construction is motivated by the following observation: the determination of the neighbors of each sample, i.e., the construction of edges for a graph, plays a key role in the graph construction process. In traditional graph constructions, the neighbors of a sample are found by one of two main approaches, ε-neighborhood and k-nearest-neighbor [4], [19]. Both approaches require the same neighbor parameter ε (or k) to be predefined for all samples. However, it is not easy to construct an effective graph for learning tasks with these two approaches, for two main reasons:
- (i)
The parameter is set in advance to the same value for all samples. Clearly, such a setting is not very reasonable because the local structures around different samples usually differ. Thus, the so-constructed graph generally fails to effectively capture and fit the intrinsic structures of the data.
- (ii)
The selection of the neighbor parameter is still intractable. It is well known that, to date, how to select this parameter remains an open problem. Although there are approaches to address it, such as cross-validation, they are not only time-consuming and expensive in a high-dimensional space but also waste the limited training samples. Especially when the available training samples are few, as in the face recognition task with only one sample per person [30], existing approaches tend to fail.
To mitigate these drawbacks of the traditional approaches, in this paper we propose a sample-dependent approach to graph construction and name the so-constructed graph the Sample-dependent Graph (SG), which is our first contribution. Different from the traditional ε-neighborhood and k-nearest-neighbor graphs, instead of setting the same neighbor parameter for all samples in advance, the SG depends on the samples in question to determine the neighbors of each sample and the similarities between sample pairs. More specifically, for a sample xi, if the similarity between it and a sample xj is greater than the mean of the similarities between xi and all samples, then xj becomes its neighbor. In other words, the Mean of Similarities (MS) for xi defines its own similarity neighborhood, i.e., the hypersphere whose center is xi and whose radius is determined by MS. The similarities of the samples within the hypersphere are greater than MS, so those samples automatically become its neighbors. Correspondingly, the entry WijS of the adjacency matrix WS is set to the similarity value. Notice that MS is not the same for all samples because the defined similarity function (see its definition in Subsection 3.1) varies from sample to sample and from dataset to dataset.
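To make the rule concrete, the following sketch builds an SG-style adjacency matrix. The Gaussian kernel used here as the similarity function is our own stand-in for illustration (the paper's actual similarity function is defined in Subsection 3.1), and `sigma` is a hypothetical bandwidth:

```python
import numpy as np

def sample_dependent_graph(X, sigma=1.0):
    """Sketch of SG construction. X: (n, D) sample matrix.

    The Gaussian kernel below is an assumed similarity; the paper's own
    similarity function (Subsection 3.1) would be substituted in practice.
    """
    n = X.shape[0]
    # pairwise squared Euclidean distances
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    S = np.exp(-d2 / (2 * sigma ** 2))          # similarity matrix
    W = np.zeros((n, n))
    for i in range(n):
        ms = S[i].mean()                         # Mean of Similarities for x_i
        for j in range(n):
            # x_j is a neighbor of x_i iff its similarity exceeds x_i's MS
            if j != i and S[i, j] > ms:
                W[i, j] = S[i, j]
    return W
```

Because each row uses its own mean MS as the threshold, xj may be a neighbor of xi while xi is not a neighbor of xj, so W is generally asymmetric, as the paper's Proposition 1 states.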
Compared with the traditional ε-neighborhood and k-nearest-neighbor graphs, the SG has some favorable and attractive properties:
- (i)
The SG depends on the samples in question to determine the neighbors of each sample and the similarities between sample pairs. This strategy avoids the rigid criterion used in the two traditional graphs, i.e., predefining the same neighbor parameter for all samples.
- (ii)
Although the SG is not yet parameter-free, owing to the similarity parameter, it reduces the number of parameters compared to the traditional graphs and is thus convenient and simple to use in learning algorithms. Undoubtedly, this reduces complexity in real applications.
- (iii)
From the construction definition of the SG, it can be demonstrated (see Proposition 1 in Section 3 and the toy experiment in Section 4.1) that its adjacency matrix is generally asymmetric even when xi and xj are neighbors of each other, which is generally more reasonable and intuitive for effectively capturing and fitting the intrinsic structures of the data.
- (iv)
In the SG, a sample in a high-density area generally has more neighbors, whereas a sample in a low-density area has relatively fewer neighbors (this characteristic is illustrated in the toy experiment in Section 4.1), which reflects a certain statistical property of the data.
- (v)
There are some similarities between the SG and the two traditional graphs, but the SG is not their special case (see (B) in Section 3.2).
- (vi)
The SG is very general. It can potentially be adapted to many learning algorithms based on graph Laplacian.
As a graph construction strategy, the proposed approach can potentially be applied to many learning algorithms based on the graph Laplacian; in this paper, however, we focus only on an application of the SG to DR to demonstrate the effectiveness of our strategy, mainly owing to its attractive properties and popularity.1 In fact, many graph-based DR algorithms [1], [2], [4], [10], [20], [22], [23], [31], [32], [33], [34], [35], [36], [37], [38], [39], [40], [41], [42], [43], [44] have recently been proposed, such as locality preserving projection (LPP) [1], neighborhood preserving embedding (NPE) [33], Bayesian tensor analysis (BTA) [38], [39], and local coordinates alignment (LCA) [40]. Their implementations share a common routine: first construct an affinity graph and then embed it into an objective function, which is optimized so as to preserve certain (locally geometrical or structural) properties of the original high-dimensional space; in this way, a projection matrix for dimensionality reduction is obtained. We follow this routine as well. Specifically, we incorporate the SG into the state-of-the-art, off-the-shelf unsupervised LPP to develop the sample-dependent LPP (SLPP), which is our second contribution.
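As a reference point for this common routine, here is a minimal sketch of the standard LPP step: given a data matrix and an affinity matrix W, it solves the generalized eigenproblem X L Xᵀ a = λ X D Xᵀ a for the projection directions, where D is the degree matrix and L = D − W the graph Laplacian. The small ridge term added to the right-hand side is our own addition for numerical stability, not part of the original formulation:

```python
import numpy as np

def lpp(X, W, d):
    """Sketch of LPP. X: (D, n) data matrix (columns are samples),
    W: (n, n) symmetric affinity matrix, d: target dimensionality."""
    Dm = np.diag(W.sum(axis=1))                  # degree matrix
    L = Dm - W                                   # graph Laplacian
    A = X @ L @ X.T
    B = X @ Dm @ X.T
    B += 1e-6 * np.eye(B.shape[0])               # ridge for numerical stability (our addition)
    # generalized eigenproblem A a = lambda B a, via B^{-1} A
    vals, vecs = np.linalg.eig(np.linalg.solve(B, A))
    order = np.argsort(vals.real)                # smallest eigenvalues first
    return vecs[:, order[:d]].real               # (D, d) projection matrix
```

Projecting new samples is then just a matrix product, `Y = P.T @ X_new`, which is why LPP-style methods avoid the "out-of-sample" problem mentioned below.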
SLPP also has some attractive properties:
- (i)
SLPP naturally inherits all merits of the SG, as mentioned in (i)–(vi) above.
- (ii)
It is worth pointing out that although the adjacency matrix of the SG is generally asymmetric, the Laplacian matrix induced from it is still symmetric (see Proposition 2 in Section 3).
- (iii)
SLPP shares some good characteristics of LPP, such as avoidance of the “out-of-sample” problem [21].
- (iv)
The SG can be directly incorporated into the off-the-shelf LPP and hence does not add much computational complexity compared to LPP.
The rest of this paper is organized as follows: Section 2 briefly reviews the traditional approaches to graph construction and takes LPP as an example to introduce unsupervised DR algorithms based on graph embedding. Section 3 first presents the novel SG and demonstrates its properties, and then develops the unsupervised SLPP algorithm. Section 4 presents experiments that evaluate the effectiveness and feasibility of the SG and SLPP. Section 5 gives concluding remarks and discusses future work.
Section snippets
Traditional graph construction
Given a set of n samples X={x1,…, xn}, xi∈RD, a weighted graph G={V, E, W} can be constructed, where V corresponds to the sample points in X, E denotes the set of edges between sample pairs, and the entry Wij of the adjacency matrix W denotes the similarity between xi and xj. The construction of graph G can typically be decomposed into two steps: (1) construction of edges and (2) calculation of a weight value for each edge. Step (1) mainly includes the two categories of ε-neighborhood and k-nearest-neighbor…
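As a concrete illustration of these two traditional constructions, the following sketch builds both graphs with the common heat-kernel weighting; the symmetrization of the kNN graph and the bandwidth `sigma` are conventional choices for illustration, not prescribed by this paper:

```python
import numpy as np

def knn_graph(X, k, sigma=1.0):
    """k-nearest-neighbor graph with heat-kernel weights. X: (n, D)."""
    n = X.shape[0]
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.zeros((n, n))
    for i in range(n):
        idx = np.argsort(d2[i])[1:k + 1]        # nearest k, skipping self at position 0
        W[i, idx] = np.exp(-d2[i, idx] / (2 * sigma ** 2))
    return np.maximum(W, W.T)                   # symmetrize: edge if either is a kNN

def eps_graph(X, eps, sigma=1.0):
    """epsilon-neighborhood graph: connect pairs closer than eps. X: (n, D)."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.where(d2 < eps ** 2, np.exp(-d2 / (2 * sigma ** 2)), 0.0)
    np.fill_diagonal(W, 0.0)                    # no self-loops
    return W
```

Note that both constructions apply the same k (or ε) to every sample, which is exactly the rigidity the SG is designed to avoid.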
The construction of sample-dependent graph (SG) and sample-dependent LPP (SLPP)
As discussed above, the limitations of ε-neighborhood and k-nearest-neighbor preclude the so-constructed graph from effectively serving subsequent learning tasks. To mitigate them to a certain extent, in this section, we firstly present a sample-dependent approach of graph construction, and then incorporate the sample-dependent graph into LPP to develop the sample-dependent LPP (SLPP).
Experiments
To show the properties of the SG, in this section we first design a synthetic dataset and show the similarities and differences between the SG and the ε-neighborhood and k-nearest-neighbor graphs on that dataset. To show the parameter sensitivity of the three graphs, we select the Wine dataset from UCI [47] to compare SLPP with LPP based on the two traditional graphs, and further show the visualization effect of several competitive algorithms on the Wine dataset. To show the effectiveness and feasibility of both SG and SLPP…
Conclusions and future works
In this paper, we present an approach to sample-dependent graph construction for learning algorithms based on the graph Laplacian and name the so-constructed graph the sample-dependent graph (SG). The SG depends on the samples in question to determine the neighbors of each sample and the similarities between sample pairs; thus it reduces the number of parameters compared to the traditional graphs and leads to an asymmetric adjacency matrix, which can often more effectively capture and fit the intrinsic structures of…
Acknowledgements
This work is partly supported by the National Natural Science Foundation of China under Grant no. 60973097 and the Higher Education University Doctor Foundation under Grant no. 200802870003. Thanks to Xiaofei He and Deng Cai for providing the code for LPP and NPE on their homepages.
Bo Yang received the B.Sc. degree in Computer Applications from Jilin Normal University in 2002. She received the M.Sc. degree in Computer Applications at Anhui University of Technology in 2007. She is currently a Ph.D. candidate in Department of Computer Science and Engineering, Nanjing University of Aeronautics & Astronautics (NUAA). Her research interests focus on computer vision, dimensionality reduction, pattern recognition, machine learning and their applications.
References (53)
- et al., Locally adaptive subspace and similarity metric learning for visual data clustering and retrieval, Comput. Vis. Image Understand. (2008)
- et al., Face recognition from a single image per person: A survey, Pattern Recog. (2006)
- X. He, P. Niyogi, Locality preserving projections, in: Advances in Neural Information Processing Systems, ...
- et al., Graph embedding and extension: a general framework for dimensionality reduction, IEEE Trans. Pattern Anal. Mach. Intell. (2007)
- et al., Face recognition using Laplacianfaces, IEEE Trans. Pattern Anal. Mach. Intell. (2005)
- et al., Laplacian eigenmaps for dimensionality reduction and data representation, Neural Comput. (2003)
- et al., A global geometric framework for nonlinear dimensionality reduction, Science (2000)
- et al., Nonlinear dimensionality reduction by locally linear embedding, Science (2000)
- D. Cai, X. He, K. Zhou, J. Han, H. Bao, Locality sensitive discriminant analysis, in: International Joint ...
- et al., Globally maximizing, locally minimizing: unsupervised discriminant projection with applications to face and palm biometrics, IEEE Trans. Pattern Anal. Mach. Intell. (2007)
- Comments on globally maximizing, locally minimizing: unsupervised discriminant projection with applications to face and palm biometrics, IEEE Trans. Pattern Anal. Mach. Intell.
Cited by (56)
- Joint graph optimization and projection learning for dimensionality reduction, 2019, Pattern Recognition
- Heterogeneous domain adaptation network based on autoencoder, 2018, Journal of Parallel and Distributed Computing
- Locality constrained Graph Optimization for Dimensionality Reduction, 2017, Neurocomputing. Citation excerpt: “Among the methods to be compared, NPE [16], LPP [17], NMF [40] and GNMF [41] are classical unsupervised linear or graph based dimensionality reduction algorithms which construct the graph using the k-nearest-neighbor or ε-ball scheme. SGLPP [20] utilizes the sample-dependent strategy for graph construction. SPP [21] is a typical L1 graph based dimensionality reduction algorithm.”
- Attribute-based Decision Graphs: A framework for multiclass data classification, 2017, Neural Networks
Songcan Chen received the B.Sc. degree in Mathematics from Hangzhou University (now merged into Zhejiang University) in 1983. In December 1985, he completed the M.Sc. degree in Computer Applications at Shanghai Jiaotong University and then joined Nanjing University of Aeronautics & Astronautics (NUAA) in January 1986 as an Assistant Lecturer. There he received a Ph.D. degree in Communication and Information Systems in 1997. Since 1998, as a full Professor, he has been with the Department of Computer Science and Engineering at NUAA. His research interests include pattern recognition, machine learning and neural computing. In these fields, he has authored or coauthored over 130 scientific journal papers.