
Pattern Recognition

Volume 67, July 2017, Pages 252-262

Low rank representation with adaptive distance penalty for semi-supervised subspace classification

https://doi.org/10.1016/j.patcog.2017.02.017

Highlights

  • A novel LRR with adaptive distance penalty method is proposed for SSL.

  • LRRADP can better capture both the global structure and local affinity of the data.

  • The projection-based LRRADP (LRRADP2) shows impressive robustness.

  • Extensive experiments demonstrate the effectiveness of the proposed method.

Abstract

Graph-based Semi-supervised Subspace Learning (SSL) methods treat both labeled and unlabeled data as nodes in a graph, and then instantiate edges among these nodes by weighting the affinity between the corresponding pairs of samples. Constructing a good graph that discovers the intrinsic structure of the data is critical for SSL tasks such as subspace clustering and classification. Low Rank Representation (LRR) is one of the most powerful subspace clustering methods, and a weighted affinity graph can be constructed from it. Generally, adjacent samples usually belong to the same subspace, and thus nearby points in the graph should have large edge weights. Motivated by this, in this paper we propose a novel LRR with Adaptive Distance Penalty (LRRADP) to construct a good affinity graph. The graph identified by the LRRADP not only captures the global subspace structure of the whole data set but also effectively preserves the neighbor relationships among samples. Furthermore, by projecting the data set into an appropriate subspace, the LRRADP can be further improved to construct an even more discriminative affinity graph. Extensive experiments on different types of baseline datasets demonstrate the effectiveness of the proposed methods. The improved method, named LRRADP2, shows impressive performance on real-world handwritten and noisy data. The MATLAB code of the proposed methods will be available at http://www.yongxu.org/lunwen.html.

Introduction

In many big-data applications, effectively connecting unlabeled data with labeled data is of central importance [1], [2]. For example, in image-based web search and image-based object recognition, labeled data are usually limited, whereas unlabeled data are abundant and readily available on the Internet. In these problems, the goal is to build a connection between the unlabeled and labeled data and then identify the labels of the unlabeled data. Semi-supervised Subspace Learning (SSL) [3], [4], [5], [6] is a family of techniques that exploits the "manifold structure" of the data by using both labeled and unlabeled samples [7], [8].

Constructing a graph of the local connectivity of the data is an effective strategy for SSL due to its success in practice [9], [10], [11]. Graph-based SSL methods treat both labeled and unlabeled samples as nodes in a graph, and then instantiate edges among these nodes, weighted according to the affinity between the corresponding pairs of samples. Supposing that the data set is noiseless and embedded in independent subspaces, the graph identified by an SSL method should have a block-diagonal weight matrix, with each block corresponding to one subspace; subspace clustering can then produce an exactly correct clustering result from this block-diagonal matrix. To this end, we build a weighted graph G = (V, W), where V is the vertex set whose nodes correspond to the N data points and W ∈ R^{N×N} is a symmetric non-negative weight matrix representing the relationships among the nodes. A non-zero weight reflects the affinity between the corresponding nodes, and a zero weight denotes that there is no edge joining them. An ideal similarity matrix W, and hence an ideal weight graph G, is one in which nodes corresponding to points from the same subspace are connected to each other and there is no edge between any two nodes corresponding to points from different subspaces. Thus, given a data set, the problem of graph construction is to determine the weight matrix W. A perfect similarity graph built by SSL has n independent connected components corresponding to the n subspaces, and by applying spectral clustering the labels can then be propagated from the labeled samples to the unlabeled samples over the graph [12], [13], [14], [15].
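The connected-component view above can be checked numerically: for a block-diagonal affinity matrix, the number of (near-)zero eigenvalues of the graph Laplacian L = D − W equals the number of connected components, which is exactly what spectral clustering exploits. A minimal NumPy sketch with illustrative toy weights:

```python
import numpy as np

# Toy block-diagonal affinity: two subspaces, three samples each.
W = np.zeros((6, 6))
W[:3, :3] = 0.8
W[3:, 3:] = 0.6
np.fill_diagonal(W, 0.0)          # no self-loops

D = np.diag(W.sum(axis=1))        # degree matrix
L = D - W                         # unnormalized graph Laplacian

eigvals = np.linalg.eigvalsh(L)
n_components = np.sum(eigvals < 1e-8)
assert n_components == 2          # two blocks -> two connected components
```

With noisy data the small eigenvalues are only approximately zero, which is why spectral clustering uses the eigenvectors associated with the smallest eigenvalues rather than exact components.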

The recent Low Rank Representation (LRR) [13], [14] is a promising weight-graph construction method. LRR aims to find the lowest-rank representation among all candidates that can express the data vectors as linear combinations of the basis vectors in a proper dictionary. Consider a set of data X = [x1, x2, …, xn] in R^d, each column of which is a sample that can be represented by a linear combination of a basis of d-dimensional vectors. If we form the basis matrix as A = [a1, a2, …, am], then X can be represented as

X = AZ,

where Z = [z1, z2, …, zn] is the coefficient matrix, with each zi characterizing how the other samples contribute to the representation of xi. Since a sample can essentially be represented only by data lying in the same subspace, the nonzero elements of zi indicate that the corresponding samples are in the same subspace. Therefore, minimizing the rank of the representation is an appropriate criterion for clustering data drawn from multiple linear subspaces. That is, LRR seeks the lowest-rank representation of the data set:

min_Z rank(Z),  s.t.  X = AZ,

where rank(•) denotes the rank of a matrix. The low-rank constraint guarantees that the coefficients of samples coming from the same subspace are highly correlated. When the data are clean and drawn exactly from linearly independent subspaces, the similarity matrix built in this way is an ideal n-block-diagonal matrix corresponding to the n subspaces.
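As a sanity check on this formulation: when the dictionary is the data itself (A = X) and the data are noiseless, the minimizer of the standard nuclear-norm relaxation of the problem above is known to be the shape interaction matrix Z* = V V^T, where X = U S V^T is the skinny SVD of X [13]. A NumPy sketch (the subspace dimensions and random data are illustrative assumptions, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

# Two independent 2-D subspaces in R^6, five clean samples from each.
B1 = rng.standard_normal((6, 2))
B2 = rng.standard_normal((6, 2))
X = np.hstack([B1 @ rng.standard_normal((2, 5)),
               B2 @ rng.standard_normal((2, 5))])

# Skinny SVD of X; r = rank(X).
U, s, Vt = np.linalg.svd(X, full_matrices=False)
r = int(np.sum(s > 1e-10))
V = Vt[:r].T

# Shape interaction matrix: the minimal-nuclear-norm solution of X = XZ.
Z = V @ V.T

assert np.allclose(X, X @ Z)           # Z is a valid self-representation
# Cross-subspace blocks of Z vanish when the subspaces are independent.
assert np.abs(Z[:5, 5:]).max() < 1e-6
```

The block-diagonal pattern of Z is what makes |Z| (suitably symmetrized) usable as the affinity matrix W of the graph.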

LRR is an effective framework for exploring the multiple-subspace structure of data. Based on LRR, many recent efforts have explored ways of constructing a discriminative graph for SSL [16], [17], [18], [19]. Liu et al. [18] proposed a latent low-rank representation method for subspace clustering, which approximates and uses the unobserved data hidden in the observed data to resolve the issue of insufficient sampling. Zhang et al. [19] extended the latent LRR by choosing the sparsest solution in the solution set to increase the robustness of the method. Wei et al. [20] proposed a robust shape interaction method that preprocesses the data using robust PCA [21] and then applies LRR to build the similarity matrix. By combining sparsity with the global structure, Zhuang et al. [22], [23] proposed a nonnegative low-rank and sparse graph for semi-supervised learning. Fang et al. [24], [25] combined nonnegative low-rank representation with semi-supervised clustering learning in one framework, achieving acceptable classification performance.

Conventional LRR-based methods usually focus on constructing the global subspace structure. However, a good graph should not only capture the global structure of all the data but also reveal the intrinsic neighbor relationships among the data [26]. In this paper, we propose a Low Rank Representation with Adaptive Distance Penalty (LRRADP) method, which builds the linear combination using nearby samples as much as possible via an adaptive distance penalty. The affinity graph built by the LRRADP can better capture both the global subspace structure of the whole data set and the local neighbor relationships among the data samples. The similarity graph/matrix identified by the LRRADP works well with conventional semi-supervised classification methods, such as Gaussian Fields and Harmonic Functions (GFHF) [8], for predicting the labels of unlabeled samples. Moreover, the LRRADP is improved to LRRADP2 by projecting the data set into an appropriate subspace.
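For context on the label-prediction step: GFHF [8] propagates labels by solving a harmonic function on the graph. Splitting W into labeled/unlabeled blocks and taking L = D − W, the predictions on the unlabeled nodes are f_u = L_uu^{-1} W_ul f_l. A minimal NumPy sketch (the toy weights and labels are illustrative assumptions):

```python
import numpy as np

# Toy affinity over 5 nodes: nodes 0-1 labeled, nodes 2-4 unlabeled.
W = np.array([[0.0, 0.1, 0.9, 0.0, 0.0],
              [0.1, 0.0, 0.0, 0.8, 0.8],
              [0.9, 0.0, 0.0, 0.1, 0.0],
              [0.0, 0.8, 0.1, 0.0, 0.2],
              [0.0, 0.8, 0.0, 0.2, 0.0]])
n_l = 2                                    # number of labeled nodes
f_l = np.array([[1.0, 0.0],                # node 0: class 0
                [0.0, 1.0]])               # node 1: class 1

L = np.diag(W.sum(axis=1)) - W             # graph Laplacian
L_uu = L[n_l:, n_l:]
W_ul = W[n_l:, :n_l]

# Harmonic solution of Zhu et al. (GFHF): f_u = L_uu^{-1} W_ul f_l
f_u = np.linalg.solve(L_uu, W_ul @ f_l)
labels = f_u.argmax(axis=1)
# Node 2 is tied to node 0, nodes 3-4 to node 1: labels == [0, 1, 1]
```

Any affinity matrix produced by LRRADP or another graph construction method can be plugged in as W here, which is how the graphs are evaluated in the experiments.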

The remainder of this paper is organized as follows. Section 2 introduces the related works of the low rank representation and semi-supervised subspace classification methods. Section 3 proposes an LRR with adaptive distance penalty method (LRRADP) for subspace classification. Section 4 extends the LRRADP to LRRADP2 by projecting the data into an appropriate subspace. Section 5 presents the experimental results and Section 6 concludes this paper.

Section snippets

Low rank representation

To capture the global structure of data, LRR [13], [14] constructs the affinities of an undirected graph. An LRR graph obtains the representation of all data under a global low-rank constraint and is thus better at capturing global data structures. It has been proven that, under suitable conditions, LRR can correctly preserve the membership of samples that belong to the same subspace [13]. Given a data set, each sample can usually be represented by the other samples that lie in the same subspace.

Low rank representation with adaptive distance penalty

Throughout this paper, all matrices are written in uppercase. For a matrix M, the (i, j)th element of M is denoted as [M]i,j. The ith row of M is denoted as [M]i,* and the jth column of M is denoted as [M]*,j. The trace of M is denoted as tr(M). The lp norm of M is denoted as ||M||p. In particular, the Frobenius norm and nuclear norm of M are denoted as ||M||F and ||M||*, respectively. The transpose of M is denoted as MT. M ≥ 0 means that all elements of M are larger than or equal to zero. I denotes the identity matrix.
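For concreteness, the two norms singled out above can be computed as follows (a NumPy sketch; the test matrix is an arbitrary assumption):

```python
import numpy as np

M = np.array([[3.0, 0.0],
              [0.0, 4.0]])

# Frobenius norm: square root of the sum of squared entries.
fro = np.linalg.norm(M, 'fro')
# Nuclear norm: sum of the singular values.
nuc = np.linalg.svd(M, compute_uv=False).sum()

assert np.isclose(fro, 5.0)   # sqrt(9 + 16)
assert np.isclose(nuc, 7.0)   # singular values are 4 and 3
```

The nuclear norm is the convex surrogate used in place of rank(Z) when the LRR problem is actually solved.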

LRRADP2

In general, the quality of the data representation greatly affects the quality of the graph. A good data representation can improve the quality of the graph and, in turn, the performance of SSL. Previous work [22] has shown that by projecting the data with a projection matrix, the embedded data facilitate the subsequent data representation and increase classification accuracy. To improve the data representation, we propose to learn an appropriate subspace in which the graph …

Experiments

In this section, we evaluate the performance of the proposed methods on baseline databases against other state-of-the-art graph construction methods. We combine the graphs identified by the LRRADP, LRRADP2, and conventional popular graph construction methods with the GFHF method to perform semi-supervised classification, and quantitatively evaluate their performance. We test and compare these solvers on six representative data sets, including the COIL20, AR, Extended Yale B, Isolet5, …

Conclusions

In this paper, we considered the general problem of learning from labeled and unlabeled samples and classifying the unlabeled samples, and we proposed a novel low rank representation with adaptive distance penalty, named LRRADP, to learn the affinity graph of the data set. By embedding the adaptive distance penalty into the LRR, the obtained affinity graph can not only better capture the global clustering structure of the whole data but also preserve the local neighbor relationships among the data.

Acknowledgment

This work was partially supported by the National Natural Science Foundation of China (No. 61370163) and the Shenzhen Municipal Science and Technology Innovation Council (No. JCYJ20130329154017293).

Lunke Fei received the B.S. and M.S. degrees in computer science and technology from East China Jiaotong University, China, in 2004 and 2007, respectively. He is currently pursuing the Ph.D. degree in computer science and technology at Shenzhen Graduate School, Harbin Institute of Technology, Shenzhen, China. His current research interests include pattern recognition and biometrics.

References (39)

  • F. Nie et al., Semi-supervised orthogonal discriminant analysis via label propagation, Pattern Recogn. (2009)
  • H. Zhang et al., Robust latent low rank representation for subspace clustering, Neurocomputing (2014)
  • L. He et al., Iterative ensemble normalized cuts, Pattern Recogn. (2016)
  • J. Yan et al., A general framework for motion segmentation: independent, articulated, rigid, non-rigid, degenerate and nondegenerate
  • D. Cai et al., Semi-supervised discriminant analysis
  • X. Zhu, Semi-supervised learning literature survey (2005)
  • S. Yan et al., Graph embedding and extensions: a general framework for dimensionality reduction, IEEE Trans. Pattern Anal. Mach. Intell. (2007)
  • B. Ni et al., Learning a propagable graph for semisupervised learning: classification and regression, IEEE Trans. Knowl. Data Eng. (2012)
  • V. Sindhwani et al., Linear manifold regularization for large scale semi-supervised learning
  • X. Zhu et al., Semi-supervised learning using Gaussian fields and harmonic functions
  • Y. Luo et al., Manifold regularized multitask learning for semi-supervised multilabel image classification, IEEE Trans. Image Process. (2013)
  • F. Nie et al., Flexible manifold embedding: a framework for semi-supervised and unsupervised dimension reduction, IEEE Trans. Image Process. (2010)
  • R. He et al., Nonnegative sparse coding for discriminative semi-supervised learning
  • J. Wang et al., Linear neighborhood propagation and its applications, IEEE Trans. Pattern Anal. Mach. Intell. (2009)
  • G. Liu et al., Robust recovery of subspace structures by low-rank representation, IEEE Trans. Pattern Anal. Mach. Intell. (2013)
  • G. Liu et al., Robust subspace segmentation by low-rank representation
  • J. Shi et al., Normalized cuts and image segmentation, IEEE Trans. Pattern Anal. Mach. Intell. (2000)
  • K. Tang et al., Structure-constrained low-rank representation, IEEE Trans. Neural Networks Learn. Syst. (2014)
  • J. Chen et al., Robust subspace segmentation via low-rank representation, IEEE Trans. Cybern. (2014)

    Yong Xu received the B.S. and M.S. degrees in 1994 and 1997, respectively, and the Ph.D. degree in pattern recognition and intelligence system from the Nanjing University of Science and Technology, Nanjing, China, in 2005. Currently, he is with the Bio-Computing Research Center, Shenzhen Graduate School, Harbin Institute of Technology, Shenzhen, China. His current research interests include pattern recognition, biometrics, bioinformatics, machine learning, image processing, and video analysis.

    Xiaozhao Fang received the M.S. degree in computer science from Guangdong University of Technology in 2008, and the Ph.D. degree in computer science and technology at Shenzhen Graduate School, Harbin Institute of Technology in 2016. He has published more than 15 journal papers. His current research interests include pattern recognition and machine learning.

Jian Yang received the B.S. degree in mathematics from Xuzhou Normal University in 1995, the M.S. degree in applied mathematics from Changsha Railway University in 1998, and the Ph.D. degree from the Nanjing University of Science and Technology (NUST) in 2002, with a focus on pattern recognition and intelligence systems. In 2003, he was a Post-Doctoral Researcher with the University of Zaragoza. He was a Post-Doctoral Fellow with the Biometrics Centre, The Hong Kong Polytechnic University, from 2004 to 2006, and with the Department of Computer Science, New Jersey Institute of Technology, from 2006 to 2007. He is currently a Professor with the School of Computer Science and Technology, NUST. His journal papers have been cited more than 1600 times in the ISI Web of Science and 2800 times in Google Scholar. He has authored over 80 scientific papers in pattern recognition and computer vision. His research interests include pattern recognition, computer vision, and machine learning. He is currently an Associate Editor of Pattern Recognition Letters and of the IEEE Transactions on Neural Networks and Learning Systems.
