Global and intrinsic geometric structure embedding for unsupervised feature selection
Introduction
High-dimensional data are commonly encountered in many application fields, such as data mining (Agrawal et al., 1999), pattern recognition (Yu et al., 2001) and biomedical science (Clarke et al., 2008). Such data demand large storage space and well-performing hardware, and they also introduce noise and redundancy, so dimensionality reduction has become a pressing problem. Dimensionality reduction methods fall into two main categories: feature selection and subspace learning. Feature selection picks the most representative features from the original feature space under certain criteria, so the collection of selected features is a subset of the original features. Subspace learning methods, by contrast, learn a transformation that maps the original high-dimensional feature space into a lower-dimensional subspace, thereby generating new features. A classical subspace learning method is Principal Component Analysis (PCA) (Jiang et al., 2014), which obtains a low-dimensional representation of the data from a global perspective by retaining as much of the data variance as possible.
The local structure of the data also contains important discriminative information (Bottou et al., 1992), so several dimensionality reduction algorithms that preserve local structure in different ways have been proposed, such as Locality Preserving Projection (LPP) (He & Niyogi, 2005) and Locally Linear Embedding (LLE) (Roweis et al., 2001). The core idea of these local structure preserving methods is to embed the neighborhood relationships learned from the original data into the lower-dimensional subspace. Because local structure preservation is highly effective, it is widely used in feature selection (Cai et al., 2010; Zhou et al., 2016): the Laplacian Score (He et al., 2005) evaluates and selects features based on LPP; Li et al. (2008) propose a discriminative locally linear embedding based on LLE; and Wang et al. (2016) propose neighborhood embedding feature selection (NEFS), which learns a sparse representation by treating the nearest neighbors of each sample as a dictionary and then embeds that representation into the feature selection model. Such graph-based locality preserving methods still face several challenges: (1) constructing the adjacency graph with KNN is not effective enough at capturing discriminative information (Zhu, 2008); (2) the neighborhood size and heat kernel width are hard to set; and (3) the eigendecomposition of a dense matrix is time-consuming and requires large storage. A concrete sketch of this graph construction and its parameters is given below.
To address these challenges, Qiao et al. (2010) propose the sparsity preserving projections (SPP) method, which preserves structure information by learning sparse reconstruction relationships among the original data; the intrinsic geometric structure of the data is thereby reflected and carries more natural discriminative information. Many SPP-based methods have since been proposed (Lu et al., 2013; Wang et al., 2016). However, these SPP-based methods are not robust to noise because they neglect the latent global structure of the data. Low-rank representation (LRR) (Liu et al., 2010) is better at capturing the global structure of the data: it seeks the lowest-rank representation among all candidates and represents the data samples as linear combinations of the bases in a dictionary.
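To make challenges (1) and (2) concrete, here is a minimal NumPy/scikit-learn sketch of the KNN adjacency graph with heat kernel weights and the Laplacian Score ranking built on it; the function name and defaults are our own, and k and t are precisely the parameters the text says are hard to set:

```python
import numpy as np
from sklearn.neighbors import kneighbors_graph

def laplacian_score(X, k=5, t=1.0):
    """Rank features by the Laplacian Score (He et al., 2005).

    X : (n_samples, n_features) data matrix.
    k, t : neighborhood size and heat kernel width -- the two
           hard-to-set parameters mentioned in the text.
    Lower scores indicate features that better preserve locality.
    """
    n = X.shape[0]
    # KNN graph with heat kernel weights exp(-d_ij^2 / t), symmetrized
    dist = kneighbors_graph(X, k, mode='distance').toarray()
    W = np.exp(-dist ** 2 / t) * (dist > 0)
    W = np.maximum(W, W.T)
    D = np.diag(W.sum(axis=1))     # degree matrix
    L = D - W                      # unnormalized graph Laplacian
    ones = np.ones(n)
    scores = np.empty(X.shape[1])
    for r in range(X.shape[1]):
        f = X[:, r]
        # remove the weighted mean so constant features are not favored
        f_c = f - ((f @ D @ ones) / (ones @ D @ ones)) * ones
        scores[r] = (f_c @ L @ f_c) / (f_c @ D @ f_c + 1e-12)
    return scores
```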
Du et al. (2016) propose low rank sparse preserve projection (LRSPP) for face recognition, which seeks the projection matrix by preserving both the global structure and the locally linear structure of the data after constructing a low rank and sparse graph. As for the sparse regularization of the projection matrix, the l1-norm (lasso) (Tibshirani, 2011) is convenient to compute but not effective at selecting sufficiently sparse features. Some researchers have therefore extended it to the lp-norm (0 < p < 1) (Foucart et al., 2009), and Xu et al. (2012) demonstrate that feature selection performs best when p = 1/2. Because such element-wise norms neglect the correlation between features, Nie et al. (2010) propose the joint l2,1-norm for feature selection, which has been widely applied in feature reduction methods (Zhou et al., 2016; Zhu et al., 2016). Since the lp-norm can select sparser features than the l1-norm, Wang et al. (2013) propose the l2,p-matrix norm (0 < p < 1) and empirically show that p = 1/2 selects the sparsest and most robust features. However, few works based on the l2,p-matrix norm have been proposed.
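For concreteness, the l2,p regularizer discussed here sums the p-th powers of the row-wise l2-norms of the projection matrix, so rows driven to zero correspond to discarded features. A minimal sketch (the function name is ours):

```python
import numpy as np

def l2p_regularizer(W, p=0.5):
    """||W||_{2,p}^p = sum_i ||w_i||_2^p over the rows w_i of W.

    p = 1 recovers the joint l2,1-norm (Nie et al., 2010); 0 < p < 1,
    notably p = 1/2, promotes sparser rows (Wang et al., 2013).
    Each row of W corresponds to one original feature, so zero rows
    mean that feature is not selected.
    """
    row_norms = np.linalg.norm(W, axis=1)
    return np.sum(row_norms ** p)
```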
Among the above research, we note that LRSPP performs better at structure learning. However, it lacks a measurement of the information shared between the original data space and the learned subspace spanned by the selected features. Besides this, its use of the l2,1-norm fails to select sufficiently sparse and discriminative features. To address these two problems jointly, we incorporate the loss of information, the embedding of the low rank and sparse graph, and the l2,1/2-matrix norm into a joint framework for dimensionality reduction, named GGEFS. We now state several characteristics of our algorithm as follows:
1. This approach considers the information discrepancy between the original feature space and the lower-dimensional subspace, which efficiently reduces the loss of information, and its structure preserving term is based on the low rank sparse graph, which acquires adequate discriminative information and avoids parameter selection problems.
2. We impose the l2,1/2-matrix norm on the projection matrix and thus select sparser and more discriminative features (Wang et al., 2013).
3. The Lagrange multiplier method is adopted to solve the optimization problem; the algorithm and its convergence analysis are presented in Section 3.
The remainder of this paper is organized as follows. In Section 2, we present a generic feature selection model and some background knowledge. In Section 3, our feature selection method is described in detail, together with the corresponding solution. Experimental results are reported in Section 4. Finally, we present our conclusions and perspectives for future work.
Related work
In this section, we briefly review the research related to our method: we first give a generic feature selection model, and then describe the low rank sparse representation, a common formulation of which is sketched below.
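One common way to build such a low rank and sparse representation, in the spirit of LRR (Liu et al., 2010) augmented with a sparsity term as in Du et al. (2016), is the following program; the exact penalties and symbols here are a sketch, not necessarily the formulation adopted later in the paper:

```latex
% X in R^{d x n}: data matrix, used as its own dictionary;
% Z: representation coefficients; E: noise/error term;
% lambda_1, lambda_2: trade-off parameters (our notation)
\min_{Z,\,E}\ \|Z\|_{*} \;+\; \lambda_1 \|Z\|_{1} \;+\; \lambda_2 \|E\|_{2,1}
\quad \text{s.t.}\quad X = XZ + E
```

The nuclear norm captures the global (low rank) structure, the l1 term the intrinsic sparse geometry, and the l2,1 term models sample-specific noise.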
Proposed method
In this section, we introduce GGEFS, which consists of a measurement of the information discrepancy between the original data space and the lower-dimensional subspace, a structure preservation term, and a sparse regularization of the projection matrix, where the structure preserving term is formed by embedding the weight matrix containing the structure information of the data into the lower-dimensional subspace. In what follows, we describe our dimensionality reduction model and the corresponding algorithm.
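The precise objective is developed in Section 3; purely as an illustrative sketch of how the three ingredients named above can be combined (our notation and term shapes, not the paper's verbatim model):

```latex
% X in R^{d x n}: data; W in R^{d x m}: projection matrix whose rows
% score the original features; Z: weight matrix from the low rank
% sparse graph; alpha, beta: trade-offs (all notation assumed)
\min_{W}\
\underbrace{\mathrm{Loss}\big(X,\ W^{\top} X\big)}_{\text{information discrepancy}}
\;+\; \alpha \underbrace{\sum_{i=1}^{n} \Big\| W^{\top} x_i - \sum_{j=1}^{n} Z_{ij}\, W^{\top} x_j \Big\|_2^2}_{\text{structure embedding}}
\;+\; \beta \underbrace{\|W\|_{2,1/2}^{1/2}}_{\text{row-sparse regularization}}
```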
Experiments
In this section, we compare the clustering performance of our method with four state-of-the-art dimensionality reduction methods on six benchmark datasets. A description of the selected datasets, the experimental setup and the performance analysis are presented below.
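As a hedged sketch of the standard evaluation pipeline for unsupervised feature selection (the argument names and the higher-is-better scoring convention are our assumptions, not the paper's exact protocol), selected features are typically judged by clustering the reduced data and measuring agreement with ground-truth labels:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import normalized_mutual_info_score

def clustering_nmi(X, y_true, feature_scores, n_selected,
                   n_clusters, n_runs=10):
    """Keep the top-ranked features (here: higher score = better),
    run k-means several times, and report the mean NMI against the
    ground-truth labels -- a common protocol for unsupervised
    feature selection experiments."""
    top = np.argsort(feature_scores)[::-1][:n_selected]
    X_sel = X[:, top]
    nmis = [
        normalized_mutual_info_score(
            y_true,
            KMeans(n_clusters=n_clusters, n_init=10,
                   random_state=seed).fit_predict(X_sel),
        )
        for seed in range(n_runs)
    ]
    return float(np.mean(nmis))
```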
Conclusion
In this paper, we propose a novel unsupervised feature selection method, GGEFS. This method learns the low rank sparse representation of the samples as structure information and embeds it into the lower-dimensional space. Such a strategy not only captures the global structure and the intrinsic geometric structure of the data, but also avoids parameter selection. Apart from this, the use of the l2,1/2-matrix norm on the projection matrix ensures that the features selected by our model are sparser and more discriminative.
Conflict of interest
The authors declare that there is no conflict of interest regarding the publication of this work.
References
- Foucart et al. Sparsest solutions of underdetermined linear systems via ℓq-minimization for 0 < q ≤ 1. Applied and Computational Harmonic Analysis (2009).
- Jiang et al. Plant-wide process monitoring based on mutual information–multiblock principal component analysis. ISA Transactions (2014).
- Lu et al. Face recognition via weighted sparse representation. Journal of Visual Communication and Image Representation (2013).
- Qiao et al. Sparsity preserving projections with applications to face recognition. Pattern Recognition (2010).
- Sparse graph embedding unsupervised feature selection. IEEE Transactions on Systems, Man, and Cybernetics: Systems (2016).
- A non-negative sparse semi-supervised dimensionality reduction algorithm for hyperspectral data. Neurocomputing (2016).
- Yu et al. A direct LDA algorithm for high-dimensional data — with application to face recognition. Pattern Recognition (2001).
- Zhou et al. Global and local structure preserving sparse subspace learning. Pattern Recognition (2016).
- Agrawal et al. Automatic subspace clustering of high dimensional data for data mining applications (1999).
- Bottou et al. Local learning algorithms. Neural Computation (1992).
- Clarke et al. The properties of high-dimensional data spaces: Implications for exploring gene and protein expression data. Nature Reviews Cancer (2008).
- Graph regularized nonnegative matrix factorization for data representation. IEEE Transactions on Pattern Analysis and Machine Intelligence.
- Cai et al. Unsupervised feature selection for multi-cluster data (2010).
- Du et al. Low rank sparse preserve projection for face recognition (2016).
- Convex and semi-nonnegative matrix factorizations. IEEE Transactions on Pattern Analysis and Machine Intelligence.
- Entropy and information theory.