Global and intrinsic geometric structure embedding for unsupervised feature selection
Introduction
High-dimensional data are commonly encountered in many application fields, such as data mining (Agrawal et al., 1999), pattern recognition (Yu et al., 2001) and biomedical science (Clarke et al., 2008). Such data demand large storage space and well-performing hardware, and they also introduce noise and redundancy, so dimensionality reduction has become a pressing problem. Dimensionality reduction methods fall into two main categories: feature selection and subspace learning. Feature selection picks the most representative features from the original feature space under certain criteria, so the collection of selected features is a subset of the original features. Subspace learning methods, by contrast, learn a transformation that maps the original high-dimensional feature space into a lower-dimensional subspace, thereby generating new features. A classical subspace learning method is Principal Component Analysis (PCA) (Jiang et al., 2014), which obtains a low-dimensional representation of the data from a global perspective by retaining as much of the data variance as possible.
The local structure of the data also contains important discriminative information (Bottou et al., 1992), so several dimensionality reduction algorithms that preserve local structure in different ways have been proposed, such as Locality Preserving Projection (LPP) (He & Niyogi, 2005) and Locally Linear Embedding (LLE) (Roweis et al., 2001). The core idea of these local structure preserving methods is to embed the neighborhood relationships learned from the original data into the lower-dimensional subspace. Because local structure preservation is highly effective, it is widely used in feature selection (Cai et al., 2010; Zhou et al., 2016): the Laplacian Score (He et al., 2005) evaluates and selects features based on LPP; Li et al. (2008) propose a discriminative locally linear embedding based on LLE; and Wang et al. (2016) propose neighborhood embedding feature selection (NEFS), which learns a sparse representation by treating the nearest neighbors of each sample as a dictionary and then embeds that representation into the feature selection model. Such graph-based locality preserving methods still face several challenges: (1) constructing the adjacency graph with KNN is not effective enough at capturing discriminative information (Zhu, 2008); (2) the neighborhood size and heat kernel width are hard to set; and (3) the eigendecomposition of a dense matrix is time-consuming and requires large storage. A concrete sketch of this graph construction and its parameters is given below.
To address these challenges, Qiao et al. (2010) propose the sparsity preserving projections (SPP) method, which preserves structure information by learning sparse reconstruction relationships among the original data; the intrinsic geometric structure of the data is thereby reflected and carries more natural discriminative information. Many SPP-based methods have since been proposed (Lu et al., 2013; Wang et al., 2016). However, these SPP-based methods are not robust to noise because they neglect the latent global structure of the data. Low-rank representation (LRR) (Liu et al., 2010) is better at capturing the global structure of the data: it seeks the lowest-rank representation among all candidates and represents the data samples as linear combinations of the bases in a dictionary.
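To make challenges (1) and (2) concrete, here is a minimal NumPy/scikit-learn sketch of the KNN adjacency graph with heat kernel weights and the Laplacian Score ranking built on it; the function name and defaults are our own, and k and t are precisely the parameters the text says are hard to set:

```python
import numpy as np
from sklearn.neighbors import kneighbors_graph

def laplacian_score(X, k=5, t=1.0):
    """Rank features by the Laplacian Score (He et al., 2005).

    X : (n_samples, n_features) data matrix.
    k, t : neighborhood size and heat kernel width -- the two
           hard-to-set parameters mentioned in the text.
    Lower scores indicate features that better preserve locality.
    """
    n = X.shape[0]
    # KNN graph with heat kernel weights exp(-d_ij^2 / t), symmetrized
    dist = kneighbors_graph(X, k, mode='distance').toarray()
    W = np.exp(-dist ** 2 / t) * (dist > 0)
    W = np.maximum(W, W.T)
    D = np.diag(W.sum(axis=1))     # degree matrix
    L = D - W                      # unnormalized graph Laplacian
    ones = np.ones(n)
    scores = np.empty(X.shape[1])
    for r in range(X.shape[1]):
        f = X[:, r]
        # remove the weighted mean so constant features are not favored
        f_c = f - ((f @ D @ ones) / (ones @ D @ ones)) * ones
        scores[r] = (f_c @ L @ f_c) / (f_c @ D @ f_c + 1e-12)
    return scores
```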
Du et al. (2016) propose low rank sparse preserve projection (LRSPP) for face recognition, which seeks the projection matrix by preserving both the global structure and the locally linear structure of the data after constructing a low rank and sparse graph. As for the sparse regularization of the projection matrix, the l1-norm (lasso) (Tibshirani, 2011) is convenient to compute but not effective at selecting sufficiently sparse features. Some researchers have therefore extended it to the lp-norm (0 < p < 1) (Foucart et al., 2009), and Xu et al. (2012) demonstrate that feature selection performs best when p = 1/2. Because such element-wise norms neglect the correlation between features, Nie et al. (2010) propose the joint l2,1-norm for feature selection, which has been widely applied in feature reduction methods (Zhou et al., 2016; Zhu et al., 2016). Since the lp-norm can select sparser features than the l1-norm, Wang et al. (2013) propose the l2,p-matrix norm (0 < p < 1) and empirically show that p = 1/2 selects the sparsest and most robust features. However, few works based on the l2,p-matrix norm have been proposed.
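For concreteness, the l2,p regularizer discussed here sums the p-th powers of the row-wise l2-norms of the projection matrix, so rows driven to zero correspond to discarded features. A minimal sketch (the function name is ours):

```python
import numpy as np

def l2p_regularizer(W, p=0.5):
    """||W||_{2,p}^p = sum_i ||w_i||_2^p over the rows w_i of W.

    p = 1 recovers the joint l2,1-norm (Nie et al., 2010); 0 < p < 1,
    notably p = 1/2, promotes sparser rows (Wang et al., 2013).
    Each row of W corresponds to one original feature, so zero rows
    mean that feature is not selected.
    """
    row_norms = np.linalg.norm(W, axis=1)
    return np.sum(row_norms ** p)
```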
Among the above research, we note that LRSPP performs better at structure learning. However, it lacks a measurement of the information shared between the original data space and the learned subspace spanned by the selected features. Besides this, its use of the l2,1-norm fails to select sufficiently sparse and discriminative features. To address these two problems jointly, we incorporate the loss of information, the embedding of the low rank and sparse graph, and the l2,1/2-matrix norm into a joint framework for dimensionality reduction, named GGEFS. We now state several characteristics of our algorithm as follows:
1. This approach considers the information discrepancy between the original feature space and the lower-dimensional subspace, which efficiently reduces the loss of information, and its structure preserving term is based on the low rank sparse graph, which acquires adequate discriminative information and avoids parameter selection problems.
2. We impose the l2,1/2-matrix norm on the projection matrix and thus select sparser and more discriminative features (Wang et al., 2013).
3. The Lagrange multiplier method is adopted to solve the optimization problem; the algorithm and its convergence analysis are presented in Section 3.
The remainder of this paper is organized as follows. In Section 2, we present a generic feature selection model and some background knowledge. In Section 3, our feature selection method is described in detail, together with the corresponding solution. Experimental results are reported in Section 4. Finally, we present our conclusions and perspectives for future work.
Related work
In this section, we briefly review the research related to our method: we first give a generic feature selection model, and then describe the low rank sparse representation, a common formulation of which is sketched below.
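One common way to build such a low rank and sparse representation, in the spirit of LRR (Liu et al., 2010) augmented with a sparsity term as in Du et al. (2016), is the following program; the exact penalties and symbols here are a sketch, not necessarily the formulation adopted later in the paper:

```latex
% X in R^{d x n}: data matrix, used as its own dictionary;
% Z: representation coefficients; E: noise/error term;
% lambda_1, lambda_2: trade-off parameters (our notation)
\min_{Z,\,E}\ \|Z\|_{*} \;+\; \lambda_1 \|Z\|_{1} \;+\; \lambda_2 \|E\|_{2,1}
\quad \text{s.t.}\quad X = XZ + E
```

The nuclear norm captures the global (low rank) structure, the l1 term the intrinsic sparse geometry, and the l2,1 term models sample-specific noise.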
Proposed method
In this section, we introduce GGEFS, which consists of a measurement of the information discrepancy between the original data space and the lower-dimensional subspace, a structure preservation term, and a sparse regularization of the projection matrix, where the structure preserving term is formed by embedding the weight matrix containing the structure information of the data into the lower-dimensional subspace. In what follows, we describe our dimensionality reduction model and the corresponding algorithm.
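The precise objective is developed in Section 3; purely as an illustrative sketch of how the three ingredients named above can be combined (our notation and term shapes, not the paper's verbatim model):

```latex
% X in R^{d x n}: data; W in R^{d x m}: projection matrix whose rows
% score the original features; Z: weight matrix from the low rank
% sparse graph; alpha, beta: trade-offs (all notation assumed)
\min_{W}\
\underbrace{\mathrm{Loss}\big(X,\ W^{\top} X\big)}_{\text{information discrepancy}}
\;+\; \alpha \underbrace{\sum_{i=1}^{n} \Big\| W^{\top} x_i - \sum_{j=1}^{n} Z_{ij}\, W^{\top} x_j \Big\|_2^2}_{\text{structure embedding}}
\;+\; \beta \underbrace{\|W\|_{2,1/2}^{1/2}}_{\text{row-sparse regularization}}
```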
Experiments
In this section, we compare the clustering performance of our method with four state-of-the-art dimensionality reduction methods on six benchmark datasets. A description of the selected datasets, the experimental setup and the performance analysis are presented below.
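As a hedged sketch of the standard evaluation pipeline for unsupervised feature selection (the argument names and the higher-is-better scoring convention are our assumptions, not the paper's exact protocol), selected features are typically judged by clustering the reduced data and measuring agreement with ground-truth labels:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import normalized_mutual_info_score

def clustering_nmi(X, y_true, feature_scores, n_selected,
                   n_clusters, n_runs=10):
    """Keep the top-ranked features (here: higher score = better),
    run k-means several times, and report the mean NMI against the
    ground-truth labels -- a common protocol for unsupervised
    feature selection experiments."""
    top = np.argsort(feature_scores)[::-1][:n_selected]
    X_sel = X[:, top]
    nmis = [
        normalized_mutual_info_score(
            y_true,
            KMeans(n_clusters=n_clusters, n_init=10,
                   random_state=seed).fit_predict(X_sel),
        )
        for seed in range(n_runs)
    ]
    return float(np.mean(nmis))
```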
Conclusion
In this paper, we propose a novel unsupervised feature selection method, GGEFS. This method learns the low rank sparse representation of the samples as structure information and embeds it into the lower-dimensional space. Such a strategy not only captures the global structure and the intrinsic geometric structure of the data, but also avoids parameter selection. Apart from this, the use of the l2,1/2-matrix norm on the projection matrix ensures that the features selected by our model are sparser and more discriminative.
Conflict of interest
The authors declare that there is no conflict of interest regarding the publication of this work.
References
- Foucart et al. Sparsest solutions of underdetermined linear systems via ℓq-minimization for 0 < q ≤ 1. Applied and Computational Harmonic Analysis (2009).
- Jiang et al. Plant-wide process monitoring based on mutual information–multiblock principal component analysis. ISA Transactions (2014).
- Lu et al. Face recognition via weighted sparse representation. Journal of Visual Communication and Image Representation (2013).
- Qiao et al. Sparsity preserving projections with applications to face recognition. Pattern Recognition (2010).
- Sparse graph embedding unsupervised feature selection. IEEE Transactions on Systems, Man, and Cybernetics: Systems (2016).
- A non-negative sparse semi-supervised dimensionality reduction algorithm for hyperspectral data. Neurocomputing (2016).
- Yu et al. A direct LDA algorithm for high-dimensional data — with application to face recognition. Pattern Recognition (2001).
- Zhou et al. Global and local structure preserving sparse subspace learning. Pattern Recognition (2016).
- Agrawal et al. Automatic subspace clustering of high dimensional data for data mining applications (1999).
- Bottou et al. Local learning algorithms. Neural Computation (1992).
- Clarke et al. The properties of high-dimensional data spaces: Implications for exploring gene and protein expression data. Nature Reviews Cancer (2008).
- Graph regularized nonnegative matrix factorization for data representation. IEEE Transactions on Pattern Analysis and Machine Intelligence.
- Cai et al. Unsupervised feature selection for multi-cluster data (2010).
- Du et al. Low rank sparse preserve projection for face recognition (2016).
- Convex and semi-nonnegative matrix factorizations. IEEE Transactions on Pattern Analysis and Machine Intelligence.
- Entropy and information theory.