Semi-supervised locally discriminant projection for classification and recognition

https://doi.org/10.1016/j.knosys.2010.11.002

Abstract

Semi-supervised dimensionality reduction methods play an important role in pattern recognition and are particularly well suited to plant leaf and palmprint classification, since labeling leaf and palmprint images requires expensive human labor, whereas unlabeled images are far easier to obtain at very low cost. In this paper, we exploit unlabeled data to aid the classification task when only a limited number of labeled plant leaf or palmprint samples is available, and propose a semi-supervised locally discriminant projection (SSLDP) algorithm for plant leaf and palmprint classification. By using both labeled and unlabeled data to learn a transformation for dimensionality reduction, the proposed method can overcome the small-sample-size (SSS) problem that arises when labeled data are scarce. In SSLDP, the labeled data points, together with the unlabeled ones, are used to construct within-class and between-class weight matrices that incorporate the neighborhood information of the data set. Experiments on plant leaf and palmprint databases demonstrate that SSLDP is effective and feasible for plant leaf and palmprint classification.

Introduction

Dimensionality reduction is a very important process in plant leaf and palmprint image analysis. It is often used as a preprocessing step before classification, since it helps to eliminate unimportant and noisy factors and to avoid the 'curse of dimensionality'. Multiresolution Vector Quantized (MVQ) approximation [1], along with a distance function, keeps both local and global information about the data. Instead of keeping low-level time series values, it maintains high-level feature information (key subsequences), facilitating the introduction of more meaningful similarity measures. Wang and Megalooikonomou [2] proposed a dimensionality reduction technique for time series analysis that significantly improves the efficiency and accuracy of similarity searches. Zhang et al. [3] used frequent itemsets for document clustering, taking into account the need for dimensionality reduction of the document representation.

Recently, manifold learning has become one of the most promising approaches to dimensionality reduction and plays an important role in many applications such as face, palmprint and gait recognition. Isometric Feature Mapping (Isomap) [4], [5], Locally Linear Embedding (LLE) [6] and Laplacian Eigenmap (LE) [7] are effective methods for data visualization and dimensionality reduction, but they are defined only on the training data, i.e., it is unclear how to evaluate the learned maps on new test data. Locality Preserving Projection (LPP) [8] is defined everywhere in the ambient space rather than just on the training set, so LPP outperforms LLE and LE in embedding and interpreting new test data in the feature subspace. However, LPP, like Isomap, LLE and LE, does not make use of class label information, which is often available and valuable for recognition and classification tasks. Chen et al. [9] proposed Local Discriminant Embedding (LDE), which incorporates class information into the construction of the embedding and derives an embedding suited to nearest-neighbor classification in a low-dimensional space. Nevertheless, LDE deemphasizes distant points, which may weaken classification performance. In the Locality Discriminating Projection (LDP) algorithm [10], the overlap among the class-specific manifolds is approximated by an invader graph, and a locality discriminant criterion is proposed to find the projections that best preserve the within-class local structures while decreasing the between-class overlap. By discovering the local manifold structure, Locality Sensitive Discriminant Analysis (LSDA) [11] finds a projection that maximizes the margin between data points from different classes in each local area. Specifically, the data points are mapped into a subspace in which nearby points with the same label are close to each other while nearby points with different labels are far apart.

These supervised methods use only the labeled samples, both intra-class and inter-class, to construct the adjacency graphs, and do not exploit the abundant unlabeled data. In real-world pattern recognition and classification, however, it is impractical to expect large quantities of labeled samples, because labeling data requires laborious human effort and takes much time. On the other hand, unlabeled data are often abundant and can be obtained easily and cheaply, thanks to the fast growth of the digital photography industry.
Under such circumstances, we frequently encounter data sets with relatively few labeled instances and a large number of unlabeled ones. In this case, the above-mentioned supervised methods often suffer from the small-sample-size problem, since the number of labeled (i.e., training) samples is smaller than the sample dimension. Moreover, they ignore the potential contribution of unlabeled data to classification and may therefore yield sub-optimal results. In plant leaf and palmprint classification, a large number of unlabeled leaves or palmprints can easily be obtained; the question is how to make use of all labeled and unlabeled samples to improve the discriminant performance of the classification algorithm. A possible solution to insufficient training (labeled) leaves or palmprints is to construct the transformation matrix from both labeled and unlabeled samples in the dimensionality reduction step. Recently, semi-supervised dimensionality reduction methods have attracted great interest in many real-world problems [12], [13], [14], [15], [16], [17], [18]. In Locality Sensitive Semi-Supervised Feature Selection (LSDF) [18], the labeled points are used to maximize the margin between data points from different classes, while the unlabeled points are used to discover the geometrical structure of the data space. In this paper, motivated by LSDA and LSDF, we extend LSDA to the semi-supervised case and propose a semi-supervised locally discriminant projection (SSLDP) for leaf and palmprint classification.
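To make this setting concrete, the sketch below (hypothetical names, not from the paper) pools a small labeled set with a large unlabeled set into a single column-wise data matrix, marking unlabeled samples with a label of -1, before any projection is learned:

```python
import numpy as np

def pool_samples(X_labeled, y_labeled, X_unlabeled):
    """Stack labeled and unlabeled samples column-wise (D x n) and return a
    label vector in which unlabeled samples are marked with -1."""
    X = np.hstack([X_labeled, X_unlabeled])
    y = np.concatenate([np.asarray(y_labeled, dtype=int),
                        -np.ones(X_unlabeled.shape[1], dtype=int)])
    return X, y
```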

The rest of the paper is organized as follows: LSDA is briefly reviewed in Section 2. The SSLDP algorithm is proposed in Section 3. Experimental results are reported in Section 4. Finally, conclusions and future work are given in Section 5.

Section snippets

Related works

For mathematical convenience, in the LSDA algorithm [11], we denote the original data matrix as $X = [X_1, X_2, \ldots, X_n] \in \mathbb{R}^{D \times n}$ and the projected data matrix as $Y = [Y_1, Y_2, \ldots, Y_n] \in \mathbb{R}^{d \times n}$, where $d \ll D$ and $Y_i = A^T X_i$. For each data point $X_i$, its $k$ nearest neighbors $N(X_i)$ can be split into two subsets, the within-class neighbors $N_w(X_i)$ and the between-class neighbors $N_b(X_i)$, where $N_w(X_i)$ contains the neighbors sharing the same label as $X_i$, while $N_b(X_i)$ contains the neighbors having different labels. Then two graphs are constructed: the within-class graph $G_w$ and the between-class graph $G_b$.
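As a concrete illustration of this neighborhood splitting (a minimal sketch with assumed binary edge weights and column-wise samples, not the authors' implementation):

```python
import numpy as np

def build_lsda_graphs(X, labels, k=5):
    """Split each point's k nearest neighbors into within-class and
    between-class subsets and return the binary adjacency matrices
    W_w and W_b of the two graphs. X is D x n (one sample per column),
    labels is a length-n integer array."""
    n = X.shape[1]
    # pairwise squared Euclidean distances between the columns of X
    sq = np.sum(X ** 2, axis=0)
    dist2 = sq[:, None] + sq[None, :] - 2.0 * (X.T @ X)
    np.fill_diagonal(dist2, np.inf)            # a point is not its own neighbor

    W_w = np.zeros((n, n))                     # within-class graph G_w
    W_b = np.zeros((n, n))                     # between-class graph G_b
    for i in range(n):
        for j in np.argsort(dist2[i])[:k]:     # k nearest neighbors N(X_i)
            if labels[j] == labels[i]:         # N_w(X_i): same label
                W_w[i, j] = W_w[j, i] = 1.0
            else:                              # N_b(X_i): different label
                W_b[i, j] = W_b[j, i] = 1.0
    return W_w, W_b
```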

Semi-supervised locally discriminant projection

In this section, we attempt to extend the LSDA model by incorporating both labeled and unlabeled data.
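The excerpt does not reproduce the exact SSLDP formulation. As a hedged sketch only, the projection step below follows the LSDA-style objective that SSLDP extends, assuming the weight matrices W_w and W_b have already been built from all labeled and unlabeled points (edges involving unlabeled samples would be assigned by the paper's own weighting scheme); the trade-off parameter alpha and the regularization term are illustrative choices, not values from the paper.

```python
import numpy as np
from scipy.linalg import eigh

def lsda_style_projection(X, W_w, W_b, d, alpha=0.5):
    """Assumed projection step: maximize a^T X (alpha*L_b + (1-alpha)*W_w) X^T a
    subject to a^T X D_w X^T a = 1, solved as a generalized eigenproblem.
    Returns a D x d projection matrix A; the embedding is Y = A.T @ X."""
    D_w = np.diag(W_w.sum(axis=1))
    L_b = np.diag(W_b.sum(axis=1)) - W_b                  # between-class Laplacian
    M = X @ (alpha * L_b + (1.0 - alpha) * W_w) @ X.T
    C = X @ D_w @ X.T + 1e-6 * np.eye(X.shape[0])         # regularized for the SSS case
    vals, vecs = eigh(M, C)                               # eigenvalues in ascending order
    return vecs[:, np.argsort(vals)[::-1][:d]]            # top-d generalized eigenvectors
```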

Experimental results and analysis

In this section, the performance of SSLDP is evaluated on plant leaf and palmprint image databases and compared with that of LPP [8], LSDA [11] and LSDF [18]. LSDA makes use of only the labeled data points, whereas LSDF and the proposed method make use of both labeled and unlabeled data points. Although LPP makes use of all data points, the labeled data are treated as unlabeled, i.e., the label information is ignored. In the following experiments, the 1-NN classifier is adopted for classification in the reduced subspace.
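For illustration only (not the authors' experimental code), a 1-NN evaluation in the projected subspace, assuming Euclidean distance and a projection matrix A learned by any of the compared methods, might look like this:

```python
import numpy as np

def evaluate_1nn(A, X_train, y_train, X_test, y_test):
    """Project both sets with A and assign each test sample the label of
    its nearest projected training sample; return the recognition rate."""
    Y_train = A.T @ X_train                    # d x n_train
    Y_test = A.T @ X_test                      # d x n_test
    correct = 0
    for i in range(Y_test.shape[1]):
        d2 = np.sum((Y_train - Y_test[:, [i]]) ** 2, axis=0)
        correct += int(y_train[np.argmin(d2)] == y_test[i])
    return correct / Y_test.shape[1]
```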

Conclusions and future work

In this paper, a semi-supervised dimensionality reduction algorithm named semi-supervised locally discriminant projection (SSLDP) was proposed and successfully applied to plant leaf and palmprint classification. SSLDP utilizes all labeled and unlabeled data points to construct the within-class and between-class weight matrices that characterize intra-class compactness and inter-class separability. The experiments on plant leaf and palmprint databases demonstrated that SSLDP is effective and feasible for plant leaf and palmprint classification.

Acknowledgements

This work was supported by grants from the National Science Foundation of China (Nos. 60975005 and 60805021) and by the Guide Project of the Innovative Base of the Chinese Academy of Sciences (CAS), No. KSCX1-YW-R-30.

References (21)

  • Q. Wang et al., Time series analysis with multiple resolutions, Information Systems (2010).
  • Q. Wang et al., A dimensionality reduction technique for efficient time series similarity analysis, Information Systems (2008).
  • W. Zhang et al., Text clustering using frequent itemsets, Knowledge-Based Systems (2010).
  • J. Tenenbaum et al., A global geometric framework for nonlinear dimensionality reduction, Science (2000).
  • F. Lin et al., The use of hybrid manifold learning and support vector machines in the prediction of business failure, Knowledge-Based Systems (2011).
  • L.K. Saul et al., Nonlinear dimensionality reduction by locally linear embedding, Science (2000).
  • M. Belkin et al., Laplacian eigenmaps for dimensionality reduction and data representation, Neural Computation (2003).
  • X.F. He et al., Locality preserving projections, Proceedings of the Conference on Advances in Neural Information Processing Systems (2003).
  • H.T. Chen et al., Local discriminant embedding and its variant, Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition (2005).
  • J. Hu et al., Learning a locality discriminating projection for classification, Knowledge-Based Systems (2009).
There are more references available in the full text version of this article.

Cited by (32)

  • High-dimensional count data clustering based on an exponential approximation to the multinomial Beta-Liouville distribution

    2020, Information Sciences
    Citation Excerpt :

    The results from the literature on the Swedish leaf database are summarized in Table 6. As shown in Table 6, the accuracy of recognizing the leaf type using our approach is higher than SSLDP [34], and MSDM [35], and almost comparable to the other methods, with an accuracy of 96.50%, which is a promising result considering that our approach is completely unsupervised. Human activity and action recognition is a popular topic in computer vision.

  • Combining sparse representation and singular value decomposition for plant recognition

    2018, Applied Soft Computing Journal
    Citation Excerpt :

    Valliammal et al. [14] described an optimal approach for feature subset selection to classify the leaf images based on GA and Kernel PCA (KPCA). Recent researches show that the high dimensional leaf images lie on a low-dimensional nonlinear manifold, and some nonlinear manifold learning methods have been proposed for the plant recognition [15–17]. Although several effective plant recognition methods have been proposed, the classical methods and manifold learning methods have some limitations as follows:

  • Semi-supervised matrixized least squares support vector machine

    2017, Applied Soft Computing Journal
    Citation Excerpt :

    In such situations, the performance of the supervised algorithms usually deteriorates because of the lacking of sufficient supervised information. To overcome this shortcoming, the SSL [7,19,40,41], which can exploit a large number of unlabeled patterns along with relatively few labeled ones to build more efficient classifiers, has received significant attention. As far as we know, some semi-supervised tensor learning algorithms [18,35] have been proposed according to the idea of the alternating projection algorithm, e.g. the transductive STM (TSTM) [35] and the concave–convex procedure-based TSTM (CCCP-TSTM) [18].
