Semi-supervised locally discriminant projection for classification and recognition

https://doi.org/10.1016/j.knosys.2010.11.002

Abstract

Semi-supervised dimensionality reduction methods play an important role in pattern recognition and are particularly well suited to plant leaf and palmprint classification, since labeling leaf and palmprint images requires expensive human labor, whereas unlabeled images are far easier to obtain at very low cost. In this paper, we exploit unlabeled data to aid the classification task when only a limited number of labeled plant leaf or palmprint samples is available, and propose a semi-supervised locally discriminant projection (SSLDP) algorithm for plant leaf and palmprint classification. By using both labeled and unlabeled data to learn a transformation for dimensionality reduction, the proposed method can overcome the small-sample-size (SSS) problem that arises when labeled data are scarce. In SSLDP, the labeled data points, together with the unlabeled ones, are used to construct within-class and between-class weight matrices that incorporate the neighborhood information of the data set. Experiments on plant leaf and palmprint databases demonstrate that SSLDP is effective and feasible for plant leaf and palmprint classification.

Introduction

Dimensionality reduction is a very important process in plant leaf and palmprint image analysis. It is often used as a preprocessing step before classification, since it helps to eliminate unimportant and noisy factors and to avoid the 'curse of dimensionality'. Multiresolution Vector Quantized (MVQ) approximation [1], along with a distance function, keeps both local and global information about the data. Instead of keeping low-level time series values, it maintains high-level feature information (key subsequences), facilitating the introduction of more meaningful similarity measures. Wang and Megalooikonomou [2] proposed a dimensionality reduction technique for time series analysis that significantly improves the efficiency and accuracy of similarity searches. Zhang et al. [3] used frequent itemsets for document clustering, taking into account the need for dimensionality reduction of the document representation.

Recently, manifold learning has become one of the most promising approaches to dimensionality reduction and plays an important role in many applications such as face, palmprint and gait recognition. Isometric Feature Mapping (Isomap) [4], [5], Locally Linear Embedding (LLE) [6] and Laplacian Eigenmap (LE) [7] are effective methods for data visualization and dimensionality reduction, but they are defined only on the training data, i.e., it is unclear how to evaluate the learned maps on new test data. Locality Preserving Projection (LPP) [8] is defined everywhere in the ambient space rather than just on the training set, so LPP outperforms LLE and LE in embedding and interpreting new test data in the feature subspace. However, LPP, like Isomap, LLE and LE, does not make use of class label information, which is often available and valuable for recognition and classification tasks. Chen et al. [9] proposed Local Discriminant Embedding (LDE), which incorporates class information into the construction of the embedding and derives an embedding suited to nearest-neighbor classification in a low-dimensional space. Nevertheless, LDE deemphasizes distant points, which may weaken classification performance. In the Locality Discriminating Projection (LDP) algorithm [10], the overlap among the class-specific manifolds is approximated by an invader graph, and a locality discriminant criterion is proposed to find the projections that best preserve the within-class local structures while decreasing the between-class overlap. By discovering the local manifold structure, Locality Sensitive Discriminant Analysis (LSDA) [11] finds a projection that maximizes the margin between data points from different classes in each local area. Specifically, the data points are mapped into a subspace in which nearby points with the same label are close to each other while nearby points with different labels are far apart.

These supervised methods use only the labeled samples, both intra-class and inter-class, to construct the adjacency graphs, and do not exploit the abundant unlabeled data. In real-world pattern recognition and classification, however, it is impractical to expect large quantities of labeled samples, because labeling data requires laborious human effort and takes much time. On the other hand, unlabeled data are often abundant and can be obtained easily and cheaply, thanks to the fast growth of the digital photography industry.
Under such circumstances, we frequently encounter data sets with relatively few labeled instances and a large number of unlabeled ones. In this case, the above-mentioned supervised methods often suffer from the small-sample-size problem, since the number of labeled (i.e., training) samples is smaller than the sample dimension. Moreover, they ignore the potential contribution of unlabeled data to classification and may therefore yield sub-optimal results. In plant leaf and palmprint classification, a large number of unlabeled leaves or palmprints can easily be obtained; the question is how to make use of all labeled and unlabeled samples to improve the discriminant performance of the classification algorithm. A possible solution to insufficient training (labeled) leaves or palmprints is to construct the transformation matrix from both labeled and unlabeled samples in the dimensionality reduction step. Recently, semi-supervised dimensionality reduction methods have attracted great interest in many real-world problems [12], [13], [14], [15], [16], [17], [18]. In Locality Sensitive Semi-Supervised Feature Selection (LSDF) [18], the labeled points are used to maximize the margin between data points from different classes, while the unlabeled points are used to discover the geometrical structure of the data space. In this paper, motivated by LSDA and LSDF, we extend LSDA to the semi-supervised case and propose a semi-supervised locally discriminant projection (SSLDP) for leaf and palmprint classification.
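To make this setting concrete, the sketch below (hypothetical names, not from the paper) pools a small labeled set with a large unlabeled set into a single column-wise data matrix, marking unlabeled samples with a label of -1, before any projection is learned:

```python
import numpy as np

def pool_samples(X_labeled, y_labeled, X_unlabeled):
    """Stack labeled and unlabeled samples column-wise (D x n) and return a
    label vector in which unlabeled samples are marked with -1."""
    X = np.hstack([X_labeled, X_unlabeled])
    y = np.concatenate([np.asarray(y_labeled, dtype=int),
                        -np.ones(X_unlabeled.shape[1], dtype=int)])
    return X, y
```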

The rest of the paper is organized as follows: LSDA is briefly reviewed in Section 2. The SSLDP algorithm is proposed in Section 3. Experimental results are reported in Section 4. Finally, conclusions and future work are given in Section 5.

Section snippets

Related works

For mathematical convenience, in the LSDA algorithm [11], we denote the original data matrix as $X = [X_1, X_2, \ldots, X_n] \in \mathbb{R}^{D \times n}$ and the projected data matrix as $Y = [Y_1, Y_2, \ldots, Y_n] \in \mathbb{R}^{d \times n}$, where $d \ll D$ and $Y_i = A^T X_i$. For each data point $X_i$, its $k$ nearest neighbors $N(X_i)$ can be split into two subsets, the within-class neighbors $N_w(X_i)$ and the between-class neighbors $N_b(X_i)$, where $N_w(X_i)$ contains the neighbors sharing the same label as $X_i$, while $N_b(X_i)$ contains the neighbors having different labels. Then two graphs are constructed: the within-class graph $G_w$ and the between-class graph $G_b$.
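As a concrete illustration of this neighborhood splitting (a minimal sketch with assumed binary edge weights and column-wise samples, not the authors' implementation):

```python
import numpy as np

def build_lsda_graphs(X, labels, k=5):
    """Split each point's k nearest neighbors into within-class and
    between-class subsets and return the binary adjacency matrices
    W_w and W_b of the two graphs. X is D x n (one sample per column),
    labels is a length-n integer array."""
    n = X.shape[1]
    # pairwise squared Euclidean distances between the columns of X
    sq = np.sum(X ** 2, axis=0)
    dist2 = sq[:, None] + sq[None, :] - 2.0 * (X.T @ X)
    np.fill_diagonal(dist2, np.inf)            # a point is not its own neighbor

    W_w = np.zeros((n, n))                     # within-class graph G_w
    W_b = np.zeros((n, n))                     # between-class graph G_b
    for i in range(n):
        for j in np.argsort(dist2[i])[:k]:     # k nearest neighbors N(X_i)
            if labels[j] == labels[i]:         # N_w(X_i): same label
                W_w[i, j] = W_w[j, i] = 1.0
            else:                              # N_b(X_i): different label
                W_b[i, j] = W_b[j, i] = 1.0
    return W_w, W_b
```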

Semi-supervised locally discriminant projection

In this section, we attempt to extend the LSDA model by incorporating both labeled and unlabeled data.
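The excerpt does not reproduce the exact SSLDP formulation. As a hedged sketch only, the projection step below follows the LSDA-style objective that SSLDP extends, assuming the weight matrices W_w and W_b have already been built from all labeled and unlabeled points (edges involving unlabeled samples would be assigned by the paper's own weighting scheme); the trade-off parameter alpha and the regularization term are illustrative choices, not values from the paper.

```python
import numpy as np
from scipy.linalg import eigh

def lsda_style_projection(X, W_w, W_b, d, alpha=0.5):
    """Assumed projection step: maximize a^T X (alpha*L_b + (1-alpha)*W_w) X^T a
    subject to a^T X D_w X^T a = 1, solved as a generalized eigenproblem.
    Returns a D x d projection matrix A; the embedding is Y = A.T @ X."""
    D_w = np.diag(W_w.sum(axis=1))
    L_b = np.diag(W_b.sum(axis=1)) - W_b                  # between-class Laplacian
    M = X @ (alpha * L_b + (1.0 - alpha) * W_w) @ X.T
    C = X @ D_w @ X.T + 1e-6 * np.eye(X.shape[0])         # regularized for the SSS case
    vals, vecs = eigh(M, C)                               # eigenvalues in ascending order
    return vecs[:, np.argsort(vals)[::-1][:d]]            # top-d generalized eigenvectors
```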

Experimental results and analysis

In this section, the performance of SSLDP is evaluated on plant leaf and palmprint image databases and compared with that of LPP [8], LSDA [11] and LSDF [18]. LSDA makes use of only the labeled data points, whereas LSDF and the proposed method make use of both labeled and unlabeled data points. Although LPP makes use of all data points, the labeled data are treated as unlabeled, i.e., the label information is ignored. In the following experiments, the 1-NN classifier is adopted for classification in the reduced subspace.
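For illustration only (not the authors' experimental code), a 1-NN evaluation in the projected subspace, assuming Euclidean distance and a projection matrix A learned by any of the compared methods, might look like this:

```python
import numpy as np

def evaluate_1nn(A, X_train, y_train, X_test, y_test):
    """Project both sets with A and assign each test sample the label of
    its nearest projected training sample; return the recognition rate."""
    Y_train = A.T @ X_train                    # d x n_train
    Y_test = A.T @ X_test                      # d x n_test
    correct = 0
    for i in range(Y_test.shape[1]):
        d2 = np.sum((Y_train - Y_test[:, [i]]) ** 2, axis=0)
        correct += int(y_train[np.argmin(d2)] == y_test[i])
    return correct / Y_test.shape[1]
```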

Conclusions and future work

In this paper, a semi-supervised dimensionality reduction algorithm named semi-supervised locally discriminant projection (SSLDP) was proposed and successfully applied to plant leaf and palmprint classification. SSLDP utilizes all labeled and unlabeled data points to construct the within-class and between-class weight matrices that characterize intra-class compactness and inter-class separability. The experiments on plant leaf and palmprint databases demonstrated that SSLDP is effective and feasible for plant leaf and palmprint classification.

Acknowledgements

This work was supported by grants from the National Science Foundation of China (Nos. 60975005 and 60805021) and by the Guide Project of the Innovative Base of the Chinese Academy of Sciences (CAS), No. KSCX1-YW-R-30.

References (21)

  • Q. Wang et al., Time series analysis with multiple resolutions, Information Systems (2010).
  • Q. Wang et al., A dimensionality reduction technique for efficient time series similarity analysis, Information Systems (2008).
  • W. Zhang et al., Text clustering using frequent itemsets, Knowledge-Based Systems (2010).
  • J. Tenenbaum et al., A global geometric framework for nonlinear dimensionality reduction, Science (2000).
  • F. Lin et al., The use of hybrid manifold learning and support vector machines in the prediction of business failure, Knowledge-Based Systems (2011).
  • L.K. Saul et al., Nonlinear dimensionality reduction by locally linear embedding, Science (2000).
  • M. Belkin et al., Laplacian eigenmaps for dimensionality reduction and data representation, Neural Computation (2003).
  • X.F. He et al., Locality preserving projections, Proceedings of the Conference on Advances in Neural Information Processing Systems (2003).
  • H.T. Chen et al., Local discriminant embedding and its variant, Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition (2005).
  • J. Hu et al., Learning a locality discriminating projection for classification, Knowledge-Based Systems (2009).
There are more references available in the full text version of this article.

Cited by (32)

  • High-dimensional count data clustering based on an exponential approximation to the multinomial Beta-Liouville distribution

    2020, Information Sciences
    Citation Excerpt :

    The results from the literature on the Swedish leaf database are summarized in Table 6. As shown in Table 6, the accuracy of recognizing the leaf type using our approach is higher than SSLDP [34], and MSDM [35], and almost comparable to the other methods, with an accuracy of 96.50%, which is a promising result considering that our approach is completely unsupervised. Human activity and action recognition is a popular topic in computer vision.

  • Combining sparse representation and singular value decomposition for plant recognition

    2018, Applied Soft Computing Journal
    Citation Excerpt :

    Valliammal et al. [14] described an optimal approach for feature subset selection to classify the leaf images based on GA and Kernel PCA (KPCA). Recent researches show that the high dimensional leaf images lie on a low-dimensional nonlinear manifold, and some nonlinear manifold learning methods have been proposed for the plant recognition [15–17]. Although several effective plant recognition methods have been proposed, the classical methods and manifold learning methods have some limitations as follows:

  • Semi-supervised matrixized least squares support vector machine

    2017, Applied Soft Computing Journal
    Citation Excerpt :

    In such situations, the performance of the supervised algorithms usually deteriorates because of the lacking of sufficient supervised information. To overcome this shortcoming, the SSL [7,19,40,41], which can exploit a large number of unlabeled patterns along with relatively few labeled ones to build more efficient classifiers, has received significant attention. As far as we know, some semi-supervised tensor learning algorithms [18,35] have been proposed according to the idea of the alternating projection algorithm, e.g. the transductive STM (TSTM) [35] and the concave–convex procedure-based TSTM (CCCP-TSTM) [18].
