Semi-supervised locally discriminant projection for classification and recognition
Introduction
Dimensionality reduction is an important step in the analysis of plant leaf and palmprint image data. It is often used as a preprocessing step before classification, since it helps to eliminate unimportant and noisy factors and to avoid the ‘curse of dimensionality’. The Multiresolution Vector Quantized (MVQ) approximation [1], together with a distance function, keeps both local and global information about the data. Instead of keeping low-level time series values, it maintains high-level feature information (key subsequences), which facilitates the introduction of more meaningful similarity measures. Wang and Megalooikonomou [2] proposed a dimensionality reduction technique for time series analysis that significantly improves the efficiency and accuracy of similarity searches. Zhang et al. [3] used frequent itemsets for document clustering, motivated by the need for dimensionality reduction in document representation.
Recently, manifold learning has become one of the most promising approaches to dimensionality reduction, and it plays an important role in applications such as face, palmprint and gait recognition. Isometric Feature Mapping (Isomap) [4], [5], Locally Linear Embedding (LLE) [6] and Laplacian Eigenmap (LE) [7] are effective methods for data visualization and dimensionality reduction, but they are defined only on the training data set, i.e., it is unclear how to evaluate the maps for new test data. Locality Preserving Projection (LPP) [8] is defined everywhere in the ambient space rather than just on the training data set, so LPP outperforms LLE and LE in locating and explaining new test data in the feature subspace. However, LPP, like Isomap, LLE and LE, does not make use of class label information, which is often available in recognition and classification tasks. Chen et al. [9] proposed Local Discriminant Embedding (LDE), which incorporates the class information into the construction of the embedding and derives the embedding for nearest-neighbor classification in a low-dimensional space. Nevertheless, LDE effectively deemphasizes distant points, which may weaken classification performance. In the Locality Discriminating Projection (LDP) algorithm [10], the overlap among the class-specific manifolds is approximated by an invader graph, and a locality discriminant criterion is proposed to find the projections that best preserve the within-class local structures while decreasing the between-class overlap. By discovering the local manifold structure, Locality Sensitive Discriminant Analysis (LSDA) [11] finds a projection that maximizes the margin between data points from different classes in each local area. Specifically, the data points are mapped into a subspace in which nearby points with the same label are close to each other, while nearby points with different labels are far apart.
These supervised methods use only the labeled samples, including intra-class and inter-class samples, to construct the adjacency graphs, and do not consider the usefulness of abundant unlabeled data. However, in real-world pattern recognition and classification, it is impractical to expect large quantities of labeled samples, because labeling data requires laborious human effort and takes much time. Unlabeled data, on the other hand, may be abundant and can be obtained easily and cheaply, thanks to the fast growth of the digital photography industry. Under such circumstances, we frequently meet data sets with relatively few labeled instances and many unlabeled instances. In this case, the above-mentioned supervised methods often suffer from the small-sample-size problem, which arises when the number of labeled (i.e., training) samples is less than the sample dimension. Moreover, they ignore or discard the classification information potentially carried by the unlabeled data, and may therefore yield sub-optimal results. In plant leaf and palmprint classification, a large number of unlabeled leaves or palmprints can be obtained easily. The question is how to use all labeled and unlabeled samples to improve the discriminant performance of the classification algorithm. A possible solution to the shortage of training (labeled) leaves or palmprints is to construct the transformation matrix by combining both labeled and unlabeled samples in the dimensionality reduction method. Recently, semi-supervised dimensionality reduction methods have attracted great interest in many real-world problems [12], [13], [14], [15], [16], [17], [18]. In Locality Sensitive Semi-Supervised Feature Selection (called LSDF) [18], the labeled points are used to maximize the margin between data points from different classes, while the unlabeled points are used to discover the geometrical structure of the data space. In this paper, motivated by LSDA and LSDF, we extend LSDA to the semi-supervised case and propose a semi-supervised locally discriminant projection (SSLDP) for leaf and palmprint classification.
The rest of the paper is organized as follows: LSDA is briefly introduced in Section 2. The SSLDP algorithm is proposed in Section 3. Experimental results are given in Section 4. Finally, conclusions and future work are presented in Section 5.
Related works
For mathematical convenience, in the LSDA algorithm [11], we denote the original data matrix as $X = [X_1, X_2, \ldots, X_n] \in \mathbb{R}^{D \times n}$ and the projected data matrix as $Y = [Y_1, Y_2, \ldots, Y_n] \in \mathbb{R}^{d \times n}$, where $d \ll D$ and $Y_i = A^T X_i$. For each data point $X_i$, its $k$-nearest neighborhood $N(X_i)$ can be split into two subsets, the within-class neighbors $N_w(X_i)$ and the between-class neighbors $N_b(X_i)$, where $N_w(X_i)$ contains the neighbors sharing the same label with $X_i$, while $N_b(X_i)$ contains the neighbors having different labels. Then two graphs are
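The neighborhood split described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the authors' implementation: the helper names `split_neighbors` and `graph_weights`, the Euclidean distance, the symmetric 0/1 edge weights and the toy data are all choices made here for clarity.

```python
from math import dist

def split_neighbors(points, labels, k):
    """For each point, split its k nearest neighbors into those sharing
    its label (within-class) and those with a different label (between-class)."""
    n = len(points)
    within, between = [], []
    for i in range(n):
        # Rank all other points by Euclidean distance to point i.
        order = sorted((j for j in range(n) if j != i),
                       key=lambda j: dist(points[i], points[j]))
        knn = order[:k]
        within.append([j for j in knn if labels[j] == labels[i]])
        between.append([j for j in knn if labels[j] != labels[i]])
    return within, between

def graph_weights(neighbor_lists):
    """Assumed weighting: W[i][j] = 1 if j is in i's list or i is in j's list."""
    n = len(neighbor_lists)
    W = [[0] * n for _ in range(n)]
    for i, nbrs in enumerate(neighbor_lists):
        for j in nbrs:
            W[i][j] = W[j][i] = 1
    return W

# Toy data: two classes on a line in the plane (illustrative values only).
points = [(0.0, 0.0), (0.1, 0.0), (1.0, 0.0), (1.1, 0.0)]
labels = [0, 0, 1, 1]
within, between = split_neighbors(points, labels, k=2)
W_w = graph_weights(within)   # within-class graph
W_b = graph_weights(between)  # between-class graph
```

With `k=2`, each toy point has one same-class neighbor and one neighbor from the other class, so both graphs are non-empty even on four samples.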
Semi-supervised locally discriminant projection
In this section, we attempt to extend the LSDA model by incorporating both labeled and unlabeled data.
Experimental results and analysis
In this section, the performance of SSLDP is evaluated on the plant leaf and palmprint image databases and compared with the performances of LPP [8], LSDA [11] and LSDF [18]. LSDA only makes use of the labeled data points. However, LSDF and our proposed method make use of both labeled and unlabeled data points. Although LPP makes use of all data points, the labeled data are regarded as a part of the unlabeled data, i.e., the label information is ignored. In the following experiments, the 1-NN
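As a rough illustration of how a learned projection is typically evaluated with a nearest-neighbor rule, the sketch below projects samples with $Y_i = A^T X_i$ and assigns a test sample the label of its closest training sample in the reduced space. It assumes that the truncated sentence above refers to such a 1-NN classifier. The matrix `A`, the function names and the data are hypothetical placeholders, not values from the paper.

```python
from math import dist

def project(A, x):
    """Compute y = A^T x, where A is a D x d matrix stored as a list of rows."""
    d = len(A[0])
    return tuple(sum(A[r][c] * x[r] for r in range(len(x))) for c in range(d))

def one_nn_predict(train_Y, train_labels, y):
    """Return the label of the projected training sample closest to y."""
    nearest = min(range(len(train_Y)), key=lambda i: dist(train_Y[i], y))
    return train_labels[nearest]

# Hypothetical projection from D = 3 to d = 2 that drops the third coordinate.
# In practice A would be produced by LPP, LSDA, LSDF or SSLDP.
A = [[1.0, 0.0],
     [0.0, 1.0],
     [0.0, 0.0]]
train_X = [(0.0, 0.0, 5.0), (1.0, 1.0, -3.0)]
train_labels = ["class_a", "class_b"]
train_Y = [project(A, x) for x in train_X]

test_x = (0.9, 0.8, 4.0)
predicted = one_nn_predict(train_Y, train_labels, project(A, test_x))
```

In this toy case the discarded third coordinate acts as a noisy feature: the test sample is nearest to the second training sample once it is projected, although it would be nearest to the first in the original space.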
Conclusions and future work
In this paper, a semi-supervised dimensionality reduction algorithm named semi-supervised locally discriminant projection (SSLDP) was proposed and applied successfully to plant leaf and palmprint classification. SSLDP utilizes all labeled and unlabeled data points to construct the within-class and between-class weight matrices, which characterize the possible intra-class compactness and inter-class separability. The experiments on plant leaf and palmprint databases demonstrated that SSLDP is effective
Acknowledgements
This work was supported by the grants of the National Science Foundation of China, Nos. 60975005 and 60805021, and the grant of the Guide Project of Innovative Base of Chinese Academy of Sciences (CAS), No. KSCX1-YW-R-30.
References (21)
- et al., Time series analysis with multiple resolutions, Information Systems (2010)
- et al., A dimensionality reduction technique for efficient time series similarity analysis, Information Systems (2008)
- et al., Text clustering using frequent itemsets, Knowledge-Based Systems (2010)
- et al., A global geometric framework for nonlinear dimensionality reduction, Science (2000)
- et al., The use of hybrid manifold learning and support vector machines in the prediction of business failure, Knowledge-Based Systems (2011)
- et al., Nonlinear dimensionality reduction by locally linear embedding, Science (2000)
- et al., Laplacian eigenmaps for dimensionality reduction and data representation, Neural Computation (2003)
- et al., Locality preserving projections, Proceedings of the Conference on Advances in Neural Information Processing Systems (2003)
- et al., Local discriminant embedding and its variant, Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition (2005)
- et al., Learning a locality discriminating projection for classification, Knowledge-Based Systems (2009)