Pattern Recognition

Volume 45, Issue 1, January 2012, Pages 54-65

Recursive “concave–convex” Fisher Linear Discriminant with applications to face, handwritten digit and terrain recognition

https://doi.org/10.1016/j.patcog.2011.07.008

Abstract

In classification, previous studies have shown that an eigenvalue-based technique can be cast as a related SVM-type problem and that, by solving this SVM-type problem, the performance can be improved significantly. In this paper, we develop a recursive “concave–convex” Fisher Linear Discriminant (RPFLD) as a dimension reduction (DR) technique for high-dimensional data that extracts as many meaningful features as possible. It incorporates the fundamental idea behind Fisher Linear Discriminant and casts the Fisher Linear Discriminant as a “concave–convex” programming problem based on the hinge loss. The solution of our method follows from solving the related SVM-type optimization problems iteratively, which means the proposed method can be viewed as a combination of multiple related SVM-type problems. The special formulation of our method makes it convenient to construct a sparse multi-class Fisher Linear Discriminant directly. Due to the use of a recursive procedure, the number of features available from RPFLD is independent of the number of classes, meaning that, in contrast to the original Fisher Linear Discriminant, the number of features available from our method has no upper bound. We evaluate our algorithm on the Yale and ORL face image databases, a handwritten digit database and a Terrain image dataset. Experimental results show that RPFLD outperforms other Fisher Linear Discriminant algorithms.

Highlights

► RPFLD casts FLD as a series of related SVM-type problems.
► Based on RPFLD, it is easy to construct a sparse multi-class FLD.
► In contrast to FLD, RPFLD has no limitation on the number of features.
► A recursive procedure is applied to yield multiple projection axes.
► Experiments show that RPFLD has superior performance to other classic DR algorithms.

Introduction

Extracting good features is crucial for improving the performance of any pattern classifier, especially when faced with high-dimensional data. A challenging problem is how to develop an effective procedure for extracting a small number of good features. One way to attempt to resolve this problem is to use dimensionality reduction (DR) techniques, each of which seeks an optimal low-dimensional subspace of a higher-dimensional input space. In recent years, principal component analysis (PCA) [1], [2], [5] and Fisher Linear Discriminant (FLD) [3], [4], the popular linear subspace learning algorithms for this purpose, have been widely applied in pattern recognition and computer vision and have become two powerful tools for face recognition [6], [7], [8], [9].

FLD is a supervised method that best separates the data of different classes. Many comparative studies between FLD and PCA on the face recognition problem have been reported independently by numerous authors [3], [8], [9], [10], in which FLD outperformed PCA significantly [11], implying that incorporating supervised information into dimensionality reduction is important for the satisfactory design of any classifier. FLD has therefore been applied to many pattern recognition problems [6], [9], [12]. Due to its effectiveness, FLD has recently attracted much attention, and various FLD-based methods have been developed [3], [13], [14], [15], [16], [17], [18], [19], most of which aim to resolve the singularity problem of the within-class scatter matrix.

Currently, eigenvalue-based techniques for fully supervised classification, such as the Generalized Eigenvalue Proximal Support Vector Machine (GEPSVM) [20], [21] and our previously proposed Multi-weight Vector Projection Support Vector Machine (MVSVM) [22], have attracted much attention. In these works, the results have demonstrated that reformulating an eigenvalue problem as a related SVM-type formulation helps to improve performance significantly. More importantly, based on the new formulation, we can easily obtain the corresponding sparse model. We first discuss GEPSVM before discussing the proposed RPFLD in depth. The GEPSVM problem is the following: given two data matrices A and B, where A contains the samples of class +1 and B the samples of class −1, find two optimal fitting planes such that each approximates the points of its corresponding class. The GEPSVM criterion can be written as

$$\min_{(w_1,b_1)\neq 0}\ \frac{\|Aw_1+e_1b_1\|^2+\delta\,\|[w_1;b_1]\|^2}{\|Bw_1+e_2b_1\|^2} \quad\text{and}\quad \min_{(w_2,b_2)\neq 0}\ \frac{\|Bw_2+e_2b_2\|^2+\delta\,\|[w_2;b_2]\|^2}{\|Aw_2+e_1b_2\|^2},$$

where δ acts as a regularization parameter, e_1 and e_2 are vectors of ones of appropriate dimensions, and (w_1, b_1) and (w_2, b_2) are, respectively, the directions and thresholds of the two planes. The geometric interpretation of GEPSVM is that each plane to be estimated is closest to the points of its own class and furthest from the points of the other class. Each plane is obtained by minimizing the sum of the squares of the two-norm distances between the plane and each of the points of its own class and simultaneously maximizing the sum of the squares of the two-norm distances between the plane and each of the points of the other class. That is, each plane can be generated by solving a Rayleigh quotient (RQ) problem. The special mathematical formulation based on the above geometric interpretation of GEPSVM permits an interesting modification of GEPSVM, called the Twin SVM (TWSVM), which has a formulation similar to that of SVM [23]. TWSVM has the following formulation:

$$\min_{(w_1,b_1)}\ \|Aw_1+e_1b_1\|^2+\delta\,e_2^{\top}\xi \quad\text{s.t.}\quad -(Bw_1+e_2b_1)+\xi\geq e_2,\ \ \xi\geq 0,$$

and

$$\min_{(w_2,b_2)}\ \|Bw_2+e_2b_2\|^2+\delta\,e_1^{\top}\xi \quad\text{s.t.}\quad (Aw_2+e_1b_2)+\xi\geq e_1,\ \ \xi\geq 0.$$

TWSVM is similar to GEPSVM in spirit, but it is based on a different formulation. In contrast to GEPSVM, TWSVM obtains each plane by minimizing the sum of the squares of the two-norm distances between the plane and each of the points of its own class while requiring the distances, rather than the sum of the squares of the two-norm distances, between the plane and each of the points of the other class to be large. Our previously proposed method, MVSVM [22], is formally similar to GEPSVM. Based on MVSVM, we further proposed the Recursive Projection Twin Support Vector Machine via Within-Class Variance Minimization (PTSVM) [24]. Similar to TWSVM, PTSVM also casts MVSVM as a related SVM-type problem. It is not difficult to check that SVM itself can be reformulated as an eigenvalue-based optimization problem, in which the separating plane is produced by minimizing the square of the two-norm of w divided by the sum of the squares of the two-norm distances between each of the points and the separating plane. PTSVM and TWSVM have been shown to be more effective than the original GEPSVM and MVSVM algorithms.
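To make the eigenvalue side of this contrast concrete, each GEPSVM Rayleigh quotient above reduces to a generalized eigenvalue problem over the augmented variable z = [w; b]. The following is a minimal numpy/scipy sketch under that reading, not the original authors' implementation; the function name gepsvm_plane and the jitter eps added for numerical stability are our own choices.

```python
import numpy as np
from scipy.linalg import eigh

def gepsvm_plane(A, B, delta=0.1, eps=1e-8):
    """One GEPSVM plane: minimize the Rayleigh quotient
    (||A w + e1 b||^2 + delta ||[w; b]||^2) / ||B w + e2 b||^2
    over z = [w; b]. Rows of A and B are samples of the two classes."""
    G = np.hstack([A, np.ones((A.shape[0], 1))])   # [A  e1]
    H = np.hstack([B, np.ones((B.shape[0], 1))])   # [B  e2]
    P = G.T @ G + delta * np.eye(G.shape[1])       # numerator matrix
    Q = H.T @ H + eps * np.eye(H.shape[1])         # denominator matrix (+ jitter)
    vals, vecs = eigh(P, Q)                        # solves P z = lambda Q z
    z = vecs[:, 0]                                 # eigenvector of the smallest eigenvalue
    return z[:-1], z[-1]                           # plane direction w and threshold b
```

TWSVM, by contrast, replaces each quotient with the constrained quadratic program above and solves it with standard QP machinery; the second plane is obtained by calling the same routine with the roles of A and B swapped.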

Of course, the aim of this paper is not classification itself. The above analysis is meant to deliver one message: an eigenvalue-based method can easily be cast as a related SVM-type convex program. This reformulation can improve performance, and it makes a sparse model of the eigenvalue-based method easy to obtain. Recall that FLD aims at maximizing the ratio between the projected between-class and within-class scatters, i.e., it maximizes the projected between-class scatter and minimizes the projected within-class scatter. An equivalent geometric interpretation of FLD is that the projected mean of the complete training set is closest to each of the projected points and simultaneously furthest from the projected mean of each class. The projection is obtained by minimizing the sum of the squares of the two-norm distances between the projected mean of the complete training set and each of the projected points, and simultaneously maximizing the sum of the squares of the two-norm distances between the projected mean of the complete training set and the projected mean of each class. Obviously, FLD can also be reformulated as an SVM-type problem in which the distances, rather than the sum of the squares of the two-norm distances, between the projected mean of the complete training set and the projected mean of each class are maximized.
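Schematically, and in our own notation rather than the paper's exact PFLD model, this move can be written as follows, where $S_w$ and $S_b$ are the usual within-class and between-class scatter matrices, $m$ is the total mean and $m_l$ the mean of class $l$:

$$\text{FLD:}\ \ \max_{w}\ \frac{w^{\top}S_b\,w}{w^{\top}S_w\,w} \qquad\longrightarrow\qquad \min_{w,\,\xi}\ \ w^{\top}S_w\,w+\delta\sum_{l=1}^{C}\xi_l \quad \text{s.t.}\ \ |w^{\top}(m_l-m)|\ \geq\ 1-\xi_l,\ \ \xi_l\geq 0.$$

The absolute-value constraints on the right are reverse-convex, so the resulting program is not convex; it is exactly this difference-of-convex structure that calls for the “concave–convex” machinery developed in Section 3.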

Previous works [25], [26] have concluded that for the binary classification problem the solution to the least squares problem is in the same direction as the solution of FLD, which allows us to extract only one meaningful feature. However, this is not true if a specific assumption (namely, that the within-class distance is zero) does not hold (for details see [27]). Furthermore, conventional FLD is applicable not only to binary classification but also to multi-class problems. In this paper, we propose a new linear discriminant DR algorithm, called RPFLD, which stands for recursive “concave–convex” Fisher Linear Discriminant. RPFLD formulates the FLD algorithm as a “concave–convex” problem and incorporates the same idea behind FLD of finding projection axes that separate the points of different classes. This new algorithm is essentially developed from FLD but has a significant performance advantage over FLD. Specifically, the proposed algorithm constructs a direct multi-class DR framework, and the transformation matrix that embeds the data points into the RPFLD subspace is computed by a recursive procedure similar to those in [24], [28].

We now state several characteristics of our proposed algorithm as follows:

  • RPFLD is an interesting modification of the FLD algorithm, which casts FLD as a “concave–convex” problem. Its solution can be obtained by solving the related SVM-type problems iteratively, which is consistent with the observation that an eigenvalue-based method can be reformulated as a related SVM-type problem.

  • RPFLD completely eliminates the constraint on the total number of features available from FLD. FLD extracts at most C−1 features, where C is the number of classes. This limitation is well known and has somehow been accepted as an inherent property of FLD; it obviously limits FLD's applicability to recognition problems with a small number of classes [28]. RPFLD adopts an idea similar to that of [24], [28], which is to generate a new sample set at each step. It recursively derives the meaningful features, and the total number of features resulting from RPFLD is not limited by the number of classes.

  • The RPFLD formulation is not only interesting but also difficult to solve. Therefore, how to solve the RPFLD optimization problem is also a main topic of this paper. To solve our optimization problem, the “concave–convex” procedure (CCP) is adopted, which has rarely been used in the field of DR (a generic sketch of CCP is given after this list).

  • Compared with some existing methods, our method has better performance. Furthermore, experiments also show that the best recognition rate of our proposed algorithm is still higher than that of FLD when the optimal reduced dimensionality is selected within [1, C−1].

  • The special formulation of RPFLD allows us to easily develop a sparse multi-class FLD (SFLD) in future work. RPFLD is currently designed for supervised DR, but it can naturally be extended to other areas, such as manifold learning [10], in future work. Although there are many efficient off-the-shelf eigensolvers that can be used to optimize the FLD optimization problem, there is no obvious way to introduce sparsity in, e.g., the matrix inverse [25]. Moghaddam et al. [29] proposed a generalized spectral bounds framework for sparse FLD; however, that sparse FLD algorithm can only be applied to two-class problems. Local FLD (LFLD) is a supervised method that discovers both the geometrical and the discriminant structure of the data manifold [5], [30], but it suffers from shortcomings similar to those of conventional FLD, since it is also an eigenvalue-based technique. Similarly to RPFLD, we can cast LFLD as a “concave–convex” procedure for special purposes. Detailed studies of these are left for future work.
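As referenced in the list above, CCP minimizes a difference of two convex functions u(x) − v(x) by repeatedly linearizing the concave part −v at the current iterate and solving the resulting convex surrogate. The following is a generic, self-contained sketch with a toy objective of our own choosing, not the specific RPFLD subproblem:

```python
import numpy as np
from scipy.optimize import minimize

def ccp(u, grad_v, x0, iters=50, tol=1e-8):
    """Concave-convex procedure for min_x u(x) - v(x), u and v convex.
    Each step minimizes the convex surrogate u(x) - grad_v(x_t) @ x."""
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        g = grad_v(x)                              # linearize the concave part at x
        res = minimize(lambda y: u(y) - g @ y, x)  # convex subproblem
        if np.linalg.norm(res.x - x) < tol:
            break
        x = res.x
    return x

# Toy DC program: minimize ||x - a||^2 - 0.5 ||x||^2 (minimizer is x = 2a).
a = np.array([1.0, -2.0])
x_star = ccp(u=lambda x: np.sum((x - a) ** 2),
             grad_v=lambda x: x,   # v(x) = 0.5 ||x||^2  =>  grad v(x) = x
             x0=np.zeros(2))       # converges towards [2., -4.]
```

Each CCP step does not increase the objective, so the iterates descend to a stationary point; for RPFLD the convex subproblems play the role of the related SVM-type programs mentioned above.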

The rest of this paper is organized as follows. In Section 2, we give a review of Fisher Linear Discriminant. Section 3 first constructs the model of the “concave–convex” approach to Fisher Linear Discriminant and then shows that its solution reduces to solving convex programming problems iteratively. Section 4 proposes a recursive discriminant algorithm, which aids in the generation of multiple projections. In Section 5, we evaluate our algorithm on face databases, a handwritten digit dataset, and a Terrain dataset. Section 6 gives two models, sparse multi-class Fisher Linear Discriminant and “concave–convex” Local Fisher Linear Discriminant, which will be evaluated experimentally in future work. In the last section, we draw some conclusions.

Section snippets

Fisher Linear Discriminant (FLD)

In this section, we review an existing dimension reduction method for supervised learning, the Fisher linear discriminant (FLD). For convenience of description, we introduce some definitions used throughout this paper.

We consider the problem of representing all of the vectors in a set of N n-dimensional training points $X=\{x_1,x_2,\ldots,x_N\}$ belonging to C different classes, each of which has $N_l$ samples in the subset $D_l$, $l=1,2,\ldots,C$, where $X\in\mathbb{R}^{n\times N}$ denotes the matrix of all samples. We let $z\in\mathbb{R}^d$ $(1\leq d<n)$ be a low-dimensional
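For reference, the classical FLD reviewed in this section can be sketched in a few lines; the small ridge added to $S_w$ below is our own choice to keep the generalized eigenproblem well posed. The sketch also makes the C − 1 feature cap visible: $S_b$ is a sum of C rank-one terms around the total mean, so rank($S_b$) ≤ C − 1.

```python
import numpy as np
from scipy.linalg import eigh

def fld(X, y, d):
    """Classical FLD. X is n x N with samples as columns, y holds class
    labels. Returns the n x d matrix of top discriminant directions."""
    n, _ = X.shape
    m = X.mean(axis=1, keepdims=True)              # total mean
    Sw = np.zeros((n, n)); Sb = np.zeros((n, n))
    for c in np.unique(y):
        Xc = X[:, y == c]
        mc = Xc.mean(axis=1, keepdims=True)        # class mean
        Sw += (Xc - mc) @ (Xc - mc).T              # within-class scatter
        Sb += Xc.shape[1] * (mc - m) @ (mc - m).T  # between-class scatter
    vals, vecs = eigh(Sb, Sw + 1e-6 * np.eye(n))   # Sb w = lambda Sw w
    return vecs[:, ::-1][:, :d]                    # d largest eigenvalues
```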

“concave–convex” Fisher Linear Discriminant (RPFLD)

In this section, we first propose a “concave–convex” approach to FLD, called “concave–convex” FLD (PFLD), which is similar to FLD in spirit.

Recursive PFLD (RPFLD) for multiple projection vectors

Using the above procedure, we can seek a single axis that enables the projected samples to be well separated. Here, we extend PFLD to find multiple projection directions, which is consistent with the idea of the original FLD.

As mentioned above, FLD may give undesired performance, especially when it is applied to datasets with few classes. This is because the total number of meaningful features available from FLD is limited to C−1. To eliminate this limitation, we
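Although the snippet is cut off here, the recursive scheme it refers to can be sketched generically in the spirit of [24], [28] as we read them: after each axis is found, its component is removed from the samples, so the next axis is computed on a deflated sample set and the number of axes is no longer capped by C − 1. In the sketch below, find_axis is a hypothetical placeholder for any single-axis criterion, e.g. the PFLD subproblem.

```python
import numpy as np

def recursive_axes(X, y, find_axis, d):
    """Generate d projection axes recursively. X is n x N (columns are
    samples); find_axis(X, y) returns one n-dimensional direction."""
    Xr = X.copy()
    W = []
    for _ in range(d):
        w = find_axis(Xr, y)
        w = w / np.linalg.norm(w)
        Xr = Xr - np.outer(w, w @ Xr)  # deflate: drop the component along w
        W.append(w)
    return np.stack(W, axis=1)         # n x d transformation matrix
```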

Experiments

In this section, in order to evaluate the performance of the proposed RPFLD, we report results on face image databases, a handwritten digit database and a Terrain image database. The performance is compared with that of the Eigenface (PCA) method, the FLD method, Orthogonal LDA (OLDA) [38], and the Recursive FLD (RFLD) method [28]. PCA was performed directly using the code from http://www.zjucadcg.cn/dengcai. Taking into account the high computational cost of OLDA, we use a mature fast

Conclusions

We have proposed a new algorithm for dimension reduction, called Recursive “Concave–Convex” Fisher Linear Discriminant (RPFLD). RPFLD brings together ideas from the theory of convex programming, “concave–convex” programming, and Fisher Linear Discriminant. As shown in our experiments, RPFLD can perform more effectively than other classical methods. Its special optimization formulation makes it very easy to construct a sparse multi-class FLD, and we can also incorporate its special ideas in

Acknowledgment

The authors are extremely thankful to the Scientific Foundation of Jiangsu Province (BK2009393), the Research Foundation for the Doctoral Program of Higher Education of China (20093219120025), and the National Science Foundation of China (90820306) for support.

References (44)

  • K. Liu et al.

    An efficient algorithm for Foley-Sammon optimal set of discriminant vectors by algebraic method

    International Journal of Pattern Recognition and Artificial Intelligence

    (1992)
  • D.L. Swets et al.

    Using discriminant eigenfeatures for image retrieval

    IEEE Transactions on Pattern Analysis and Machine Intelligence

    (1996)
  • K. Etemad et al.

    Discriminant analysis for recognition of human face images

    Journal of the Optical Society of America A

    (1997)
  • D. Cai, X.F. He, Manifold Adaptive Experimental Design for Text Categorization, in: Proceedings of IEEE Transactions on...
  • C. Xiang et al.

    Face recognition using recursive Fisher linear discriminant

    IEEE Transactions on Image Processing

    (2006)
  • C. Liu et al.

    Robust coding schemes for indexing and retrieval from large face databases

    IEEE Transactions on Image Processing

    (2000)
  • T. Hastie et al.

    Penalized discriminant analysis

    Annals of Statistics

    (1995)
  • T. Hastie et al.

    Flexible discriminant analysis by optimal scoring

    Journal of the American Statistical Association

    (1994)
  • R. Huang, Q. Liu, H. Lu, S. Ma. Solving the small-sample-size problem of LDA, in: Proceedings of 16th International...
  • X. Jiang et al.

    Eigenfeature regularization and extraction in face recognition

    IEEE Transactions on Pattern Analysis and Machine Intelligence

    (2008)
  • T. Zhang et al.

    Generalized discriminant analysis: a matrix exponential approach

    IEEE Transactions on Systems, Man, and Cybernetics—Part B

    (2010)
  • K. Torkkola. Linear discriminant analysis in document classification, in: Proceedings of IEEE ICDM Workshop Text...

Q.L. Ye received the BS degree in Computer Science from Nanjing Institute of Technology, China, and the MS degree in Computer Science and Technology from Nanjing Forestry University, China. He is now a Ph.D. student under Prof. C.X. Zhao. His research interests include machine learning, data mining, pattern recognition, and intelligent robots. In these areas, he has published over 10 technical papers.

C.X. Zhao received the Ph.D. degree in Electronic Engineering from Harbin Institute of Technology in 1998. Since 2000, she has been a full-time professor in the Computer Science and Technology department at the Nanjing University of Science and Technology, Nanjing, China. She is a senior member of the China Computer Federation. Her current research interests include pattern recognition, image processing, artificial intelligence, and mobile robots.

X.B. Chen received the MS degree from the School of Computer Science and Telecommunication Engineering, Jiangsu University, and joined Jiangsu University in September 2005 as an assistant lecturer. His current research interests include machine learning and pattern recognition.
