Elsevier

Neurocomputing

Volume 79, 1 March 2012, Pages 125-131

Supervised sparse representation method with a heuristic strategy and face recognition experiments

https://doi.org/10.1016/j.neucom.2011.10.013

Abstract

In this paper we propose a supervised sparse representation method for face recognition. We assume that the test sample can be approximately represented by a sparse linear combination of all the training samples, where the term “sparse” means that most training samples in the linear combination have zero coefficients. We exploit a heuristic strategy to achieve this goal. First, we determine the linear combination of all the training samples that best represents the test sample and delete the training sample whose coefficient has the minimum absolute value. A similar procedure is then carried out for the remaining training samples, and it is repeated until a predefined termination condition is satisfied. The training samples that finally remain are used to produce the best representation of the test sample and to classify it. Face recognition experiments show that the proposed method achieves promising classification accuracy.

Introduction

In the past two decades much attention has been paid to face recognition and various face recognition methods have been proposed [1], [2], [3], [4], [5], [6], [7]. For example, the well-known dimensionality reduction methods first generate a subspace from the original samples and then classify the test sample in this subspace. State-of-the-art dimensionality reduction methods such as principal component analysis (PCA) [4], [5], [6], [7], [8] and the discriminant transform method (DTM) [8], [9], [10], [11], [12], [13], [14] have shown competitive performance when used for face recognition. We also note that recently proposed max–min distance based [15], class-mean based [16], [17], patch alignment based [18], [19] dimensionality reduction methods and the manifold learning methods [20], [21], [22] can achieve very promising performance. These dimensionality reduction methods have different motivations. For example, when transforming samples into a lower-dimensional space, PCA and DTM aim to preserve the most representative information of the original samples and the discriminative information of different classes, respectively. The max–min distance based dimensionality reduction method [15] aims to make all class-pairs separate as much as possible by maximizing the minimum pairwise distance of all the classes in the obtained low dimensional subspace. The class-mean-based dimensionality reduction method attempts to maximize the geometric mean of the Kullback–Leibler divergences between different classes [17].

Recently, Wright et al. proposed a novel face recognition method, the sparse representation method (SRM) [23], [24], [25]. SRM exploits a sparse linear combination of all the training samples to represent and classify a face image. Hereafter the term “sparse” means that a number of coefficients are zero or very close to zero, and we refer to the non-zero coefficients as “sparse” coefficients. This method evaluates the contribution of each class to representing the test sample and exploits the evaluation result to classify it [23], [24], [25]. As it is not known in advance how many coefficients are zero or which coefficients they are, we refer to the method proposed in [23], [24], [25] as an unsupervised sparse representation method. These methods usually use an iterative solution scheme to obtain the sparse linear combination. The scheme requires not only that the deviation between the test sample and the sparse linear combination be as small as possible but also that the l1-norm of the solution vector be sufficiently small.

The idea of sparsity has also been introduced in dimensionality reduction for face recognition. For example, non-negativeness [19] and elastic net [20] have been adopted to enforce the sparsity of either the projection matrix or the representation coefficients. In this paper, we focus on the SRM framework for face recognition, and propose a heuristic strategy to achieve a sparse representation of the test sample for accurate face recognition.

We note that recent studies have reported new findings on sparse-representation-based face recognition. For example, as shown in Section 4, some studies have claimed that “sparseness” is not the dominant factor in achieving accurate face recognition. Others have pointed out that the l1-norm is not necessary and that the l2-norm can achieve similar performance.

In this paper we propose a supervised sparse representation method. The method is based on a heuristic strategy and represents the test sample as a sparse linear combination of all the training samples. Hereafter ‘supervised’ means that the proposed method produces the sparseness in a supervised way: the number of zero coefficients is known or predefined, and it is known to which training samples the zero coefficients are assigned. In our method the sparse coefficients are truly zero, whereas in the SRM method [23], [24] the coefficients are obtained using an iterative solution scheme and most of the ‘sparse’ coefficients are merely close to zero.

The idea of supervised sparse representation is inspired by the original SRM method [23], [24], [25] and by local learning methods [26], [27], [28], [29], [30], [31], [32], [33], [34], [35]. Previous studies have shown that local learning methods can obtain higher accuracy than global learning methods [36], [37]. Since the proposed method uses only a subset of the training samples to represent the test sample, it can also be viewed as a method that uses “local” training samples to represent and classify the test sample. The proposed method first uses an iterative and heuristic strategy to determine the “local” training samples for the test sample and then exploits only these samples to classify it. In effect, the method also converts the original classification problem into a simpler one that contains fewer classes. The explanation is as follows: it is widely accepted that if the test sample is very far from the training samples of a class, it is reasonable not to classify the test sample into that class; likewise, a distance-based classifier always classifies the test sample into a class that is near to it. As shown later, our method tends to remove from the linear combination the training samples that are far from the test sample. As a result, the remaining training samples come from only a few classes rather than all the classes, and our method can make a decision in a smaller decision space. When the finally remaining training samples are used to classify the test sample, a higher accuracy may be obtained. As shown in Section 5, the experimental results verify the feasibility and effectiveness of the proposed method. Another contribution of our work is that it provides a way to integrate an idea similar to “sparse representation” with the l2 norm for face recognition.
Our paper also illustrates the following two points: first, sparse representation is useful for accurate face recognition. Second, the l1 norm is not the sole means of obtaining a sparse representation; the l2 norm can be combined with a special scheme, such as the one presented in this paper, to obtain one.

The remainder of the paper is organized as follows: in Section 2, we describe the global method that exploits all the training samples to represent and classify the test sample. In Section 3, we present the supervised sparse representation method. Section 4 shows the characteristics, rationale and advantages of the proposed method. Section 5 presents the experimental results. Finally, in Section 6, we offer our conclusion.

Section snippets

Global method

In this section we present the global method that exploits all the training samples to represent and classify the test sample.

Suppose that there are n training samples, denoted by the n column vectors A1, …, An. The global method assumes that the test sample Y can be approximately represented by a linear combination of all the training samples. In other words, it considers that the following equation is approximately satisfied:

Y = β1A1 + β2A2 + … + βnAn    (1)

We can rewrite Eq. (1) as Y = Aβ, where β = (β1, …, βn)^T and A = (A1, …, An).
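The linear system Y = Aβ of the global method can be solved, for example, with regularized least squares. The sketch below is only an illustration under that assumption: the function name, the regularization constant mu, and the ridge term are ours, not necessarily the paper's exact formulation.

```python
import numpy as np

def solve_coefficients(A, y, mu=1e-2):
    """Solve y ~= A @ beta in the least-squares sense.

    Uses the regularized normal equations
        beta = (A^T A + mu*I)^{-1} A^T y,
    where mu is a small positive constant added for numerical stability
    (an assumption; the paper's solution formula is not shown in this excerpt).
    """
    n = A.shape[1]  # number of training samples (columns of A)
    return np.linalg.solve(A.T @ A + mu * np.eye(n), A.T @ y)
```

With mu close to zero this reduces to the ordinary least-squares solution; a small positive mu keeps the system well conditioned when training samples are nearly collinear, which is common for face images.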

Description of our method

The main steps of our method are as follows:

  • Step 1. First exploit a linear combination of all the samples from the set of training samples to represent the test sample, and use Eq. (4) to solve the corresponding linear system for the coefficients. Then repeatedly run the following procedure: remove from the set of training samples the training sample whose coefficient has the smallest absolute value, and use a linear combination of the remaining training samples to represent the test sample.

  • Step 2.
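The iterative elimination described in Step 1 can be sketched as follows. The helper name select_local_samples, the retention count k (standing in for the predefined termination condition), and the regularized least-squares solve (standing in for Eq. (4), which this excerpt does not show) are all assumptions.

```python
import numpy as np

def select_local_samples(A, y, k, mu=1e-2):
    """Iteratively drop the training sample whose coefficient has the
    smallest absolute value until only k samples remain.

    A : (d, n) matrix whose columns are the training samples.
    y : (d,) test sample.
    Returns the indices of the k remaining training samples and their
    final representation coefficients.
    """
    idx = list(range(A.shape[1]))
    while len(idx) > k:
        sub = A[:, idx]
        # regularized least-squares solve over the remaining samples
        beta = np.linalg.solve(sub.T @ sub + mu * np.eye(len(idx)),
                               sub.T @ y)
        # remove the sample contributing least to the representation
        idx.pop(int(np.argmin(np.abs(beta))))
    sub = A[:, idx]
    beta = np.linalg.solve(sub.T @ sub + mu * np.eye(len(idx)), sub.T @ y)
    return idx, beta
```

Because one sample is removed per iteration, the coefficients of all discarded samples are exactly zero in the final representation, which is the sense in which the method is "supervised" sparse.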

Analysis of our method

In this section we describe the characteristics, rationale, and potential advantages of the proposed method. Its basic characteristic is to use a linear combination of a subset of the training samples to best represent the test sample and to exploit the representation result to perform classification. Here ‘best’ means that the error between the obtained linear combination and the test sample is nearly minimal.
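The classification step, evaluating each class's contribution to the representation and choosing the class with the smallest reconstruction error, can be illustrated as below. This follows the SRM-style residual rule and is a sketch under that assumption, not necessarily the paper's exact formula; all names are hypothetical.

```python
import numpy as np

def classify_by_residual(A, labels, beta, idx, y):
    """Assign y to the class whose remaining training samples, weighted
    by their coefficients, give the smallest reconstruction error.

    A      : (d, n) training matrix.
    labels : length-n sequence of class labels.
    beta   : coefficients of the remaining samples (aligned with idx).
    idx    : indices of the remaining training samples.
    y      : (d,) test sample.
    """
    best_label, best_err = None, np.inf
    for c in set(labels[i] for i in idx):
        # reconstruction using only this class's remaining samples
        recon = sum(beta[j] * A[:, i]
                    for j, i in enumerate(idx) if labels[i] == c)
        err = np.linalg.norm(y - recon)
        if err < best_err:
            best_label, best_err = c, err
    return best_label
```

Note that the loop only visits classes that still have remaining samples, which reflects the reduced decision space discussed above: classes whose samples were all eliminated cannot be chosen.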

As shown in Section 3, the proposed method removes the training sample

Experimental results

We used the ORL [45], FERET [46], [47] and AR [48] face databases to test our method. The face images of these three databases were obtained under the condition of varying pose, facial expression, or lighting. Occluded face images are also included in the AR face database. For the FERET face database, we used only a subset of 1400 images from 200 individuals [37]. Fig. 1 shows the 14 face images of two subjects in this subset of the FERET database. From the AR face database, we used 3120 face

Conclusion

The proposed method uses a heuristic iterative algorithm to obtain supervised sparse representation of the test sample and then exploits the representation result to perform classification. As the proposed method is a supervised sparse method, we not only know how many zero coefficients there are but also know which coefficients are zero. Our paper illustrates the following two points: first, the sparse representation is useful for accurate face recognition. Second, the l1 norm is not the sole

Acknowledgment

This article was partly supported by the Key Laboratory of Network Oriented Intelligent Computation, the Program for New Century Excellent Talents in University (No. NCET-08-0156), the Fundamental Research Funds for the Central Universities (Grant no. HIT.NSRIF.2009130) and the National Natural Science Foundation of China under Grant nos. 61071179, 60902099, 61073125, 61020106004, 61173086, and 60873140.

Yong Xu was born in Sichuan, China, in 1972. He received his B.S. and M.S. degrees in 1994 and 1997, respectively, and his Ph.D. degree in Pattern Recognition and Intelligent Systems from NUST (China) in 2005. He now works at the Shenzhen Graduate School, Harbin Institute of Technology. His current interests include feature extraction, biometrics, face recognition, machine learning, image processing and video analysis.

References (49)

  • W. Zhao et al., Face recognition: a literature survey, ACM Comput. Surv. (2003)
  • A.M. Martinez et al., PCA versus LDA, IEEE Trans. Pattern Anal. Mach. Intell. (2001)
  • M. Turk et al., Eigenfaces for recognition, J. Cognitive Neurosci. (1991)
  • J. Yang et al., Two-dimensional PCA: a new approach to appearance-based face representation and recognition, IEEE Trans. Pattern Anal. Mach. Intell. (2004)
  • G.-F. Lu et al., Face recognition using discriminant locality preserving projections based on maximum margin criterion, Pattern Recognition (2010)
  • J. Ye et al., Feature reduction via generalized uncorrelated linear discriminant analysis, IEEE Trans. Knowl. Data Eng. (2006)
  • D. Zhang et al., Advanced pattern recognition technologies with applications to biometrics, Med. Inf. Sci. Ref. (2009)
  • W. Yang et al., Face recognition using a multi-manifold discriminant analysis method, ICPR (2010)
  • W. Bian, D. Tao, Max–min distance analysis by using sequential SDP relaxation for dimension reduction, IEEE...
  • H. Yan et al., Discriminant feature extraction based on center distance, ICIP (2009)
  • D. Tao et al., Geometric mean for subspace selection, IEEE Trans. Pattern Anal. Mach. Intell. (2009)
  • T. Zhang et al., Patch alignment for dimensionality reduction, IEEE Trans. Knowl. Data Eng. (2009)
  • N. Guan et al., Non-negative patch alignment framework, IEEE Trans. Neural Networks (2011)
  • T. Zhou et al., Manifold elastic net: a unified framework for sparse dimension reduction, Data Min. Knowl. Discovery (2010)

    Wangmeng Zuo is an associate professor in the School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China. He received his Ph.D. degree in computer application technology from Harbin Institute of Technology in 2007. From July to December 2004, from November 2005 to August 2006, and from July 2007 to February 2008, he was a research assistant in the Department of Computing, Hong Kong Polytechnic University. From August 2009 to February 2010, he was a visiting professor at Microsoft Research Asia. His research interests include sparse representation, biometrics, pattern recognition, and image processing.

    Zizhu Fan received the M.S. degree in computer science from Hefei University of Technology, Hefei, China, in 2003. He is now working toward his Ph.D. degree in computer science and technology at the Shenzhen Graduate School, Harbin Institute of Technology (HIT). His current interests include pattern recognition and machine learning. He has published more than 10 journal papers.
