L0-norm sparse representation based on modified genetic algorithm for face recognition

https://doi.org/10.1016/j.jvcir.2015.01.001

Highlights

  • GASRC exploits the L0-norm optimization to implement a new SRC based method.

  • GASRC is suitable for dealing with high-dimensional and small-scale training set.

  • GASRC uses a modified GA to determine the representation set for a test sample.

  • GASRC achieves better recognition result than many state-of-the-art methods.

Abstract

The typical sparse representation for classification (SRC) exploits the training samples to represent a test sample, and classifies the test sample according to the representation result. SRC is essentially an L0-norm minimization problem, which can theoretically yield the sparsest representation and lead to promising classification performance. However, the L0-norm minimization problem is difficult to solve directly with standard optimization methods. To address this problem effectively, in this paper we propose an L0-norm-based SRC that exploits a modified genetic algorithm (GA), termed GASRC. The basic idea of GASRC is to modify the traditional genetic algorithm and then use the modified GA (MGA) to select a subset of the training samples to represent a test sample. Compared with the conventional SRC based on L1-norm optimization, GASRC achieves better classification performance. Experiments on several popular real-world databases demonstrate the classification effectiveness of our approach.

Introduction

Sparse representation based classification (SRC) [1], [2] has attracted much attention in recent years. The basic idea of the typical SRC is to represent a test sample as a sparse linear combination of the training samples, and then to classify the test sample according to the representation result: SRC assigns the test sample to the class that leads to the minimum representation error [3], [4]. It has been widely used in many applications such as face recognition. To perform undersampled face recognition, in which there are very few, or even a single, training face image per subject, Deng et al. proposed an extended SRC (ESRC) that exploits an intra-class variant dictionary [5]. Wagner et al. implemented a sparse-representation-based face recognition system that simultaneously deals with variations in illumination, image misalignment, and partial occlusion in the test image [6]; this system works well under a variety of realistic conditions. Cheng et al. proposed a directed L1-graph for image analysis and classification [7]. To capture the nonlinear information within the data, kernel sparse representation algorithms for image classification and face recognition have been proposed [8], [9], [10]. All the methods mentioned above are based on the L1-norm. In recent years, a number of good representation methods based on the L2-norm [3], [11] have also been proposed. Table 1 concisely describes the typical representation methods based on the L1- or L2-norm together with our proposed method; for each method, it gives the objective function, the constraints of the problem, and the features used.

Essentially, SRC is an L0-norm (the number of nonzero entries in a vector) optimization problem, which can theoretically yield the sparsest representation and lead to promising classification performance [2], [12]. Since L0-norm minimization is NP-hard, it is difficult to solve directly with standard optimization methods. In the pattern recognition and machine learning community, researchers therefore often exploit L1-norm minimization to approximate the L0-norm minimization, under the condition that the solution is sufficiently sparse [20]. This condition, however, does not always hold, particularly when the data dimensionality is very high and the data scale is relatively small. For example, in face recognition the face image is usually very high-dimensional (a small face image of size 64 × 64 already has dimensionality 4096), while the number of training samples per subject is generally no larger than some small value, say 20, or even one in some applications, which is tiny compared with the dimensionality of the face images. In this case, L1-norm minimization might not achieve a sufficiently sparse solution, and it is therefore not well suited to approximating the L0-norm minimization.

In theory, the L0-norm minimization can be effectively solved by stochastic optimization methods such as the genetic algorithm (GA), which has been successfully applied to feature selection and other applications [21], [22]. The conventional GA, however, does not exploit prior knowledge such as the distance information between samples; its effectiveness and efficiency can be further improved if we employ an appropriate distance measure to evaluate the similarities between samples during classification. To this end, in this paper we propose an L0-norm-based SRC that exploits a modified genetic algorithm (MGA) to select the most representative training samples, i.e., those that lead to the minimal representation error for a test sample. We refer to the proposed approach as GASRC. Note that the typical SRC assumes that the samples from a single class lie on a subspace [2], which implies that a sample can be represented by the other samples of the class to which it belongs. Moreover, a sample and its neighbors generally belong to the same class, and these neighbors are closely correlated with the sample; they play a more important role in representing it than the other samples from the same class. Accordingly, GASRC is naturally based on the following prior knowledge: when representing a test sample, its neighbors are more representative than non-neighbors [11], [23]. In other words, using the neighbors of a test sample to represent it yields a smaller representation error than using samples that are not its neighbors.
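The neighbor prior above can be sketched in a few lines: given a test sample, its k nearest training samples become the natural first candidates for the representation set. Euclidean distance is used in this sketch as one reasonable choice; the distance measure actually adopted by the paper is specified in Section 3, so treat this as an illustrative assumption.

```python
import numpy as np

def nearest_neighbor_indices(X, y, k):
    """Indices of the k training samples (columns of X) closest to y.

    X: (m, N) matrix whose columns are training samples.
    y: (m,) test sample.
    Euclidean distance is an assumption of this sketch.
    """
    dists = np.linalg.norm(X - y[:, None], axis=0)  # distance to each column
    return np.argsort(dists)[:k]

# Toy usage: four samples in R^3; the first two are close to y.
X = np.array([[1.0, 1.1, 5.0, 9.0],
              [0.0, 0.1, 5.0, 9.0],
              [0.0, 0.0, 5.0, 9.0]])
y = np.array([1.0, 0.0, 0.05])
print(nearest_neighbor_indices(X, y, 2))  # columns 0 and 1
```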

In this work, we focus on the classification performance of GASRC. GASRC aims to select the most representative training samples, termed the representation set, which can yield the sparsest representation and lead to promising classification performance, and uses this set to represent the test sample via the MGA approach. As discussed above, the representation set of a test sample tends to contain its neighbors or the training samples most similar to it. GASRC divides the representation set of a test sample into two parts: the first part contains an appropriate number of nearest neighbors of the test sample, and the second part consists of the other representative training samples obtained by MGA. When selecting the second part of the representation set, MGA combines it with the already determined neighbors of the test sample: if a group of training samples that includes these neighbors yields the minimal representation error, MGA takes it as the representation set of the test sample. Finally, GASRC uses the determined representation set to represent the test sample and classify it.
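The quantity MGA must score for each candidate representation set is its representation error for the test sample. One simple reading of that error, used here only as a sketch (the paper's exact objective is given in Section 3), is the least-squares residual when the test sample is represented by the chosen training samples:

```python
import numpy as np

def representation_error(X, y, subset):
    """Residual of representing y with the training samples (columns of X)
    indexed by `subset`, a candidate representation set, via least squares.
    Using plain least squares here is an assumption of this sketch."""
    Xs = X[:, list(subset)]
    beta, *_ = np.linalg.lstsq(Xs, y, rcond=None)
    return float(np.linalg.norm(y - Xs @ beta))

# Toy usage: y lies in the span of columns 0 and 1, so that subset is ideal.
X = np.array([[1.0, 0.0, 3.0],
              [0.0, 1.0, 3.0],
              [0.0, 0.0, 3.0]])
y = np.array([2.0, -1.0, 0.0])
print(representation_error(X, y, [0, 1]))  # essentially zero
print(representation_error(X, y, [2]))     # clearly nonzero
```

A subset whose span contains the test sample scores (near) zero, which is exactly the behavior a fitness function for subset selection needs.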

Our work has the following contributions.

First, GASRC is well suited to being applied directly in learning settings in which the data are high-dimensional and the number of training samples is small. It is worth noting that when the traditional SRC based on L1-norm optimization deals with very high-dimensional data and the scale of the training set is relatively small, it often needs dimensionality reduction such as PCA [14]; otherwise, it tends to yield a non-sparse solution, which might increase its classification error rate.

Second, to the best of our knowledge, this is the first time GA has been applied to sparse representation based classification in the pattern recognition community. GA can be used to effectively select the representation set for a test sample. MGA directly exploits the nearest neighbors of the test sample as one part of the representation set. This scheme effectively reduces the search space and enhances the selection efficiency; moreover, it guarantees that the obtained representation set contains the neighbors of the test sample, i.e., the samples most closely correlated with it.

Third, besides the traditional SRC approach, our proposed algorithm can be applied to other representation based approaches, as long as they essentially exploit L0-norm optimization. Compared with the conventional SRC based on the L1-norm, the proposed GASRC algorithm achieves better classification performance, particularly when the data are high-dimensional and the number of training samples is small. Experiments on several popular real-world databases confirm the classification effectiveness of our approach.

The rest of this paper is organized as follows. Section 2 reviews the traditional SRC based on L0-norm and L1-norm optimization. Section 3 presents our proposed approach, GASRC. Section 4 reports the experimental results on popular high-dimensional image data sets, and Section 5 offers conclusions.

Section snippets

SRC

Assume that there are N training samples X = [x₁, x₂, …, x_N] ∈ ℝ^{m×N}, where xᵢ ∈ ℝ^m (i = 1, 2, …, N) denotes the ith training sample. Given a test sample y ∈ ℝ^m, the traditional SRC essentially seeks the sparsest solution to y = Xβ, where β = [β₁, β₂, …, β_N]ᵀ and βᵢ is the representation coefficient associated with the ith training sample, by solving the following optimization problem [2]:

β̂ = arg min_β ‖β‖₀  subject to  y = Xβ,

where ‖β‖₀ denotes the L0-norm of the representation coefficient vector β, i.e., the number of nonzero entries in β.
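Once the representation coefficients β are available, SRC's decision rule assigns y to the class whose samples best reconstruct it. The sketch below illustrates that rule; to stay short it solves for β by plain least squares over all training samples rather than by an L0/L1 solver, which is an assumption of this illustration, not the paper's method.

```python
import numpy as np

def src_classify(X, labels, y, beta):
    """SRC decision rule: for each class c, keep only that class's entries
    of beta and assign y to the class with the smallest residual
    ||y - X_c beta_c||."""
    labels = np.asarray(labels)
    best_class, best_residual = None, np.inf
    for c in np.unique(labels):
        mask = labels == c
        residual = np.linalg.norm(y - X[:, mask] @ beta[mask])
        if residual < best_residual:
            best_class, best_residual = c, residual
    return best_class

# Toy usage: y equals the first class-0 training sample exactly.
X = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [0.0, 0.0, 0.0]])
labels = [0, 0, 1]
y = X[:, 0]
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(src_classify(X, labels, y, beta))  # class 0
```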

GASRC

In this section, we introduce our proposed GASRC approach. We first modify the traditional GA and propose a modified GA (MGA) to select an appropriate subset of the training set, i.e., the representation set. Then, we use the representation set to represent a test sample and classify it. Next, we introduce a number of concepts from GA terminology.

The conventional GA approach starts with a set of k randomly generated individuals, referred to as the population. Each
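A minimal version of the GA loop described here, specialized to subset selection with the neighbor part clamped into every individual, might look as follows. The population size, selection, crossover, and mutation operators, and all parameter names are our assumptions for illustration; the paper's actual MGA is specified in Section 3.

```python
import numpy as np

rng = np.random.default_rng(0)

def fitness(X, y, subset):
    """Lower is better: least-squares residual of representing y by the
    training samples (columns of X) indexed by `subset`."""
    Xs = X[:, sorted(subset)]
    beta, *_ = np.linalg.lstsq(Xs, y, rcond=None)
    return float(np.linalg.norm(y - Xs @ beta))

def mga_select(X, y, fixed, n_select, pop_size=16, generations=25, p_mut=0.3):
    """Evolve subsets of size n_select that always contain `fixed`
    (the precomputed nearest neighbors of y), as in the MGA idea."""
    N = X.shape[1]
    fixed = set(fixed)
    free = [i for i in range(N) if i not in fixed]
    k_free = n_select - len(fixed)

    def random_individual():
        return fixed | set(rng.choice(free, size=k_free, replace=False).tolist())

    pop = [random_individual() for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda s: fitness(X, y, s))
        survivors = pop[: pop_size // 2]            # truncation selection
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.integers(len(survivors), size=2)
            pool = sorted((survivors[a] | survivors[b]) - fixed)  # crossover pool
            child = fixed | set(rng.choice(pool, size=k_free, replace=False).tolist())
            if rng.random() < p_mut:                # mutation: swap one free index
                inside = sorted(child - fixed)
                outside = [i for i in free if i not in child]
                if inside and outside:
                    child.remove(int(rng.choice(inside)))
                    child.add(int(rng.choice(outside)))
            children.append(child)
        pop = survivors + children
    return min(pop, key=lambda s: fitness(X, y, s))

# Toy usage: y is a multiple of training sample 0, which we clamp in as a
# "neighbor", so any individual containing it represents y exactly.
X = np.arange(60, dtype=float).reshape(10, 6) + rng.normal(size=(10, 6))
y = 0.7 * X[:, 0]
best = mga_select(X, y, fixed={0}, n_select=3)
```

Because the neighbor indices are present in every chromosome, the search space shrinks from all subsets of size n_select to only those extending the neighbor set, which is the efficiency argument made in the contributions above.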

Experiments

In this section, we conduct four experiments to demonstrate the classification effectiveness of our proposed GASRC approach. The four experiments are conducted on three popular face data sets, the Extended Yale B, CMU PIE and AR data sets, as well as a handwritten digit data set, the MNIST data set. For each data set, we randomly choose a part of the samples for training, and the remaining samples are used for testing, as in many previous works. This scheme has the merit that

Conclusions

In this paper, we propose a GA based algorithm, GASRC, to solve the L0-norm optimization problem that seeks the most representative training samples to represent the test samples. This scheme can theoretically yield promising classification performance. Compared with the conventional SRC algorithm using L1-norm optimization, GASRC can effectively improve the classification performance, particularly when the data dimensionality is very high and the number of the training samples is small.

Acknowledgments

This article is partly supported by National Natural Science Foundation of China under Grants Nos. 61472138, 61263032, 71262011, 11061014, 61300032 and 61202276, Science and Technology Foundation of Jiangxi Educational Committee of China (GJJ14375, KJLD12067), as well as China Postdoctoral Science Foundation funded project (2012M520428).

References (31)

  • A. Wagner et al.

    Toward a practical face recognition system: robust alignment and illumination by sparse representation

    IEEE Trans. Pattern Anal. Mach. Intell.

    (2012)
  • B. Cheng et al.

    Learning with l1-graph for image analysis

    IEEE Trans. Image Process.

    (2010)
  • S. Gao et al.

    Sparse representation with kernels

    IEEE Trans. Image Process.

    (2013)
  • L. Zhang et al.

    Kernel sparse representation-based classifier

    IEEE Trans. Signal Process.

    (2012)
  • Y. Xu et al.

    A two-phase test sample sparse representation method for use with face recognition

    IEEE Trans. Circuits Syst. Video Technol.

    (2011)

    This paper has been recommended for acceptance by Yehoshua Zeevi.
