Elsevier

Neurocomputing

Volume 101, 4 February 2013, Pages 104-115

Locality constrained representation based classification with spatial pyramid patches

https://doi.org/10.1016/j.neucom.2012.08.007

Abstract

In this work, we propose a linear representation based face recognition (FR) method that incorporates locality information from both spatial features and training samples. Instead of holistic face images, the proposed method operates on spatial pyramid local patches, whose results are aggregated by a Bayesian fusion method. The locality constraint on the representation coefficients leads to an approximately sparse representation, which effectively exploits the discriminative nature of spatial local features. Unlike sparse representation based classification (SRC), which imposes an ℓ1-norm constraint on the coefficients, the proposed locality constrained representation based classification (LCRC) is formulated with a computationally efficient ℓ2-norm. The proposed method is robust to two crucial problems in face recognition: occlusion and lack of training data. A simple locality based concentration index (LCI) is defined to measure the reliability of each local patch, by which not only the heavily corrupted patches but also the less discriminative ones are rejected. Because it uses both local patches and the locality constraint, the proposed method requires less training data. Based on the locality constrained representation, we present three algorithms which outperform the state of the art on the AR and Extended Yale B datasets for both the occlusion and single sample per person (SSPP) problems.

Introduction

Linear representation based face recognition methods have attracted considerable interest recently owing to their efficacy and simplicity. These methods are based on the assumption that a high-dimensional probe face image lies on a low-dimensional subspace spanned by the training samples of the same subject [1]. The decision is made by minimizing the residual of reconstructing the probe face as a linear combination of training samples with a set of coefficients. In practice, however, these methods do not perform well enough when the training samples of each class are insufficient to model the various potential facial variations, e.g., changes of expression, illumination, occlusion, etc. Recently, the sparse representation based classifier (SRC) [2] achieved a breakthrough in face recognition. To address this problem, it takes samples from all subjects rather than a single one to form an overcomplete dictionary. A sparse representation is then obtained by solving an ℓ1-minimization problem. However, Shi et al. [3] argue that the sparsity assumption is not supported by the data and that the ℓ2 approach is more robust and efficient. Similarly, it is argued in [4] that it is the collaborative representation, not the ℓ1-norm sparsity constraint, that in fact boosts face recognition performance. The collaborative representation based classification (CRC) with regularized least squares proposed in [4] achieves recognition results comparable to those of SRC. Unlike SRC, both of these methods have analytical solutions due to the use of the ℓ2-norm, which makes them much more efficient. One problem of these methods is that they treat all samples belonging to different subjects equally, and an overly redundant dictionary makes these ℓ2 methods [3], [4] less discriminative, especially when using relatively simple features, e.g., local features.
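The analytical solution mentioned above for ℓ2-regularized representation can be illustrated with a short sketch. It is not taken from the paper: the function name `crc_classify`, the regularization value `lam`, and the use of the plain class-wise residual (without any normalization by the coefficient norm) are assumptions made for illustration.

```python
import numpy as np

def crc_classify(A, labels, y, lam=0.01):
    """Sketch of regularized least-squares representation plus residual rule.

    A      : (m, n) matrix whose columns are training samples (assumed layout)
    labels : length-n class label of each column
    y      : (m,) probe vector
    lam    : ridge weight (hypothetical default)
    """
    labels = np.asarray(labels)
    n = A.shape[1]
    # Closed-form ridge solution: x = (A^T A + lam * I)^{-1} A^T y
    x = np.linalg.solve(A.T @ A + lam * np.eye(n), A.T @ y)
    classes = np.unique(labels)
    # Reconstruct y from each class's samples only and compare residuals
    residuals = np.array([
        np.linalg.norm(y - A[:, labels == c] @ x[labels == c]) for c in classes
    ])
    return classes[np.argmin(residuals)], x

# Toy data: two classes living in orthogonal coordinate pairs
A = np.array([[1.0, 1.0, 0.0, 0.0],
              [0.0, 0.1, 0.0, 0.0],
              [0.0, 0.0, 1.0, 1.0],
              [0.0, 0.0, 0.0, 0.1]])
labels = [0, 0, 1, 1]
y = np.array([1.0, 0.05, 0.0, 0.0])
pred, x = crc_classify(A, labels, y)
```

Because the solution is a single linear solve, the cost is dominated by forming and factorizing an n-by-n matrix, in contrast to the iterative solvers that ℓ1-minimization typically needs.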

Local features are more robust than holistic ones for face recognition on noisy data. Various local feature descriptors, such as histograms of Local Binary Patterns [5] and Gabor wavelets [6], have been suggested to improve the robustness of FR systems. Another popular way to extract local features is the modular approach, which first partitions a whole face image into several blocks and then extracts and processes features independently within these local regions. Using this technique, recognition accuracy is greatly improved on data with occlusions [2], [7]. However, these methods fail to exploit the locality information of the local features among the training samples, and they offer no effective way to aggregate the results of individual blocks.

As an extension of the bag-of-features model, spatial pyramid matching (SPM) [8] has achieved remarkable success in image classification. SPM partitions an image into increasingly fine sub-regions in which histograms of local features are computed. Inspired by SPM [8], in this paper we subdivide each image into local patches at different spatial pyramid levels. The proposed method is then applied to these patches, so that both holistic features (corresponding to the first level) and local features at increasingly fine resolutions contribute to classification. A Bayesian fusion method is then proposed to aggregate the intermediate results of these patches. For simplicity, the Bayesian method assumes that patches within a face are independent of each other.
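The partition and fusion steps described above can be sketched as follows. This is only an illustration under stated assumptions: the 2^l by 2^l grid per level follows the usual SPM convention and may differ from the grid used in the paper, and the fusion shown is a generic product of per-patch class posteriors under the independence assumption, not the paper's exact fusion rule. The names `pyramid_patches` and `fuse_patch_posteriors` are hypothetical.

```python
import numpy as np

def pyramid_patches(image, levels=3):
    """Split a 2-D image into 2^l x 2^l grids for l = 0 .. levels-1.

    Level 0 is the whole (holistic) image; finer levels give local patches.
    """
    h, w = image.shape
    patches = []
    for level in range(levels):
        cells = 2 ** level
        rows = np.linspace(0, h, cells + 1).astype(int)
        cols = np.linspace(0, w, cells + 1).astype(int)
        for i in range(cells):
            for j in range(cells):
                patches.append(image[rows[i]:rows[i + 1], cols[j]:cols[j + 1]])
    return patches

def fuse_patch_posteriors(posteriors, eps=1e-12):
    """Combine per-patch class probabilities assuming independent patches.

    posteriors : (n_patches, n_classes) array of per-patch class probabilities
    Returns the index of the class with the largest summed log-probability.
    """
    log_post = np.log(np.asarray(posteriors, dtype=float) + eps).sum(axis=0)
    return int(np.argmax(log_post))

image = np.arange(64, dtype=float).reshape(8, 8)
patches = pyramid_patches(image, levels=3)
fused = fuse_patch_posteriors([[0.6, 0.4], [0.3, 0.7], [0.4, 0.6]])
```

Summing log-probabilities rather than multiplying probabilities is a common choice that avoids numerical underflow when many patches are fused.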

In this work, we explore the discriminative nature of locality constrained representation (LCR) of local patches for identifying faces. For local patches, the residual gap between different subjects obtained by the aforementioned ℓ2-based methods is small. When face images suffer from severe distortion, the test image may be far from some training samples, even those from the same class. The locality constraint encourages large coefficients for nearby samples and simultaneously penalizes the coefficients of distant ones, which makes the representation discriminative (see examples in Fig. 1, Fig. 3). Unlike SRC, which is computed by ℓ1-minimization, the proposed LCR based classification (LCRC) is formulated as a weighted ridge regression problem.
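A weighted ridge regression of this kind has a closed-form solution, sketched below. The specific weighting is an assumption: each coefficient is penalized by the squared Euclidean distance between the probe and the corresponding training sample, which is one common way to express a locality constraint and is not necessarily the weighting defined in the paper. The name `lcr_coefficients` and the small stabilizing term `eps` are likewise hypothetical.

```python
import numpy as np

def lcr_coefficients(A, y, lam=0.1, eps=1e-8):
    """Locality-weighted ridge regression (illustrative sketch).

    Minimizes ||y - A x||^2 + lam * sum_i d_i^2 * x_i^2,
    where d_i = ||y - a_i|| is the distance from the probe to column a_i
    (assumed weighting). Distant samples are penalized more strongly.
    """
    d2 = np.sum((A - y[:, None]) ** 2, axis=0)   # squared distances d_i^2
    n = A.shape[1]
    # eps * I keeps the system well conditioned when some d_i is zero
    M = A.T @ A + lam * np.diag(d2) + eps * np.eye(n)
    return np.linalg.solve(M, A.T @ y)

# One nearby sample and one distant sample that is (almost) collinear with y
A = np.array([[1.0, -1.0],
              [0.4, -0.5],
              [0.0,  0.0]])
y = np.array([1.0, 0.5, 0.0])
x = lcr_coefficients(A, y)
```

In this toy case the distant column could reconstruct the probe exactly with coefficient -1, yet the distance-weighted penalty pushes almost all of the weight onto the nearby column.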

It is well known that conventional ℓ2-minimization usually results in dense solutions. However, we show that, with the locality constraint, the ℓ2-norm can also lead to a sparse representation. In [9], the authors argue that locality is more essential than sparsity, since sparsity does not necessarily lead to locality but locality always incurs sparsity. Based on this observation, a classifier based on the sparsity of the coefficients (denoted LCRC-Spr) is presented. The discriminative nature of the locality constraint is validated by the high accuracy of LCRC-Spr, which is very close to (and sometimes even better than) that of the corresponding residual based LCRC.
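One simple way to classify directly from the coefficients, rather than from residuals, is to pick the class that holds the largest share of coefficient magnitude. The sketch below shows that idea only; the decision rule actually used by LCRC-Spr is not given in this excerpt, so the rule and the name `classify_by_coefficient_mass` are assumptions.

```python
import numpy as np

def classify_by_coefficient_mass(x, labels):
    """Assign the class whose samples carry the largest total |coefficient|.

    x      : (n,) representation coefficients
    labels : length-n class label of each coefficient's training sample
    """
    x = np.asarray(x, dtype=float)
    labels = np.asarray(labels)
    classes = np.unique(labels)
    mass = np.array([np.abs(x[labels == c]).sum() for c in classes])
    return classes[np.argmax(mass)], mass

pred, mass = classify_by_coefficient_mass([0.8, 0.1, -0.05, 0.02], [0, 0, 1, 1])
```

Such a rule only works well when the coefficients are concentrated on few samples, which is the property the locality constraint is meant to provide.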

Owing to the locality constraint, large representation coefficients are concentrated on a small number of entries, which are expected to fall mainly in the same class. Based on this, we also present a class based algorithm, C-LCRC, which computes the representation coefficients from one class at a time. With a smaller training data matrix, C-LCRC is more efficient.
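The class-at-a-time idea can be sketched as a loop that solves one small regression per class and keeps the class with the smallest residual. This is a hedged illustration, not the paper's C-LCRC: the squared-distance weighting, the residual decision rule, and the names `class_wise_lcr_classify`, `lam` and `eps` are assumptions.

```python
import numpy as np

def class_wise_lcr_classify(A, labels, y, lam=0.1, eps=1e-8):
    """Solve a locality-weighted ridge problem per class (illustrative sketch)."""
    labels = np.asarray(labels)
    best_class, best_residual = None, np.inf
    for c in np.unique(labels):
        Ac = A[:, labels == c]                       # samples of class c only
        d2 = np.sum((Ac - y[:, None]) ** 2, axis=0)  # assumed distance weights
        M = Ac.T @ Ac + lam * np.diag(d2) + eps * np.eye(Ac.shape[1])
        xc = np.linalg.solve(M, Ac.T @ y)
        residual = np.linalg.norm(y - Ac @ xc)
        if residual < best_residual:
            best_class, best_residual = c, residual
    return best_class, best_residual

A = np.array([[1.0, 1.0, 0.0, 0.0],
              [0.0, 0.1, 0.0, 0.0],
              [0.0, 0.0, 1.0, 1.0],
              [0.0, 0.0, 0.0, 0.1]])
labels = [0, 0, 1, 1]
y = np.array([1.0, 0.05, 0.0, 0.0])
pred, residual = class_wise_lcr_classify(A, labels, y)
```

Each linear system here has only as many unknowns as there are samples in one class, which is where the efficiency gain over a single solve on the full training matrix comes from.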

The method described in this paper effectively addresses two crucial problems in face recognition:

Occlusion. The presence of contiguous occlusion is one of the most challenging problems in robust face recognition. Humans can easily recognize a familiar person wearing sunglasses or a scarf; however, it is hard for a computer to automatically identify an obstructed facial image correctly. For linear representation based methods, outliers caused by occlusion may dramatically bias the regression model and result in a poor representation. The spatial pyramid partition and Bayesian fusion method proposed in this paper can largely suppress the influence of occlusion. In addition, a locality based concentration index (LCI) is defined to measure the reliability of local patches, by which not only non-face patches but also the less discriminative ones (generic to many subjects) are rejected.

Lack of training samples. In some real face recognition applications, very few samples, or even only a single sample per person (SSPP), are available. Linear representation (LR) based methods (e.g., LRC, SRC) using holistic facial features become unstable in this situation since they do not have enough samples to represent the incoming test image, which makes the residual large even for the correct subject. Because a local patch contains far fewer inherent facial variations, and because of the locality constraint, our method needs far fewer samples to cover these variations. Moreover, the proposed Bayesian fusion method can effectively preserve most of the discriminative information. This is verified by our experiments in Section 5.
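The patch-rejection step mentioned under Occlusion can be illustrated with a concentration score computed from each patch's coefficients. The definition of LCI is not given in this excerpt, so the sketch below instead borrows the form of the sparsity concentration index (SCI) known from the SRC literature; the function names and the threshold value are hypothetical.

```python
import numpy as np

def concentration_index(x, labels):
    """SCI-style score in [0, 1]: 1 if all coefficient mass is in one class,
    0 if it is spread evenly over all classes. Used here as a stand-in for
    the paper's LCI, whose exact definition is not reproduced.
    """
    x = np.asarray(x, dtype=float)
    labels = np.asarray(labels)
    classes = np.unique(labels)
    k = len(classes)
    total = np.abs(x).sum()
    if total == 0 or k < 2:
        return 0.0
    top = max(np.abs(x[labels == c]).sum() for c in classes)
    return float((k * top / total - 1) / (k - 1))

def keep_reliable_patches(coeffs_per_patch, labels, threshold=0.5):
    """Return indices of patches whose score reaches the (assumed) threshold."""
    return [i for i, x in enumerate(coeffs_per_patch)
            if concentration_index(x, labels) >= threshold]

labels = [0, 0, 1, 1]
concentrated = [1.0, 0.0, 0.0, 0.0]   # all mass in class 0
spread = [0.5, 0.0, 0.5, 0.0]         # mass split evenly between classes
kept = keep_reliable_patches([concentrated, spread], labels)
```

A patch whose coefficients are spread across many subjects gets a low score and is dropped before fusion, which matches the intent of rejecting both corrupted and non-discriminative patches.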

The remainder of the paper is organized as follows. In Section 2, a brief discussion of related linear representation based methods is given. The proposed method, LCRC, is described in Section 3, and two further related algorithms are developed in Section 4. In Section 5, the three proposed algorithms and several other methods are evaluated on the AR and Extended Yale B databases. Finally, conclusions and discussion are offered in Section 6.

Related works

In the face recognition community, linear representation (LR) based methods have been widely used due to their effectiveness and simplicity. These LR methods are based on the assumption that any probe image lies on a low-dimensional subspace [10], [1], and that the subspace is spanned by samples from the same subject [1]. A similar idea was previously used in nearest linear combinations (NLC) [11] and nearest feature line (NFL) [12]. Suppose we have a data matrix A ∈ ℝ^{m×n} containing the gallery face images

Locality constrained linear representation

Recall that in the SRC formulation (7), the same weight parameter λ is used for all regression coefficients. However, this constraint does not necessarily hold in practice. Intuitively, the coefficients corresponding to less relevant predictors should be penalized, whereas the most

Classification based on locality constrained representation

Besides the algorithm shown in Section 3.2, in this section we propose two further algorithms based on the locality constrained representation.

Experimental results

In this section, we evaluate the three proposed algorithms, LCRC, LCRC-Spr and C-LCRC, on public benchmark databases for face recognition. We first demonstrate the robustness of our methods to contiguous occlusion, both artificial and natural. We then show the efficacy of the proposed method with insufficient training data or SSPP. Several state-of-the-art methods are also evaluated for comparison. As well as the methods discussed in the previous sections, we will also compare our

Conclusions and future work

In this work we propose a new face recognition method incorporating locality on both representation samples and spatial features. The locality constraint makes the representation sparse: it concentrates the large representation coefficients on a small number of training samples, while the others are nearly zero. Spatial pyramid local patches, instead of holistic features, are used to significantly boost classification performance. Due to both, the proposed method is

Fumin Shen received his bachelor's degree in Applied Mathematics from Shandong University, China. He is currently a Ph.D. student in the School of Computer Science, Nanjing University of Science and Technology, China. His major research interests include computer vision and machine learning, including face recognition, image analysis, hashing methods, and robust statistics with its applications in computer vision.

References (41)

  • I. Naseem et al., Linear regression for face recognition, IEEE Trans. Pattern Anal. Mach. Intell. (2010)
  • S. Lazebnik, C. Schmid, J. Ponce, Beyond bags of features: spatial pyramid matching for recognizing natural scene...
  • K. Yu, T. Zhang, Y. Gong, Nonlinear learning using local coordinate coding, in: Advances in Neural Information...
  • P.N. Belhumeur et al., Eigenfaces vs. Fisherfaces: recognition using class specific linear projection, IEEE Trans. Pattern Anal. Mach. Intell. (1997)
  • S. Li, Face recognition based on nearest linear combinations, in: Proceedings of the IEEE Computer Society Conference...
  • S. Li et al., Face recognition using the nearest feature line method, IEEE Trans. Neural Networks (1999)
  • J. Ho, M.-H. Yang, J. Lim, K.-C. Lee, D. Kriegman, Clustering appearances of objects under varying illumination...
  • K.-C. Lee et al., Acquiring linear subspaces for face recognition under variable lighting, IEEE Trans. Pattern Anal. Mach. Intell. (2005)
  • A. Yang, S. Sastry, A. Ganesh, Y. Ma, Fast l1-minimization algorithms and an application in robust face recognition: a...
  • M. Turk et al., Eigenfaces for recognition, J. Cognitive Neuroscience (1991)

Zhenmin Tang received his Ph.D. degree from Nanjing University of Science and Technology, Nanjing, China. He is now a professor and the head of the School of Computer Science, Nanjing University of Science and Technology. His major research areas include intelligent systems, pattern recognition, image processing, and embedded systems. He has published over 80 papers. He is also the leader of several key programs of the National Natural Science Foundation of China.

Jingsong Xu is a Ph.D. student in Pattern Recognition and Intelligent Systems at Nanjing University of Science and Technology. He received the B.Sc. degree from the same university in 2007. He is currently visiting the University of Technology, Sydney. His research interests include computer vision and machine learning.

1 His contribution was made when visiting The University of Adelaide.
