Pattern Recognition

Volume 45, Issue 1, January 2012, Pages 80-91
Facial expression recognition using radial encoding of local Gabor features and classifier synthesis

https://doi.org/10.1016/j.patcog.2011.05.006

Abstract

Primarily motivated by some characteristics of the human visual cortex (HVC), we propose a new facial expression recognition scheme involving a statistical synthesis of hierarchical classifiers. In this scheme, the input images of the database are first subjected to local, multi-scale Gabor-filter operations, and the resulting Gabor decompositions are then encoded using radial grids, imitating the topographical map-structure of the HVC. The codes are fed to local classifiers to produce global features representing facial expressions. Experimental results show that such a hybrid combination of the HVC structure with a hierarchical classifier significantly improves expression recognition accuracy across wide-ranging databases, in comparison with results reported in the literature. Furthermore, the proposed system is not only robust to corrupted data and missing information, but can also be generalized to cross-database expression recognition.

Highlights

• A radial encoding strategy for efficiently downsampling Gabor-filter outputs.
• A new classifier combination method that extracts information from local classifiers.
• Extraction of features to represent facial expressions efficiently.
• A cross-database test for person-independent facial expression recognition.

Introduction

It is acknowledged that automatic recognition of facial expressions from face (color and gray-level) images is complex, in view of significant variations in the physiognomy of faces with respect to head pose, environmental illumination and person identity [1]. Even ordinary color (and gray-level) face images exhibit considerable variation and contain redundant intensity information (among pixels) for describing facial expressions. Direct use of a (color or gray-level) face image has not been successful in expression recognition, in spite of normalization techniques intended to achieve illumination, scale and pose invariance. The implication is that appropriate features are needed for facial expression classification, as evidenced by the observed human ability to recognize expressions without reference to facial identity [2], [3]. This motivates the present paper's focus on specific feature maps, extracted from gray-level face images, that represent facial expressions, thereby reducing the complexity and dimensionality of the problem. We consider only static images of expressive human faces (as available in some standard databases), and not their video sequences.

The literature on facial expression recognition in static images is somewhat sparse in comparison with that on face recognition. Most of the existing references contain algorithms to extract features from an image and to reduce the dimensionality of the problem. Generally, they can be classified as holistic or local. Among the former are subspace methods such as “Eigenfaces” and “Fisherfaces” [4], [5], which consider the complete face as an input, and extract features corresponding to expressions by constructing a subspace using principal component analysis (PCA) [6], independent component analysis (ICA) [7] or Fisher's linear discriminant analysis (FLD) [8], [9]. The local approach divides a face image into small blocks and applies feature extraction algorithms, such as local binary pattern (LBP) analysis [10], [11], [12], [13] and the scale-invariant feature transform (SIFT) [14], to obtain a local texture description. Face recognition performance is significantly improved when local features are involved, compared with using only global features, as reported in [15], [16]. Hence it is believed that local methods can produce promising results for both facial identity and expression recognition.

It is found that facial expression is usually correlated with identity [17], and variations in identity (which are regarded as extrapersonal) dominate over those in expression (which are regarded as intrapersonal). The unresolved, and hence challenging, problem is the automatic expression recognition of a novel (i.e., not in the database) person. Therefore, most of the existing algorithms, which seem to perform well on person-dependent expression recognition, are substantially less effective at person-independent expression recognition. This is the motivation for our current focus on the problem of person-independent expression recognition from static images.

The human ability to perceive expressions is known to be highly sophisticated, even though the underlying biological mechanism is not yet fully understood; it therefore seems expedient to model the results of empirical studies of the visual cortex [18]. In fact, many biologically plausible models of human object recognition have been proposed [19], [20], [21], among which the following simplified three-stage hierarchical structure of the visual cortex seems to be a dominant theme:

  • 1. Basic units, such as simple cells in the primary cortex, respond to stimuli with certain orientations in their receptive fields, thereby extracting low-level local features of the stimuli.

  • 2. Intermediate units, such as cells in the extrastriate cortex, integrate the low-level features extracted in the previous stage and obtain more specific global features.

  • 3. Decision-making units recognize objects based on the global features.

In order to model the spatial orientation properties of simple cells in the primary cortex, a set of two-dimensional Gabor filters has been proposed [22]. These filters, when convolved with an image, are usefully robust against slight object rotation, distortion and variations in illumination. However, the resulting Gabor outputs at neighboring pixels are highly correlated and carry redundant information. For facial expression recognition, Gabor jets [23] were introduced to statistically post-process the Gabor outputs and arrive at salient features. Lyons and his colleagues [8], [24] invoked these Gabor jets at selected fiducial points, while, in [25], the authors uniformly downsampled the high-dimensional Gabor features. It is known that the choice of fiducial points and of the downsampling factor influences the final recognition performance [10]. Therefore, an efficient encoding strategy for Gabor outputs is needed. Motivated by the elegant encoding property of the human retina, which is invariant to limited spatial transformations (shift, scaling and rotation), Ganesh and Venkatesh [26] proposed a radial encoding strategy for (both binary and non-binary) images. Interestingly, there seems to exist a more general encoding strategy in the primary visual cortex, where the neurons are spatially organized such that retinal topography is approximately preserved, a property called retinotopic mapping (RM). That is, for a visual image formed on the retina, neighboring regions are represented by corresponding neighboring regions of the visual cortex. However, since the small retinal fovea is mapped onto a much larger area of the primary cortex than the retinal periphery [27], the mapping in the cortical areas is nonlinear.
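To make the filtering and encoding steps concrete, the sketch below builds the real part of a 2D Gabor kernel and pools the magnitudes of a filtered patch over a radial grid: concentric rings whose radii grow geometrically (so that cells are finer near the centre, loosely mimicking the foveal magnification mentioned above), crossed with equal angular sectors. This is only an illustrative NumPy sketch; the grid geometry, cell counts and function names are our assumptions, not the paper's exact design.

```python
import numpy as np

def gabor_kernel(ksize, sigma, theta, lam, gamma=0.5):
    """Real part of a 2D Gabor filter: a Gaussian envelope modulated by a
    cosine of wavelength lam along orientation theta."""
    half = ksize // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)
    yr = -x * np.sin(theta) + y * np.cos(theta)
    return (np.exp(-(xr ** 2 + (gamma * yr) ** 2) / (2 * sigma ** 2))
            * np.cos(2 * np.pi * xr / lam))

def radial_grid_encode(response, n_rings=4, n_sectors=8):
    """Pool |response| over a radial grid centred on the patch: rings whose
    radii grow geometrically (finer cells near the centre, as in the retinal
    fovea) crossed with equal angular sectors. Returns n_rings * n_sectors values."""
    h, w = response.shape
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    y, x = np.mgrid[0:h, 0:w]
    r = np.hypot(y - cy, x - cx)
    ang = np.arctan2(y - cy, x - cx) % (2 * np.pi)
    # Ring edges from 0 to (just beyond) the largest radius, geometrically spaced.
    edges = (r.max() + 1e-9) * (np.geomspace(1.0, 2.0, n_rings + 1) - 1.0)
    mag = np.abs(response)
    feats = np.zeros(n_rings * n_sectors)
    for i in range(n_rings):
        for j in range(n_sectors):
            cell = ((r >= edges[i]) & (r < edges[i + 1]) &
                    (ang >= 2 * np.pi * j / n_sectors) &
                    (ang < 2 * np.pi * (j + 1) / n_sectors))
            if cell.any():
                feats[i * n_sectors + j] = mag[cell].mean()
    return feats
```

With this sketch, encoding a 32 x 32 Gabor response with four rings and eight sectors reduces 1024 correlated values to a 32-dimensional code per filter, which is the sense in which the radial grid acts as a downsampler.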

In this paper, based on retinotopic mapping, we extend the radial encoding strategy to Gabor outputs and obtain salient local features representing facial expressions. We then propose a facial expression recognition system that combines characteristics of the human visual cortex (HVC) with a statistical synthesis of hierarchical classifiers. In this scheme, local features are first obtained by encoding the outputs of Gabor filters applied to local patches. Principal component analysis (PCA) and Fisher's linear discriminant (FLD) analysis are then applied to the encoded features, and the resulting projections are fed to local classifiers. The outputs of the local classifiers, in turn, are concatenated to form global, intermediate-level features, which are subjected to a second level of PCA and FLD projections to extract the salient information, leading to classification by the global classifier.
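As a rough illustration of this two-level arrangement, the sketch below uses scikit-learn's PCA and LinearDiscriminantAnalysis as stand-ins for the PCA and FLD steps; it assumes each image has already been reduced to an array of per-patch codes (one feature vector per patch), lets the local classifiers output class-membership probabilities, and lets the second-stage FLD double as the global classifier. The class name, dimensions and classifier choices are placeholders, not the paper's settings.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as LDA

class TwoLevelExpressionClassifier:
    """Hierarchical scheme: per-patch PCA + FLD feeding a local classifier,
    whose outputs are concatenated and passed through a second PCA + FLD
    stage that also serves as the global classifier."""

    def __init__(self, n_patches, local_pca_dim=30, global_pca_dim=40):
        self.n_patches = n_patches
        self.local = [(PCA(local_pca_dim), LDA()) for _ in range(n_patches)]
        self.global_pca = PCA(global_pca_dim)
        self.global_fld = LDA()

    def _local_outputs(self, X, fit=False, y=None):
        # X: (n_samples, n_patches, feat_dim) array of per-patch codes.
        outs = []
        for p, (pca, fld) in enumerate(self.local):
            feats = X[:, p, :]
            if fit:
                z = pca.fit_transform(feats)
                fld.fit(z, y)
            else:
                z = pca.transform(feats)
            # Class-membership probabilities act as the local "decision" output.
            outs.append(fld.predict_proba(z))
        return np.hstack(outs)       # (n_samples, n_patches * n_classes)

    def fit(self, X, y):
        G = self._local_outputs(X, fit=True, y=y)   # intermediate global features
        Zg = self.global_pca.fit_transform(G)
        self.global_fld.fit(Zg, y)
        return self

    def predict(self, X):
        G = self._local_outputs(X)
        Zg = self.global_pca.transform(G)
        return self.global_fld.predict(Zg)
```

In practice the local and global classifiers, as well as the projected dimensions, would be chosen by cross-validation on the training set.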

Briefly, the main contributions of the paper are:

  • 1. A radial grid encoding strategy for Gabor-filter outputs, leading to high recognition accuracy and outperforming techniques that invoke Gabor jets based on fiducial points or uniform downsampling.

  • 2. Design of a local classifier combination that employs FLD analysis to extract discriminating information from the outputs of the local classifiers; this approach is shown to be better than the traditional voting method.

  • 3. Extraction of features that represent facial expressions efficiently, in such a way that a facial expression image from a novel person can also be recognized.

The rest of the paper is organized as follows. Section 2 introduces the general framework of the proposed system; Section 3 presents experimental results, while Section 4 provides a detailed analysis and discussion. Finally, Section 5 concludes the paper.

Section snippets

General framework for the proposed facial expression recognition system

In what follows, it is assumed that images from a face-image database are being analyzed for expression recognition. Relevant details of the databases are given later in Section 3. The proposed facial expression recognition system comprises four major steps as shown in Fig. 1: (A) preprocessing and partitioning; (B) local feature extraction and representation; (C) classifier synthesis (to integrate local features); and (D) (final) decision-making. Below, we describe each of these steps.
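As a rough sketch of steps (A) and (B), the function below partitions a pre-cropped, grey-level face image into non-overlapping patches and turns each patch into a concatenation of radial-grid codes of its multi-scale, multi-orientation Gabor responses, reusing the gabor_kernel and radial_grid_encode helpers sketched in the Introduction. The patch size, the 5-scale by 8-orientation bank and the wavelength-to-scale ratio are illustrative assumptions rather than the paper's settings.

```python
import numpy as np
from scipy.signal import fftconvolve

def encode_image(img, patch=32, scales=(2, 4, 6, 8, 10), n_orient=8):
    """Steps (A)-(B): partition the image into non-overlapping patches, filter
    each patch with a small Gabor bank, and radial-grid encode every response."""
    codes = []
    for top in range(0, img.shape[0] - patch + 1, patch):
        for left in range(0, img.shape[1] - patch + 1, patch):
            p = img[top:top + patch, left:left + patch].astype(float)
            vec = []
            for sigma in scales:
                for k in range(n_orient):
                    g = gabor_kernel(ksize=15, sigma=sigma,
                                     theta=np.pi * k / n_orient, lam=2.5 * sigma)
                    vec.append(radial_grid_encode(fftconvolve(p, g, mode='same')))
            codes.append(np.concatenate(vec))
    return np.stack(codes)   # (n_patches, n_scales * n_orient * n_rings * n_sectors)
```

Stacking these per-image codes over the training set yields the (n_samples, n_patches, feature_dim) array consumed by the classifier-synthesis sketch given earlier, which covers steps (C) and (D).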

Experimental results

Our system is implemented in MATLAB, and all simulations are conducted using a 2.66 GHz Intel Core Quad processor with 8 GB memory. We use the following two facial expression databases for experiments: (1) Japanese Female Facial Expression (JAFFE) database [41]; (2) Cohn–Kanade (CK) database [42].

The JAFFE database contains 213 images of seven facial expressions of 10 Japanese female models, including six basic facial expressions (happy, sad, angry, surprised, disgusted and scared) and one

Discussions

In this section, we first compare the proposed expression classification scheme with that of others as applied to the JAFFE and CK databases, using the person-independent strategy for cross-validation. Note that the results of different algorithms may not be directly comparable because of differences in experimental setups, the number of subjects and of expressions used, and so on, but they can still indicate the discriminative performance of every approach. Furthermore, it should be added here

Conclusion

We have proposed a hybrid facial expression recognition framework in the form of a novel fusion of statistical techniques and the known model of a human visual system. An important component of this framework is the biologically inspired radial grid encoding strategy which is shown to effectively downsample the outputs of a set of local Gabor filters as applied to local patches of input images. Local classifiers are then employed to make the local decisions, which are integrated to form

Acknowledgment

The research reported here was supported by NUS Academic Research Fund R-263-000-362-112. The authors gratefully acknowledge the anonymous reviewers and editors for their helpful comments which have led to the present, substantially improved paper.

References (58)

  • M. Lyons et al., Automatic classification of single facial images, IEEE Transactions on Pattern Analysis and Machine Intelligence, 1999.
  • C. Xiang et al., Face recognition using recursive Fisher linear discriminant, IEEE Transactions on Image Processing, 2006.
  • X.Y. Feng et al., A coarse-to-fine classification scheme for facial expression recognition.
  • L.H. He et al., An enhanced LBP feature based on facial expression recognition.
  • C. Shan et al., Robust facial expression recognition using local binary patterns.
  • G.Y. Zhao et al., Dynamic texture recognition using local binary patterns with an application to facial expressions, IEEE Transactions on Pattern Analysis and Machine Intelligence, 2007.
  • Z.S. Li et al., Facial expression recognition using facial-component-based bag of words and PHOG descriptors, Information and Media Technologies, 2010.
  • J. Zou et al., A comparative study of local matching approach for face recognition, IEEE Transactions on Image Processing, 2007.
  • V. Bruce et al., Understanding face recognition, The British Journal of Psychology, 1986.
  • D. Hubel et al., Brain and Visual Perception: The Story of a 25-Year Collaboration, 2005.
  • K. Fukushima, Neocognitron: a self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position, Biological Cybernetics, 1980.
  • C.N.S. Ganesh Murthy et al., Modified neocognitron for improved 2-D pattern recognition, IEEE Proceedings—Vision, Image and Signal Processing, 1996.
  • M. Riesenhuber et al., Hierarchical models of object recognition in cortex, Nature Neuroscience, 1999.
  • J.P. Jones et al., An evaluation of the two-dimensional Gabor filter model of simple receptive fields in cat striate cortex, Journal of Neurophysiology, 1987.
  • M. Poetzsch et al., Improving object recognition by transforming Gabor filter responses, Network: Computation in Neural Systems, 1996.
  • Z. Zhang et al., Comparison between geometry-based and Gabor-wavelets-based facial expression recognition using multi-layer perceptron.
  • H.B. Deng et al., A new facial expression recognition method based on local Gabor filter bank and PCA plus LDA, International Journal of Information Technology, 2005.
  • C.N.S. Ganesh Murthy et al., Efficient classification by neural networks using encoded patterns, Electronics Letters, 1995.
  • R.B. Tootell et al., Deoxyglucose analysis of retinotopic organization in primates, Science, 1982.

    W.F. Gu received the B.E. degree in Information Security from University of Science and Technology of China in 2006. Currently, he is pursuing the Ph.D. degree at National University of Singapore under the supervision of Dr. Cheng Xiang and Dr. Hai Lin. His research interests include pattern recognition and computer vision.

    C. Xiang received the B.S. degree in mechanical engineering from Fudan University, China in 1991; M.S. degree in mechanical engineering from the Institute of Mechanics, Chinese Academy of Sciences in 1994; and M.S. and Ph.D. degrees in electrical engineering from Yale University in 1995 and 2000, respectively. From 2000 to 2001 he was a financial engineer at Fannie Mae, Washington D.C. At present, he is an Associate Professor in the Department of Electrical and Computer Engineering at the National University of Singapore. His research interests include computational intelligence, adaptive systems and pattern recognition.

    Y.V. Venkatesh (SM-IEEE'91) received the Ph.D. degree from the Indian Institute of Science (IIS), Bangalore. He was an Alexander von Humboldt Fellow at the Universities of Karlsruhe, Freiburg, and Erlangen, Germany; a National Research Council Fellow (USA) at the Goddard Space Flight Center, Greenbelt, MD; and a Visiting Professor at the Institut fuer Mathematik, Technische Universitat Graz, Austria, Institut fuer Signalverarbeitung, Kaiserslautern, Germany, National University of Singapore, Singapore and others. His research interests include 3D computer and robotic vision; signal processing; pattern recognition; biometrics; hyperspectral image analysis; and neural networks. As a Professor at IIS, he was also the Dean of Engineering Faculty and, earlier, the Chairman of the Electrical Sciences Division. Dr. Venkatesh is a Fellow of the Indian Academy of Sciences, the Indian National Science Academy, and the Indian Academy of Engineering. He is on the editorial board of the International Journal of Information Fusion.

    D. Huang received the B.E. degree in Electrical & Computer Engineering from National University of Singapore in 2005. Currently, he is pursuing the Ph.D. degree at the same university under the supervision of Dr. Cheng Xiang.

    H. Lin obtained his B.S. degree at the University of Science and Technology Beijing and his M.S. degree from the Chinese Academy of Sciences in 1997 and 2000, respectively. In 2005, he received his Ph.D. degree from the University of Notre Dame under the guidance of Dr. Panos Antsaklis. His teaching and research interests are in the multidisciplinary study of the problems at the intersections of control, communication, computation and life sciences. His current research thrust is on hybrid control systems, networked embedded systems and systems biology.
