Pattern Recognition

Volume 42, Issue 7, July 2009, Pages 1340-1350

Natural facial expression recognition using differential-AAM and manifold learning

https://doi.org/10.1016/j.patcog.2008.10.010

Abstract

This paper proposes a novel natural facial expression recognition method that recognizes a sequence of dynamic facial expression images using the differential active appearance model (AAM) and manifold learning, as follows. First, the differential-AAM features (DAFs) are computed as the difference between the AAM parameters of an input face image and those of a reference (neutral-expression) face image. Second, manifold learning embeds the DAFs in a smooth and continuous feature space. Third, the input facial expression is recognized in two steps: (1) computing the distances between the input image sequence and the gallery image sequences using the directed Hausdorff distance (DHD) and (2) selecting the expression by a majority vote over the k-nearest-neighbor (k-NN) sequences in the gallery. The DAFs are robust and efficient for facial expression analysis because they eliminate inter-person, camera, and illumination variations. Since the DAFs take the neutral-expression image as the reference image, the neutral-expression image must be located reliably. This is done via the differential facial expression probability density model (DFEPDM), which uses a kernel density approximation of the positively directional DAFs (changing from neutral to angry, happy, or surprised) and the negatively directional DAFs (changing from angry, happy, or surprised back to neutral). A face image is then taken as the neutral expression if it has the maximum DFEPDM value in the input sequence. Experimental results show that (1) the DAFs improve facial expression recognition performance over conventional AAM features by 20% and (2) the sequence-based k-NN classifier achieves 95% facial expression recognition accuracy on the facial expression database (FED06).

Introduction

Since the facial expression conveys human emotion, its recognition is important in human–computer interaction, human–robot interaction, digital entertainment and games, and smart user interfaces for cellular phones and digital cameras. Hence, interest in facial expression analysis has grown steadily [1]. However, it is still difficult to develop a facial expression recognition system that is real-time implementable, person-independent, robust to camera and illumination changes, and stable in its recognition, because person, camera, and illumination variations complicate the distribution of facial expressions.

Input face images should be represented accurately in a reduced-dimensional space so that expressions can be analyzed effectively. Representation methods fall into two categories: linear and non-linear models. Linear models such as principal component analysis (PCA) [2], bilinear models [3], and tensor models [4] are simple and efficient [5]. However, they are not suitable for representing dynamically changing facial expressions, which are inherently non-linear. To overcome this problem, many researchers have analyzed facial expressions in non-linear spaces.

Chang et al. [6], [7], [8] exploited Lipschitz manifold embedding to model and align facial features in a low-dimensional embedding space, which greatly improved facial expression recognition performance. However, their model has two limitations: (1) it used only the shape information extracted by the active shape model (ASM) to learn the expression manifold, and (2) it learned and evaluated the facial expression recognition performance on only two subjects.

To overcome these limitations, Shan et al. [9], [10], [11] proposed an appearance manifold of facial expressions, where the appearance feature was extracted from the raw image data using local binary patterns (LBP). They also proposed supervised locality preserving projections (SLPP) for aligning the manifolds of each subject. Their representation of facial images in a non-linear space was impressive, but their approach had a critical problem: the expression manifold must be learned individually for each subject. This implies that training samples containing all facial expressions are required for every subject and that the expression manifolds must be aligned. The approach is also not robust to changes in illumination.

To solve the abovementioned problems, we propose an approach using differential-AAM features (DAFs) and unified expression manifolds, as illustrated in Fig. 1. The DAFs are computed from the difference of the active appearance model (AAM) parameters between an input image and a reference image, which is the neutral expression image extracted from the image sequences of the target person. We can develop a person-independent facial expression recognition system using DAFs because the differences from a neutral expression to a specific expression (angry, happy, surprised) or vice versa are similar among different people. This also allows the manifold learning to use all training samples in the unified expression space.
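
To make the DAF computation concrete, the following is a minimal sketch in Python; the pooling of shape and appearance parameters into a single AAM parameter vector per frame and the function names are our assumptions, not the authors' implementation.

```python
import numpy as np

def differential_aam_features(aam_params, reference_params):
    """Sketch of DAF extraction: each frame's DAF is the difference
    between its AAM parameter vector and the parameter vector of the
    reference (neutral-expression) frame of the same person."""
    aam_params = np.asarray(aam_params, dtype=float)              # (frames, d)
    reference_params = np.asarray(reference_params, dtype=float)  # (d,)
    return aam_params - reference_params                          # (frames, d)

# Toy usage: subtracting the person's own neutral frame removes the
# person-specific offset, leaving only the expression-driven change.
params = np.random.default_rng(0).normal(size=(30, 12))  # fake AAM fits
dafs = differential_aam_features(params, params[0])      # frame 0 as neutral
```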

After the DAFs are extracted, facial expressions can be recognized by static or temporal approaches. Static classifiers such as neural networks (NN) [12], [13], support vector machines (SVM) [14], linear discriminant analysis (LDA) [15], and k-nearest neighbors (k-NN) attempt to recognize the facial expression from a single frame. Temporal classifiers such as the hidden Markov model (HMM) [16] and recurrent neural networks (RNN) [17] attempt facial expression recognition using a sequence of images. Sebe et al. [18] compared the recognition performance of the SVM, naive-Bayes (NB), tree-augmented naive-Bayes (TAN), and k-NN classifiers. Their experiments showed that the k-NN classifier gave the best classification result among the static classifiers. However, sequence-based classifiers achieve better recognition performance than frame-based classifiers. Although the HMM is a well-known sequence-based classifier, it fails to estimate its model parameters effectively when given a small number of training sequences in a high-dimensional space.

To overcome these limitations, we propose using the k-nearest neighbor sequences (k-NNS) classifier [19], a sequence-based temporal classifier that searches for the k nearest sequences based on the directed Hausdorff distance (DHD) and then classifies the facial expression as the one chosen by the majority of those nearest neighbor sequences.
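
A hedged sketch of such a classifier follows, using SciPy's directed Hausdorff distance; the gallery data structure and the value of k are illustrative assumptions, not the paper's settings.

```python
import numpy as np
from collections import Counter
from scipy.spatial.distance import directed_hausdorff

def knns_classify(probe_seq, gallery, k=5):
    """k-NNS sketch: rank gallery sequences of embedded DAFs by their
    directed Hausdorff distance from the probe sequence, then return
    the majority label among the k nearest sequences."""
    # probe_seq: (m, d) array; gallery: list of ((n_i, d) array, label)
    dists = [directed_hausdorff(probe_seq, seq)[0] for seq, _ in gallery]
    nearest = np.argsort(dists)[:k]
    votes = Counter(gallery[i][1] for i in nearest)
    return votes.most_common(1)[0][0]
```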

This paper is organized as follows. Section 2 describes the theoretical background of AAMs. Section 3 presents the DAFs and differential facial expression probability density model (DFEPDM), which is the method of finding the neutral facial expression. Section 4 examines the manifold learning and the facial expression recognition using the k-NNS classifier and the majority voting. Section 5 presents the experimental results, which show the improvement of the facial expression recognition performances. Finally, Section 6 presents our conclusions.


Active appearance models (AAMs)

AAMs [20], [21] are generative, parametric models of a visual phenomenon that capture both shape and appearance variations. These variations are represented by a linear model such as PCA, which finds the subspace preserving the maximum variance of the given data. A face model can be constructed from training data using AAMs, and face tracking is achieved by fitting the learned model to an input sequence.

The shape of a 2D AAM is represented by a triangulated 2D mesh with l vertices.
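
In the standard formulation [20], [21], the mesh vertex coordinates are stacked into a shape vector, which is expressed as a base shape plus a linear combination of PCA shape bases:

```latex
\mathbf{s} = (x_1, y_1, x_2, y_2, \ldots, x_l, y_l)^{\top},
\qquad
\mathbf{s} = \mathbf{s}_0 + \sum_{i=1}^{n} p_i \, \mathbf{s}_i ,
```

where s_0 is the mean shape, the s_i are the shape basis vectors obtained by PCA, and the p_i are the shape parameters; the appearance is modeled analogously as A(x) = A_0(x) + sum_i lambda_i A_i(x).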

Differential-AAMs

AAMs are efficient for face modeling because they can represent various face images using a compact set of linear models obtained by applying PCA to a collection of example data. Therefore, the AAM features of facial images contain all the variations included in the training samples. However, they cannot effectively represent variations that are not included in the training samples. To overcome this problem, we propose differential-AAMs, which are robust to inter-person, camera, and illumination variations.
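
Only the abstract describes how the neutral reference frame is found, so the following is a loosely hedged sketch of that selection step; the per-frame scoring rule (mean log-density) and the dfepdm interface are our assumptions, not the paper's definition.

```python
import numpy as np

def neutral_frame_index(aam_params, dfepdm):
    """Hypothetical neutral-frame selection in the spirit of the DFEPDM:
    try each frame as the reference, compute the resulting DAFs for the
    whole sequence, and keep the frame whose DAFs score highest under a
    pre-trained density model (dfepdm: a callable mapping a DAF array
    to per-frame density values)."""
    aam_params = np.asarray(aam_params, dtype=float)  # (frames, d)
    scores = []
    for t in range(len(aam_params)):
        dafs = aam_params - aam_params[t]   # DAFs w.r.t. candidate frame t
        scores.append(np.mean(np.log(dfepdm(dafs) + 1e-12)))
    return int(np.argmax(scores))
```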

Manifold learning

Classical linear methods such as PCA and classical MDS are simple and efficient because they are linear. However, they are not suitable for representing dynamically changing facial expressions, which are inherently non-linear. To overcome this limitation, many non-linear dimensionality reduction techniques have been exploited to model manifold structures for facial expression analysis. Typically, the manifold embedding is trained on a specific person's facial expressions.
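
As a hedged illustration of such an embedding, the sketch below uses Isomap from scikit-learn as a stand-in for the paper's manifold learning; the choice of Isomap and its parameters are our assumptions, not the paper's stated method.

```python
import numpy as np
from sklearn.manifold import Isomap

# Toy stand-in for pooled training DAFs from all subjects; the unified
# DAF space is what allows a single manifold to be learned for everyone.
rng = np.random.default_rng(0)
training_dafs = rng.normal(size=(500, 20))

embedder = Isomap(n_neighbors=10, n_components=3)
embedded = embedder.fit_transform(training_dafs)  # (500, 3) manifold coords
```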

Experimental results and discussion

We performed several experiments to show the validity of the proposed facial expression recognition method. The proposed system was implemented in a Visual C++ environment on a PC platform with a Pentium-4 Duo CPU clocked at 2.8 GHz, 2 GB of RAM, and Windows XP Professional.

Conclusion

We proposed a new framework for real-time person-independent facial expression recognition which was composed of three modules: differential-AAM feature (DAF) extraction, manifold embedding, and classification using k-nearest neighbor sequences (k-NNS).

The DAFs were defined as the difference between the AAM feature of the input face image and that of the reference image. They are therefore person-independent, because the differences from the neutral expression are similar among people even though the appearance of a specific expression varies from person to person.

Acknowledgement

This work was supported by the Korea Science and Engineering Foundation (KOSEF) through the Biometrics Engineering Research Center (BERC) at Yonsei University (R112002105070030(2008)) and also was supported by the Intelligent Robotics Development Program, one of the 21st Century Frontier R&D Programs funded by the Ministry of Commerce, Industry and Energy (MOCIE).

References (31)

  • Y. Chang, C. Hu, and M. Turk, Probabilistic expression analysis on manifolds, in: Proceedings of IEEE Conference on...
  • C. Shan, S. Gong, P. McOwan, Robust facial expression recognition using local binary patterns, in: Proceedings of IEEE...
  • C. Shan, S. Gong, P. McOwan, Appearance manifold of facial expression, in: IEEE International Workshop on...
  • C. Shan, S. Gong, P. McOwan, Dynamic facial expression recognition using a Bayesian temporal manifold model, in:...
  • Z. Zhang, M. Lyons, M. Schuster, S. Akamatsu, Comparison between geometry-based and Gabor-wavelets-based facial...

About the Author—YEONGJAE CHEON received the B.S. degree in computer engineering from Hongik University, Korea, in 2006, and the M.S. degree in computer engineering from Pohang University of Science and Technology (POSTECH), in 2008. He is now working for NHN Corporation as a software engineer.

His research interests include biometrics, face analysis, and facial expression recognition.

About the Author—DAIJIN KIM received the B.S. degree in electronic engineering from Yonsei University, Seoul, Korea, in 1981, and the M.S. degree in electrical engineering from the Korea Advanced Institute of Science and Technology (KAIST), Taejon, in 1984. In 1991, he received the Ph.D. degree in electrical and computer engineering from Syracuse University, Syracuse, NY.

During 1992–1999, he was an Associate Professor in the Department of Computer Engineering at DongA University, Pusan, Korea. He is currently a Professor in the Department of Computer Science and Engineering at POSTECH, Pohang, Korea.

His research interests include biometrics, human computer interaction, and intelligent systems.
