Natural facial expression recognition using differential-AAM and manifold learning
Introduction
Since facial expressions display human emotions, their recognition is important in human–computer interaction, human–robot interaction, digital entertainment and games, and smart user interfaces for cellular phones and digital cameras. Hence, interest in facial expression analysis has grown among researchers [1]. However, it is still difficult to develop a facial expression recognition system that is real-time implementable, person-independent, robust to camera and illumination changes, and stable in its recognition, because person, camera, and illumination variations complicate the distribution of facial expressions.
Input face images should be accurately represented in reduced dimensions to analyze expressions effectively. There are two classes of representation methods: linear and non-linear models. Linear models such as principal component analysis (PCA) [2], bilinear models [3], and tensor models [4] are simple and efficient [5]. However, they are not suitable for representing dynamically changing facial expressions, which are inherently non-linear. To overcome this problem, many researchers have analyzed facial expressions in non-linear spaces.
Chang et al. [6], [7], [8] exploited Lipschitz manifold embedding to model and align facial features in a low-dimensional embedding space, which greatly improved facial expression recognition performance. However, their model has two limitations: (1) it used only the shape information extracted by the active shape model (ASM) to learn the expression manifold, and (2) it learned and evaluated the facial expression recognition performance on only two subjects.
To overcome these limitations, Shan et al. [9], [10], [11] proposed an appearance manifold of facial expressions, where the appearance features were extracted from the raw image data using local binary patterns (LBP). They also proposed supervised locality preserving projections (SLPP) for aligning the manifolds of the individual subjects. Their non-linear representation of facial images was impressive, but their approach had a critical problem: the expression manifold of each subject had to be learned individually. This implies that their approach requires training samples containing all facial expressions of each subject and must align the expression manifolds. The approach is also not robust to changes in illumination.
To solve the abovementioned problems, we propose an approach using differential-AAM features (DAFs) and a unified expression manifold, as illustrated in Fig. 1. The DAFs are computed as the difference of the active appearance model (AAM) parameters between an input image and a reference image, which is the neutral expression image extracted from the image sequence of the target person. DAFs enable a person-independent facial expression recognition system because the differences from the neutral expression to a specific expression (e.g., angry, happy, or surprised), or vice versa, are similar across different people. This also allows the manifold learning to use all training samples in a unified expression space.
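In symbols (our notation, as a minimal formalization of the idea above), let p(t) be the AAM parameter vector fitted to frame t and let p_ref be the parameters fitted to the target person's neutral reference image. The DAF is then the difference

```latex
\mathbf{d}(t) = \mathbf{p}(t) - \mathbf{p}_{\mathrm{ref}},
```

so person-specific components common to both frames cancel, leaving a description of the expression change alone.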
After extracting the DAFs, facial expressions can be recognized through static and temporal approaches. Static classifiers such as neural networks (NN) [12], [13], support vector machines (SVM) [14], linear discriminant analysis (LDA) [15], and k-nearest neighbors (k-NN) attempt to recognize the facial expression from a single frame. Temporal classifiers such as the hidden Markov model (HMM) [16] and recurrent neural networks (RNN) [17] perform facial expression recognition over a sequence of images. Sebe et al. [18] compared the recognition performance of the SVM, naive-Bayes (NB), tree-augmented naive-Bayes (TAN), and k-NN classifiers. Their experimental results showed that the k-NN classifier gave the best classification results among the static classifiers, but that sequence-based classifiers outperformed frame-based ones. Although the HMM is a well-known sequence-based classifier, it fails to estimate the model parameters effectively when only a small number of training sequences is available in a high-dimensional space.
To overcome these limitations, we propose the k-nearest neighbor sequences (k-NNS) classifier [19], a sequence-based temporal classifier that searches for the k nearest neighbor sequences using the directed Hausdorff distance (DHD) and then classifies the facial expression as the class chosen by the majority of those neighbors.
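As a rough sketch of how such a classifier can be realized (function and parameter names here are illustrative, and the exact distance and voting details of [19] may differ):

```python
import numpy as np
from collections import Counter

def directed_hausdorff(seq_a, seq_b):
    """Directed Hausdorff distance from sequence A to sequence B.

    seq_a, seq_b: arrays of shape (num_frames, feature_dim), e.g. one
    DAF vector per frame. For every frame in A we find its nearest
    frame in B, then return the worst (largest) of those distances.
    """
    dists = np.linalg.norm(seq_a[:, None, :] - seq_b[None, :, :], axis=2)
    return dists.min(axis=1).max()

def knns_classify(query_seq, train_seqs, train_labels, k=3):
    """k-nearest neighbor sequences (k-NNS) classification sketch.

    Ranks all training sequences by directed Hausdorff distance to the
    query and returns the majority label among the k closest ones.
    """
    scores = [directed_hausdorff(query_seq, s) for s in train_seqs]
    nearest = np.argsort(scores)[:k]
    votes = Counter(train_labels[i] for i in nearest)
    return votes.most_common(1)[0][0]
```

Each training sequence would be one recorded expression transition (e.g., neutral-to-happy) represented as a trajectory of per-frame feature vectors.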
This paper is organized as follows. Section 2 describes the theoretical background of AAMs. Section 3 presents the DAFs and the differential facial expression probability density model (DFEPDM), which is our method for finding the neutral facial expression. Section 4 examines the manifold learning and the facial expression recognition using the k-NNS classifier with majority voting. Section 5 presents the experimental results, which show the improvement in facial expression recognition performance. Finally, Section 6 presents our conclusions.
Active appearance models (AAMs)
AAMs [20], [21] are generative, parametric models of a certain visual phenomenon that describe both shape and appearance variations. These variations are represented by a linear model such as PCA, which finds the subspace of maximum variance in the given data. A face model can be constructed from training data using AAMs, and face tracking is achieved by fitting the learned model to an input sequence.
The shape of a 2D AAM is represented by a triangulated 2D mesh; the vertex locations of the mesh define the shape.
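A minimal sketch of the kind of linear shape model this implies, assuming the training shapes have already been mutually aligned and flattened into (x, y) coordinate vectors (all names are illustrative):

```python
import numpy as np

def build_shape_model(shapes, num_modes=10):
    """PCA shape model: s ~= s0 + sum_i p_i * s_i.

    shapes: array of shape (num_examples, 2 * num_vertices), each row a
    flattened mesh (x1, y1, ..., xv, yv), assumed mutually aligned.
    Returns the mean shape s0 and the top `num_modes` shape basis
    vectors (principal components).
    """
    s0 = shapes.mean(axis=0)
    # PCA via SVD of the centered data matrix; rows of vt are the
    # orthonormal directions of decreasing variance.
    _, _, vt = np.linalg.svd(shapes - s0, full_matrices=False)
    return s0, vt[:num_modes]

def synthesize_shape(s0, basis, params):
    """Generate a shape instance from model parameters p."""
    return s0 + params @ basis
```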
Differential-AAMs
AAMs are efficient for face modeling because they can represent various face images using a compact set of linear models obtained by applying PCA to a set of collected example data. Therefore, AAM features of facial images contain all the variations included in the training samples. However, they cannot effectively represent variations that are not included in the training samples. To overcome this problem, we propose differential-AAMs, which are robust to inter-person variations.
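A minimal sketch of DAF extraction under the definition above (names are illustrative; selection of the neutral reference frame, handled by the DFEPDM of Section 3, is omitted and the neutral frame index is assumed given):

```python
import numpy as np

def extract_dafs(aam_params, neutral_index=0):
    """Differential-AAM features for one image sequence.

    aam_params: array of shape (num_frames, num_params), the fitted AAM
    shape/appearance parameters per frame. Subtracting the parameters
    of the neutral reference frame removes person-specific offsets,
    leaving features that describe only the expression change.
    """
    reference = aam_params[neutral_index]
    return aam_params - reference
```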
Manifold learning
Classical linear methods such as PCA and classical MDS are simple and efficient because they are linear. However, they are not suitable for representing dynamically changing facial expressions, which are inherently non-linear. To overcome this limitation, many non-linear dimensionality reduction techniques have been exploited to model manifold structures for facial expression analysis. Typically, the manifold embedding is trained for a specific person's facial expressions individually.
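For illustration only (this snippet does not specify which embedding the paper ultimately adopts), a unified non-linear embedding of DAFs pooled across all subjects could be computed with an off-the-shelf method such as Isomap:

```python
import numpy as np
from sklearn.manifold import Isomap

# dafs: (num_samples, num_params) DAF vectors pooled over all subjects;
# because DAFs are already person-independent, a single, unified
# expression manifold can be learned instead of one per subject.
dafs = np.load("dafs_all_subjects.npy")  # hypothetical file name

embedding = Isomap(n_neighbors=10, n_components=3)
low_dim = embedding.fit_transform(dafs)  # (num_samples, 3) manifold coords
```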
Experimental results and discussion
We performed several experiments to show the validity of the proposed facial expression recognition method. The proposed system was implemented in a Visual C++ environment on a PC platform with a 2.8 GHz Pentium-4 Duo CPU, 2 GB of RAM, and Windows XP Professional.
Conclusion
We proposed a new framework for real-time, person-independent facial expression recognition composed of three modules: differential-AAM feature (DAF) extraction, manifold embedding, and classification using k-nearest neighbor sequences (k-NNS).
The DAFs were defined as the difference between the AAM feature of the input face image and that of the reference image. Therefore, they are person-independent, because these differences are similar among people even though the appearance of a specific expression differs from person to person.
Acknowledgement
This work was supported by the Korea Science and Engineering Foundation (KOSEF) through the Biometrics Engineering Research Center (BERC) at Yonsei University (R112002105070030(2008)) and by the Intelligent Robotics Development Program, one of the 21st Century Frontier R&D Programs funded by the Ministry of Commerce, Industry and Energy (MOCIE).
References (31)

- B. Fasel et al., Automatic facial expression analysis: a survey, Pattern Recognition (2003)
- Y. Chang et al., Manifold based analysis of facial expression, Image Vision Comput. (2006)
- I. Cohen et al., Facial expression recognition from video sequences: temporal and static modeling, Comput. Vision Image Understanding (2003)
- N. Sebe et al., Authentic facial expression analysis, Image Vision Comput. (2007)
- J. Wang et al., Static topographic modeling for facial expression recognition and analysis, Comput. Vision Image Understanding (2007)
- J. Yang et al., Two-dimensional PCA: a new approach to appearance-based face representation and recognition, IEEE Trans. Pattern Anal. Mach. Intell. (2004)
- J.B. Tenenbaum et al., Separating style and content with bilinear models, Neural Comput. (2000)
- H. Wang, N. Ahuja, Facial expression decomposition, in: Proceedings of the Ninth IEEE International Conference on Computer Vision, 2003
- E. Chuang, H. Deshpande, C. Bregler, Facial expression space learning, in: Proceedings of the 10th Pacific Conference on Computer Graphics and Applications, 2002
- Y. Chang, C. Hu, M. Turk, Manifold of facial expression, in: Proceedings of the IEEE International Workshop on Analysis and Modeling of Faces and Gestures, 2003
About the Author—YEONGJAE CHEON received the B.S. degree in computer engineering from Hongik University, Korea, in 2006, and the M.S. degree in computer engineering from Pohang University of Science and Technology (POSTECH) in 2008. He now works for NHN Corporation as a software engineer.
His research interests include biometrics, face analysis, and facial expression recognition.
About the Author—DAIJIN KIM received the B.S. degree in electronic engineering from Yonsei University, Seoul, Korea, in 1981, and the M.S. degree in electrical engineering from the Korea Advanced Institute of Science and Technology (KAIST), Taejon, in 1984. In 1991, he received the Ph.D. degree in electrical and computer engineering from Syracuse University, Syracuse, NY.
During 1992–1999, he was an Associate Professor in the Department of Computer Engineering at DongA University, Pusan, Korea. He is currently a Professor in the Department of Computer Science and Engineering at POSTECH, Pohang, Korea.
His research interests include biometrics, human–computer interaction, and intelligent systems.