Manifold based analysis of facial expression
Introduction
Facial expression is one of the most powerful ways that people coordinate conversation and communicate emotions and other mental, social, and physiological cues. Computational facial expression analysis is an active and challenging research topic in computer vision, impacting important applications in areas such as human–computer interaction and data-driven animation.
Facial expressions can be classified in various ways: as non-prototypic expressions such as ‘raised brows,’ prototypic expressions such as emotional labels (e.g. ‘happy’), or facial actions such as the action units defined in the Facial Action Coding System (FACS) [1]. Some psychologists claim that there are six universally recognized facial expressions: happiness, sadness, fear, anger, disgust, and surprise [2]. Existing expression analyzers [3], [4], [5] usually classify the examined expression into one of these basic emotion categories. The six basic categories, however, are only a small subset of the expressions the human face can produce. For ‘blended’ expressions, it may be more reasonable to classify them quantitatively into multiple emotion categories. Regarding intensity, each person has his or her own maximal intensity for displaying a particular facial action, so it is useful to recognize the temporal intensity change of expressions in videos. Surveys [6], [7] give detailed reviews of existing methods for facial expression analysis and recognition.
A key challenge in automatic facial expression analysis is to identify a global representation for all possible facial expressions that affords semantic analysis. In this paper, we explore the space of expression images and propose the manifold of expressions as a foundation for expression analysis, using non-linear dimensionality reduction to embed facial deformations in a low-dimensional space. Non-linear dimensionality reduction has attracted attention for a long time in computer vision and visualization research [8], [9]. Images lie in a very high-dimensional space, but a class of images generated by latent variables lies on a manifold in this space. For human face images, the latent variables may be the illumination, identity, pose and facial deformations.
An N-dimensional representation of the face (where N could be the number of pixels in the image or the number of parameters in a face model, for example) can be considered a point in an N-dimensional face space, and the variability of facial expression can be represented as low-dimensional manifolds embedded in this space. People change facial expressions continuously over time. Thus all images of an individual's facial expressions represent a smooth manifold in the N-dimensional face space with the ‘neutral’ face as the central reference point. The intrinsic dimension of this manifold is much lower than N.
On the manifold of expressions, similar expressions are points in the local neighborhood on the manifold. Sequences of basic emotional expressions become paths on the manifold extended from the reference center, as illustrated in Fig. 1. The blends of expressions lie between those paths, so they can be defined analytically by the positions of the basic paths. The analysis of the relationships between different facial expressions is facilitated on the manifold.
It is a formidable task to learn the complete structure of the manifold of expressions in a high-dimensional image space. To overcome this problem, our core idea is to embed the non-linear manifold in a low-dimensional space and recognize facial expression from video sequences probabilistically. Fig. 2 illustrates the overall structure of the system.
Non-linear embedding methods such as ISOMAP [10], local linear embedding (LLE) [11], charting a manifold [12], and global coordinate of local linear models [13] are promising in handling high-dimensional non-linear data. Recently, researchers have applied manifold methods to face recognition [14], [15], [16] and facial expression representation [17], [18], [19].
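To illustrate what such non-linear embedding methods do (this is not the embedding used in this paper, and it assumes scikit-learn is available), the following sketch applies ISOMAP to a synthetic swiss roll, a standard stand-in for high-dimensional data with low intrinsic dimension:

```python
import numpy as np
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import Isomap

# Synthetic stand-in for high-dimensional face data: a 3-D swiss roll
# whose intrinsic dimension is 2, analogous to expression images whose
# intrinsic dimension is far below the pixel count.
X, _ = make_swiss_roll(n_samples=500, random_state=0)

# Isomap approximately preserves geodesic (along-manifold) distances
# while unrolling the manifold into a low-dimensional Euclidean space.
embedding = Isomap(n_neighbors=10, n_components=2)
Y = embedding.fit_transform(X)
print(Y.shape)  # (500, 2)
```

The same pattern applies to LLE (`sklearn.manifold.LocallyLinearEmbedding`), which preserves local linear reconstruction weights instead of geodesic distances.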
Rather than working in the image space (which is very sensitive to illumination changes), we describe the face as a set of points along facial feature contours, as shown in Fig. 3. We apply a modified Lipschitz embedding [20], [21] to map this contour representation from its high-dimensional space into a low-dimensional space while preserving the main structure of the manifold. Lipschitz embedding leads to good preservation of clusters in practical cases [22], [23].
After the embedding, the expression sequences in the gallery become paths emanating from the center, which is defined by the neutral expression. In an offline training stage, a Gaussian mixture model is applied to cluster data in the low-dimensional expression space. For each cluster, a specific active shape model (ASM) is learned to allow more robustness with respect to non-linear image variations. We learn the probabilistic model of transition between those paths from the gallery videos.
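As a sketch of this clustering step (not the paper's exact configuration, and assuming scikit-learn is available), a Gaussian mixture can be fitted to points in a low-dimensional space; the data here are synthetic:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Synthetic stand-in for embedded expression points: three well-separated
# clusters in a 2-D "expression space" (the actual embedding dimension is
# a modeling choice).
points = np.vstack([
    rng.normal(loc=(0, 0), scale=0.3, size=(100, 2)),
    rng.normal(loc=(3, 0), scale=0.3, size=(100, 2)),
    rng.normal(loc=(0, 3), scale=0.3, size=(100, 2)),
])

gmm = GaussianMixture(n_components=3, random_state=0).fit(points)
labels = gmm.predict(points)
print(np.unique(labels))  # three cluster indices
```

Each cluster of embedded points can then be given its own shape model, mirroring the paper's per-cluster ASM.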
Given a probe video sequence, based on our learned model, we track facial features using ICondensation [24], while recognizing facial expressions in the same probabilistic framework. The probe set includes videos of random expression changes, which may not begin or end with a neutral expression. The duration and the intensity of the expressions are varied. The transition between different expressions is represented as the evolution of the posterior probability of the basic paths. Our empirical study demonstrates that the probabilistic approach can recognize expression transitions effectively. We also synthesize image sequences of changing expressions through the manifold model.
Differing from traditional methods that consider expression tracking and recognition in separate stages, we address these tasks in a common probabilistic framework, which enables them to be solved in a cooperative manner.
The remainder of this paper is organized as follows. In Section 2 we discuss related work. We then discuss the properties of Lipschitz embedding in Section 3. Section 4 covers the learning of our proposed representation, while Section 5 describes the framework to track and recognize facial deformation. In Section 6, we show how to synthesize facial expressions using our model. Section 7 reports our experimental results, and Section 8 presents conclusions and future work.
Related work
In the past decade, many techniques have been proposed to automatically classify expressions in still images, using methods based on neural networks [25], [26], Gabor wavelets [5], [27], and rules [3], to mention just a few. In recent years, however, more attention has been given to modeling facial deformation in dynamic scenarios [28], [29], [30], which allows information to be integrated temporally across the video sequence, potentially increasing recognition rates over still-image approaches.
Lipschitz embedding
Lipschitz embedding [20], [21] is a powerful embedding method used widely in image clustering and image search. For a finite set of input data S, Lipschitz embedding is defined in terms of a set R of subsets of S, R = {A1, A2, …, Ak}. The subsets Ai are termed the reference sets of the embedding. Let d(o, A) be the extension of the distance function d to a subset A ⊂ S, such that d(o, A) = min_{x∈A} d(o, x). An embedding with respect to R is then defined as the mapping F(o) = (d(o, A1), d(o, A2), …, d(o, Ak)).
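The definition above translates directly into code. The following is a minimal sketch of a (plain, unmodified) Lipschitz embedding on a toy one-dimensional data set; the function and variable names are our own:

```python
import numpy as np

def lipschitz_embedding(S, reference_sets, d):
    """Map each object o in S to F(o) = (d(o, A1), ..., d(o, Ak)),
    where d(o, A) = min over x in A of d(o, x)."""
    def dist_to_set(o, A):
        return min(d(o, x) for x in A)
    return np.array([[dist_to_set(o, A) for A in reference_sets] for o in S])

# Toy example: points on a line, Euclidean distance, two reference sets.
S = [0.0, 1.0, 2.0, 5.0]
R = [{0.0, 1.0}, {5.0}]
d = lambda a, b: abs(a - b)
F = lipschitz_embedding(S, R, d)
# The point 2.0 maps to (min(|2-0|, |2-1|), |2-5|) = (1.0, 3.0).
print(F)
```

The embedded dimension equals the number of reference sets k, so choosing a small k yields the desired low-dimensional space.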
Learning dynamic facial deformation
We are interested in embedding the facial deformations of a person in a very low-dimensional space that reflects the intrinsic structure of facial expressions. From training video sequences of different people undergoing different expressions, a low-dimensional manifold is learned, with subsequent probabilistic modeling used for tracking and recognition. The goal of the probabilistic model is to exploit the temporal information in video sequences. Expression recognition is performed on the manifold.
Probabilistic tracking and recognition
In the previous section, we showed how to learn a facial expression model on the manifold as well as its associated dynamics. We now show how to use this representation to achieve robust online facial deformation tracking and recognition. Our probabilistic tracking is based on the ICondensation algorithm [24], which is described next, followed by expression classification. Both tracking and recognition are described in the same probabilistic framework, which enables them to be carried out in a cooperative manner.
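The Condensation family of algorithms on which ICondensation builds follows a resample–predict–measure cycle. The sketch below illustrates that cycle on a one-dimensional toy state (the paper tracks facial feature contours, and ICondensation additionally mixes in importance sampling, neither of which is shown here; all names and noise parameters are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)

def condensation_step(particles, weights, observation,
                      motion_std=0.1, obs_std=0.2):
    """One Condensation-style cycle: resample, drift+diffuse, reweight."""
    n = len(particles)
    # 1. Resample proportionally to the current weights.
    idx = rng.choice(n, size=n, p=weights)
    particles = particles[idx]
    # 2. Predict: apply the dynamics (identity here) plus diffusion noise.
    particles = particles + rng.normal(0.0, motion_std, size=n)
    # 3. Measure: reweight by a Gaussian observation likelihood.
    w = np.exp(-0.5 * ((particles - observation) / obs_std) ** 2)
    w /= w.sum()
    return particles, w

n = 200
particles = rng.normal(0.0, 1.0, size=n)
weights = np.full(n, 1.0 / n)
for obs in [0.2, 0.4, 0.6, 0.8, 1.0]:  # a slowly moving target
    particles, weights = condensation_step(particles, weights, obs)
# Weighted mean tracks the target, lagging somewhat behind the final
# observation because the motion model is static.
estimate = float(np.sum(particles * weights))
print(estimate)
```

In the full system, the posterior over the learned expression paths evolves through the same weight updates, which is what couples tracking and recognition.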
Synthesis of dynamic expressions
The manifold model can also be used to synthesize an image sequence with changing expressions. Given expression r, r = 1, …, 6, we keep the cluster indices l1, …, lk, where k is the number of clusters. For expression r, there are m gallery videos that begin from the neutral expression, pass through the apex, and end with the neutral expression. We set the first video sequence as a template. We then apply dynamic time warping [43] to the remaining m − 1 image
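Dynamic time warping aligns two sequences that trace the same trajectory at different speeds. A minimal scalar-valued sketch (the paper warps whole image sequences; the function name and test data here are our own):

```python
import numpy as np

def dtw_distance(a, b, d=lambda x, y: abs(x - y)):
    """Classic dynamic-programming DTW between two sequences."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = d(a[i - 1], b[j - 1])
            # Extend the cheapest of: match, insertion, deletion.
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

# Two renditions of the same neutral -> apex -> neutral intensity curve,
# played at different speeds: DTW aligns them despite the length mismatch.
template = [0, 1, 2, 3, 2, 1, 0]
probe    = [0, 0, 1, 1, 2, 3, 3, 2, 1, 0, 0]
print(dtw_distance(template, probe))  # 0.0: identical shape after warping
```

Warping the m − 1 remaining gallery sequences onto the template in this way puts corresponding frames (e.g. the apex) into alignment before synthesis.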
Experimental results
In this section, we present our experimental results on facial deformation tracking and recognition.
Conclusions
We proposed a novel framework for dynamic facial expression analysis. We now summarize our main contributions:
- (1) A new representation for tracking and recognition of facial expressions, based on manifold embedding and probabilistic modeling in the embedded space. Our experimental results show that manifold methods provide an analytical way to analyze the relationships between different expressions and to recognize blended expressions.
- (2) A robust method for facial deformation tracking based on a set
Acknowledgements
This work has been supported by NSF ITR grant #0205740 and under the auspices of the US Department of Energy by the Lawrence Livermore National Laboratory under contract No. W-7405-ENG-48. We thank the reviewers for their comments, which helped improve this paper.
References (48)
- et al., Global self organization of all known protein sequences reveals inherent biological signatures, Journal of Molecular Biology (1997)
- et al., Facial expression recognition from video sequences: temporal and static modeling, Computer Vision and Image Understanding (2003)
- et al., Facial Action Coding System: Manual (1978)
- Emotion in the Human Face (1982)
- et al., Recognizing facial expressions in image sequences using local parameterized models of image motion, International Journal of Computer Vision (1997)
- et al., Coding, analysis, interpretation, and recognition of facial expressions, IEEE Transactions on Pattern Analysis and Machine Intelligence (1997)
- et al., Comparison between geometry-based and Gabor-wavelets-based facial expression recognition using multi-layer perceptron, Proceedings of the International Conference on Automatic Face and Gesture Recognition (1998)
- et al., Automatic analysis of facial expressions: the state of the art, IEEE Transactions on Pattern Analysis and Machine Intelligence (2000)
- et al., Automatic facial expression analysis: a survey, Pattern Recognition (2003)
- Core affect and the psychological construction of emotion, Psychological Review (2003)