Static and dynamic 3D facial expression recognition: A comprehensive survey☆
Highlights
► We survey the recent advances in 3D and 4D facial expression recognition. ► We discuss developments in 3D facial data acquisition and tracking. ► We discuss the 3D/4D face databases suitable for facial expression analysis. ► We discuss the challenges that have to be addressed.
Introduction
Automatic human behaviour understanding has attracted a great deal of interest over the past two decades, mainly because of its many applications spanning various fields such as psychology, computer technology, medicine and security. It can be regarded as the essence of next-generation computing systems as it plays a crucial role in affective computing technologies (i.e. proactive and affective user interfaces), learner-adaptive tutoring systems, patient-profiled personal wellbeing technologies, etc. [1].
Facial expression is the most cogent, naturally preeminent means for humans to communicate emotions, to clarify and give emphasis, to signal comprehension or disagreement, to express intentions and, more generally, to regulate interactions with the environment and other people [2]. These facts highlight the importance of automatic facial behaviour analysis, including facial expression of emotion and facial action unit (AU) recognition, and justify the interest this research area has attracted in the past twenty years [3], [4].
Until recently, most of the available data sets of expressive faces were of limited size and contained only deliberately posed affective displays, mainly of the prototypical expressions of six basic emotions (i.e. anger, disgust, fear, happiness, sadness and surprise), recorded under highly controlled conditions. Recent efforts focus on the recognition of complex and spontaneous emotional phenomena (e.g. boredom or lack of attention, frustration, stress, etc.) rather than on the recognition of deliberately displayed prototypical expressions of emotions [5], [4], [6], [7]. However, most of these systems are still highly sensitive to recording conditions such as illumination, occlusions and other changes in facial appearance like makeup and facial hair. Furthermore, in most cases when 2D facial intensity images are used, it is necessary to maintain a consistent facial pose (preferably a frontal one) in order to achieve good recognition performance, as even small changes in the facial pose can reduce the system's accuracy. Moreover, single-view 2D analysis is unable to fully exploit the information displayed by the face, as 2D video recordings cannot capture out-of-plane changes of the facial surface or other changes that are difficult to see from a single viewpoint. Hence, multiple 2D views must be used simultaneously if the information in the face is to be fully captured. Alternatively, 3D data can be acquired and analysed to tackle this problem. In the case of AU recognition, the subtle changes occurring in the depth of the facial surface are captured in detail when 3D data are used, unlike with 2D data. For example, AU18 (Lip Pucker) is not easily distinguished from AU10 + AU17 + AU24 (Upper Lip Raiser, Chin Raiser and Lip Pressor) in a 2D frontal-view video, whereas in a 3D capture the action is easily identified, as can be seen in Fig. 1. Similarly, AU31 (Jaw Clencher) can be difficult to detect in a 2D view but is easily captured by the full 3D data, as can be seen in Fig. 2.
Recent advances in structured light scanning, stereo photogrammetry and photometric stereo have made the acquisition of 3D facial structure and motion a feasible task.
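Of the three acquisition routes just mentioned, photometric stereo is the one whose core computation fits in a few lines. The sketch below is a minimal per-pixel version of classic Lambertian photometric stereo (the formulation listed in the references as "Photometric method for determining surface orientation from multiple images"), not the pipeline of any specific system surveyed here. It assumes distant light sources with known unit directions, no shadowed or specular measurements, and a linear camera response; under those assumptions each intensity obeys I_k = albedo * (l_k . n), so the albedo-scaled normal g = albedo * n follows from a least-squares solve. The function names are illustrative, and the surface-integration step needed to turn a normal field into depth is omitted.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def _solve3(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve the 3x3 system a x = b by Gaussian elimination with partial pivoting."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("light directions are degenerate (e.g. coplanar)")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            f = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= f * m[col][c]
    x = [0.0, 0.0, 0.0]
    for r in range(2, -1, -1):  # back substitution
        x[r] = (m[r][3] - sum(m[r][c] * x[c] for c in range(r + 1, 3))) / m[r][r]
    return x


def photometric_stereo_pixel(
    lights: Sequence[Vec3], intensities: Sequence[float]
) -> Tuple[float, Vec3]:
    """Return (albedo, unit normal) for one pixel under I_k = albedo * (l_k . n).

    Assumes >= 3 known, non-coplanar unit light directions and no shadowed samples.
    """
    if len(lights) < 3 or len(lights) != len(intensities):
        raise ValueError("need at least three (light direction, intensity) pairs")
    # Least squares via the normal equations (L^T L) g = L^T I, with g = albedo * n.
    ltl = [[sum(l[i] * l[j] for l in lights) for j in range(3)] for i in range(3)]
    lti = [sum(l[i] * v for l, v in zip(lights, intensities)) for i in range(3)]
    g = _solve3(ltl, lti)
    albedo = math.sqrt(g[0] ** 2 + g[1] ** 2 + g[2] ** 2)
    if albedo == 0.0:
        raise ValueError("zero albedo: the normal is undefined")
    return albedo, (g[0] / albedo, g[1] / albedo, g[2] / albedo)
```

With exactly three lights the least-squares solve reduces to inverting the light matrix; extra lights simply over-determine the system, which is why the normal-equation form is used here.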
In this survey we focus on the use of 3D and 4D data capture for automatic facial expression recognition and analysis. We first study the recent technological solutions available for acquiring static and dynamic 3D faces, focusing in particular on the difficulties encountered when these techniques are applied to capture naturalistic (spontaneous) expressions. We then examine the challenges in 3D face alignment, tracking and point correspondence, and review existing methods. Furthermore, we survey the databases that have been created either for 3D and 4D facial expression analysis or for biometric applications but that contain a significant number of expressive examples. Next, we discuss the methods used for static and dynamic 3D facial expression recognition, focusing mainly on feature extraction, as this is what differentiates 3D methods from the corresponding 2D ones. Finally, we examine the challenges that still remain and discuss the future research needed to advance tracking and recognition methodologies beyond the state of the art.
The rest of the paper is organised as follows. Section 2 reviews 3D acquisition, tracking and alignment methods. Section 3 presents in detail the available databases suitable for 3D facial expression analysis. Section 4 surveys the recognition systems that have been developed, both for static and dynamic analysis of 3D facial expressions. Section 5 discusses a number of open issues in the field. Finally, Section 6 concludes the paper.
Acquisition of 3D and 4D faces, dense correspondences, alignment and tracking
In the past decade the fields of capturing, reconstruction, alignment and tracking of static and dynamic 3D faces have witnessed tremendous development. This section focuses on the state-of-the-art methods in this field from the perspective of the kind of facial behaviour (posed or spontaneous) that the surveyed technology is able to capture. For acquisition, the focus is more on the actual process (i.e. how many cameras are needed, where they need to be placed, what kind of patterns should be
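Rigid alignment of a scan to a reference face typically precedes dense correspondence and tracking. As an illustration only, the sketch below implements Horn's closed-form quaternion solution for the least-squares rotation and translation between two point sets whose correspondences are already known (for example, matched landmarks). It is not claimed to be the method of any surveyed system; iterative schemes such as ICP would wrap a step like this in a loop that re-estimates correspondences. To stay dependency-free it uses a small Jacobi eigen-solver, and the names `rigid_align` and `apply_rigid` are hypothetical. The solution is degenerate when the points are collinear.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Mat3 = List[List[float]]


def _jacobi_eigen(a: List[List[float]], sweeps: int = 50) -> Tuple[List[float], List[List[float]]]:
    """Eigen-decompose a small symmetric matrix with cyclic Jacobi rotations.

    Returns (eigenvalues, V); column k of V is the eigenvector for eigenvalue k.
    """
    n = len(a)
    a = [row[:] for row in a]
    v = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(sweeps):
        if sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j) < 1e-22:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A J (columns p, q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- J^T A (rows p, q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # V <- V J accumulates the eigenvectors
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    return [a[i][i] for i in range(n)], v


def rigid_align(src: Sequence[Vec3], dst: Sequence[Vec3]) -> Tuple[Mat3, Vec3]:
    """Least-squares R, t with dst_i ~ R src_i + t (Horn's quaternion method)."""
    if len(src) != len(dst) or len(src) < 3:
        raise ValueError("need >= 3 corresponding point pairs")
    n = len(src)
    cs = [sum(p[i] for p in src) / n for i in range(3)]
    cd = [sum(p[i] for p in dst) / n for i in range(3)]
    # Cross-covariance S[i][j] = sum_k (src_k - cs)_i * (dst_k - cd)_j
    s = [[sum((p[i] - cs[i]) * (q[j] - cd[j]) for p, q in zip(src, dst)) for j in range(3)]
         for i in range(3)]
    (sxx, sxy, sxz), (syx, syy, syz), (szx, szy, szz) = s
    nmat = [
        [sxx + syy + szz, syz - szy, szx - sxz, sxy - syx],
        [syz - szy, sxx - syy - szz, sxy + syx, szx + sxz],
        [szx - sxz, sxy + syx, -sxx + syy - szz, syz + szy],
        [sxy - syx, szx + sxz, syz + szy, -sxx - syy + szz],
    ]
    vals, vecs = _jacobi_eigen(nmat)
    k = max(range(4), key=lambda i: vals[i])  # largest eigenvalue gives the optimum
    w, x, y, z = (vecs[r][k] for r in range(4))  # unit quaternion of the rotation
    r = [
        [w * w + x * x - y * y - z * z, 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (y * x + w * z), w * w - x * x + y * y - z * z, 2 * (y * z - w * x)],
        [2 * (z * x - w * y), 2 * (z * y + w * x), w * w - x * x - y * y + z * z],
    ]
    t = tuple(cd[i] - sum(r[i][j] * cs[j] for j in range(3)) for i in range(3))
    return r, t


def apply_rigid(r: Mat3, t: Vec3, p: Vec3) -> Vec3:
    """Apply x -> R x + t to one point."""
    return tuple(sum(r[i][j] * p[j] for j in range(3)) + t[i] for i in range(3))
```

The quaternion route is used here because it needs only a symmetric 4x4 eigenproblem and always returns a proper rotation (no reflection fix-up is required, unlike a naive SVD solution).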
Databases
During the past two decades, a number of 3D face databases have been created for face modelling and recognition. In this section we review the existing 3D databases, including not only those created especially for expression recognition but also those that contain expressive faces despite having been recorded for other purposes (e.g. face recognition), as long as they contain enough available samples for training and testing 3D static and dynamic facial
Static and dynamic 3D facial expression recognition
A wide range of 3D facial expression recognition methodologies has been developed to analyse static faces and, more recently, dynamic facial image sequences. Methods for 3D facial expression recognition generally consist of two main stages: feature extraction, and selection and classification of features. Dynamic systems may also employ temporal modelling of the expression as a further step. Here, we focus more on the techniques employed for 3D feature extraction. This is
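The two-stage structure described above can be illustrated with a deliberately simple toy: stage 1 turns a pair of scans into a feature vector of per-landmark 3D displacements between a neutral and an expressive face, and stage 2 assigns a label with a nearest-centroid rule. The landmark layout, class names and both technique choices are hypothetical and far simpler than the systems surveyed here; the sketch also assumes the scans are already aligned and in landmark correspondence, and it omits the temporal modelling step used by dynamic systems. Keeping the dz components is what lets a 3D feature vector record out-of-plane movements such as a lip pucker.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float, float]


def displacement_features(neutral: Sequence[Point], expressive: Sequence[Point]) -> List[float]:
    """Stage 1 (feature extraction): concatenated (dx, dy, dz) of each landmark.

    Assumes both scans are already aligned and landmark i corresponds across scans.
    """
    if len(neutral) != len(expressive):
        raise ValueError("landmark sets must correspond one-to-one")
    feats: List[float] = []
    for (x0, y0, z0), (x1, y1, z1) in zip(neutral, expressive):
        feats.extend((x1 - x0, y1 - y0, z1 - z0))  # dz carries out-of-plane motion
    return feats


class NearestCentroid:
    """Stage 2 (classification): label of the closest class-mean feature vector."""

    def __init__(self) -> None:
        self.centroids: Dict[str, List[float]] = {}

    def fit(self, features: Sequence[Sequence[float]], labels: Sequence[str]) -> "NearestCentroid":
        sums: Dict[str, List[float]] = {}
        counts: Dict[str, int] = {}
        for f, label in zip(features, labels):
            acc = sums.setdefault(label, [0.0] * len(f))
            for i, value in enumerate(f):
                acc[i] += value
            counts[label] = counts.get(label, 0) + 1
        self.centroids = {lab: [v / counts[lab] for v in acc] for lab, acc in sums.items()}
        return self

    def predict(self, f: Sequence[float]) -> str:
        if not self.centroids:
            raise RuntimeError("call fit() before predict()")
        return min(self.centroids, key=lambda lab: math.dist(f, self.centroids[lab]))
```

Nearest centroid is used only because it fits in a few lines; any classifier can replace stage 2 without changing stage 1.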
Challenges and discussion
Research in 3D facial expression analysis is still in its infancy, and a large number of works are expected in the near future, as current technological advances make the acquisition of high-quality 3D data easy and affordable. However, several issues in this field remain unsolved.
There are many databases that can be used for static 3D facial expression analysis. However, the current trend shows a shift in researchers' interest towards the analysis of facial
Conclusions
Several approaches have been followed in the field of 3D facial expression analysis. The development of 3D data acquisition methods has allowed the creation of several databases containing 3D static faces and facial image sequences demonstrating expressions. The public availability of these databases has facilitated research in this area, particularly in static analysis. Many methods have been developed for the tracking and alignment of 3D facial meshes, a crucial step before feature
References (127)
- et al., Human-Centred Intelligent Human–Computer Interaction (HCI2): how far are we from attaining it?, Int. J. Auton. Adapt. Commun. Syst. (2008).
- et al., Thin slices of expressive behavior as predictors of interpersonal consequences: a meta-analysis, Psychol. Bull. (1992).
- et al., Guide to visual analysis of humans: looking at people, Ch. Facial Expression Analysis (2011).
- et al., Automatic, dimensional and continuous emotion recognition, Int. J. Synthet. Emot. (2010).
- et al., A survey of affect recognition methods: audio, visual, and spontaneous expressions, IEEE Trans. Pattern Anal. Mach. Intell. (2009).
- et al., Continuous prediction of spontaneous affect from multiple cues and modalities in valence-arousal space, IEEE Trans. Affective Comput. (2011).
- et al., Bridging the gap between social animal and unsocial machine: a survey of social signal processing, IEEE Trans. Affective Comput. (2012).
- et al., 3D face reconstruction from a single image using a single reference face shape, IEEE Trans. Pattern Anal. Mach. Intell. (2011).
- et al., Reconstructing 3D face model with associated expression deformation from a single face image via constructing a low-dimensional expression deformation manifold, IEEE Trans. Pattern Anal. Mach. Intell. (2011).
- et al., A morphable model for the synthesis of 3D faces.
- Face recognition based on fitting a 3D morphable model, IEEE Trans. Pattern Anal. Mach. Intell.
- Efficient, robust and accurate fitting of a 3D morphable model.
- Automatic 3D face reconstruction from single images or video.
- 3D morphable face models revisited.
- 3D morphable model fitting from multiple views.
- Markerless reconstruction of dynamic facial expressions.
- Morphable 3D models from video.
- 3D morphable models.
- Range sensing for computer vision.
- 3D facial surface acquisition by structured light.
- Automatic 3D facial expression analysis in videos, Anal. Model. Faces Gestures.
- A camera–projector system for real-time 3D video.
- Real-time acquisition of depth and color images using structured light and its application to 3D face recognition, Real-Time Imaging.
- High-resolution, real-time 3D shape acquisition.
- High-resolution, real-time 3D absolute coordinate measurement based on a phase-shifting method, Opt. Express.
- High resolution acquisition, learning and transfer of dynamic 3-D facial expressions.
- High-speed 3-D shape measurement based on digital fringe projection, Opt. Eng.
- Rainbow three-dimensional camera: new concept of high-speed three-dimensional vision systems, Opt. Eng.
- Surface profile measurement using color fringe projection, Mach. Vis. Appl.
- Color-encoded digital fringe projection technique for high-speed three-dimensional surface contouring, Opt. Eng.
- The office of the future: a unified approach to image-based modeling and spatially immersive displays.
- Real-time 3D model acquisition.
- Stripe boundary codes for real-time structured-light range scanning of moving objects.
- Fast 3D scanning with automatic motion compensation.
- Minolta Vivid 910.
- Inspeck Mega Capturor II Digitizer.
- Kinect.
- Photometric method for determining surface orientation from multiple images, Opt. Eng.
- Video normals from colored lights, IEEE Trans. Pattern Anal. Mach. Intell.
- A method for enforcing integrability in shape from shading algorithms, IEEE Trans. Pattern Anal. Mach. Intell.
- What is the range of surface reconstructions from a gradient field?
- An algebraic approach to surface reconstruction from gradient fields.
- From few to many: illumination cone models for face recognition under variable lighting and pose, IEEE Trans. Pattern Anal. Mach. Intell.
- Direct analytical methods for solving Poisson equations in computer vision problems, IEEE Trans. Pattern Anal. Mach. Intell.
- The photoface database.
- 3D face reconstructions from photometric stereo using near infrared and visible light, Comput. Vision Image Underst.
- A comparison and evaluation of multi-view stereo reconstruction algorithms.
- High-quality single-shot capture of facial geometry, ACM Trans. on Graphics (Proc. SIGGRAPH).
- DI4D: 4D Capture Systems.
- 3DMD 4D Capture.
☆ This paper has been recommended for acceptance by Jan-Michael Frahm, Dr.-Ing.