Static and dynamic 3D facial expression recognition: A comprehensive survey

https://doi.org/10.1016/j.imavis.2012.06.005

Abstract

Automatic facial expression recognition constitutes an active research field due to the latest advances in computing technology that make the user's experience a clear priority. The majority of work conducted in this area involves 2D imagery, despite the problems this presents due to inherent pose and illumination variations. In order to deal with these problems, 3D and 4D (dynamic 3D) recordings are increasingly used in expression analysis research. In this paper we survey the recent advances in 3D and 4D facial expression recognition. We discuss developments in 3D facial data acquisition and tracking, and present in detail the currently available 3D/4D face databases suitable for 3D/4D facial expression analysis, as well as the existing facial expression recognition systems that exploit either 3D or 4D data. Finally, we extensively discuss the challenges that have to be addressed if 3D facial expression recognition systems are to become part of future applications.

Highlights

► We survey the recent advances in 3D and 4D facial expression recognition.
► We discuss developments in 3D facial data acquisition and tracking.
► We discuss the 3D/4D face databases suitable for facial expression analysis.
► We discuss the challenges that have to be addressed.

Introduction

Automatic human behaviour understanding has attracted a great deal of interest over the past two decades, mainly because of its many applications spanning various fields such as psychology, computer technology, medicine and security. It can be regarded as the essence of next-generation computing systems as it plays a crucial role in affective computing technologies (i.e. proactive and affective user interfaces), learner-adaptive tutoring systems, patient-profiled personal wellbeing technologies, etc. [1].

Facial expression is the most cogent, naturally preeminent means for humans to communicate emotions, to clarify and give emphasis, to signal comprehension or disagreement, to express intentions and, more generally, to regulate interactions with the environment and other people [2]. These facts highlight the importance of automatic facial behaviour analysis, including facial expression of emotion and facial action unit (AU) recognition, and justify the interest this research area has attracted in the past twenty years [3], [4].

Until recently, most of the available data sets of expressive faces were of limited size, containing only deliberately posed affective displays, mainly of the prototypical expressions of six basic emotions (i.e. anger, disgust, fear, happiness, sadness and surprise), recorded under highly controlled conditions. Recent efforts focus on the recognition of complex and spontaneous emotional phenomena (e.g. boredom or lack of attention, frustration, stress) rather than on the recognition of deliberately displayed prototypical expressions of emotions [5], [4], [6], [7]. However, most of these systems are still highly sensitive to recording conditions such as illumination and occlusions, and to other changes in facial appearance like makeup and facial hair. Furthermore, in most cases when 2D facial intensity images are used, it is necessary to maintain a consistent facial pose (preferably a frontal one) in order to achieve good recognition performance, as even small changes in the facial pose can reduce the system's accuracy. Moreover, single-view 2D analysis is unable to fully exploit the information displayed by the face, as 2D video recordings cannot capture out-of-plane changes of the facial surface or other changes that are difficult to see. Hence, many 2D views must be utilised simultaneously if the information in the face is to be fully captured. Alternatively, in order to tackle this problem, 3D data can be acquired and analysed. In the case of AU recognition, the subtle changes occurring in the depth of the facial surface are captured in detail when 3D data are used, which is not the case with 2D data. For example, AU18 (Lip Pucker) is not easily distinguished from AU10 + AU17 + AU24 (Upper Lip and Chin Raising and Lip Presser) in a 2D frontal-view video, whereas in a 3D capture the action is easily identified, as can be seen in Fig. 1. Similarly, AU31 (Jaw Clencher) can be difficult to detect in a 2D view, but is easily captured by full 3D data, as can be seen in Fig. 2. Recent advances in structured light scanning, stereo photogrammetry and photometric stereo have made the acquisition of 3D facial structure and motion a feasible task.
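
To make the last point more concrete, the sketch below illustrates the principle behind one of these acquisition techniques, photometric stereo, in its classical Lambertian formulation (Woodham; see the references). It assumes m images of a static face captured under known, distant light directions and is an illustrative example rather than the method of any particular surveyed system.

```python
import numpy as np

def photometric_stereo(images, light_dirs):
    """Minimal Lambertian photometric stereo (Woodham-style formulation).

    images:     (m, H, W) grayscale intensities of a static face captured
                under m different, known, distant light sources
    light_dirs: (m, 3) unit light direction vectors
    Returns per-pixel unit surface normals (3, H, W) and albedo (H, W).
    """
    m, H, W = images.shape
    I = images.reshape(m, -1)                           # (m, H*W)
    # For each pixel, solve light_dirs @ b = I in the least-squares sense,
    # where b = albedo * normal (valid under the Lambertian assumption).
    b, *_ = np.linalg.lstsq(light_dirs, I, rcond=None)  # (3, H*W)
    albedo = np.linalg.norm(b, axis=0)
    normals = b / np.maximum(albedo, 1e-8)              # normalise to unit length
    return normals.reshape(3, H, W), albedo.reshape(H, W)
```

The recovered normal field is typically integrated into a depth map in a subsequent step, for instance by enforcing integrability as in Frankot and Chellappa (see the references).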

In this survey we focus on the use of 3D and 4D data capture for automatic facial expression recognition and analysis. We first study the recent technological solutions that are available for acquiring static and dynamic 3D faces. We particularly focus on the difficulties encountered when applying these techniques in order to capture naturalistic (spontaneous) expressions. We then examine the challenges existing in 3D face alignment, tracking and finding point correspondences, and review existing methods. Furthermore, we survey the databases that have been created either for 3D and 4D facial expression analysis or for biometric applications but which contain a significant number of expressive examples. Next, we discuss the methods used for static and dynamic 3D facial expression recognition. Here, we mainly focus on feature extraction, as this is what differentiates 3D methods from the corresponding 2D ones. Finally, we examine the challenges that still remain and discuss the future research needed in tracking and recognition methodologies beyond the state of the art.

The rest of the paper is organised as follows. Section 2 reviews 3D acquisition, tracking and alignment methods. Section 3 presents in detail the available databases suitable for 3D facial expression analysis. Section 4 surveys the recognition systems that have been developed, both for static and dynamic analysis of 3D facial expressions. Section 5 discusses a number of open issues in the field. Finally, Section 6 concludes the paper.

Section snippets

Acquisition of 3D and 4D faces, dense correspondences, alignment and tracking

In the past decade the fields of capturing, reconstruction, alignment and tracking of static and dynamic 3D faces have witnessed tremendous development. This section focuses on the state-of-the-art methods in this field from the perspective of the kind of facial behaviour (posed or spontaneous) that the surveyed technology is able to capture. For acquisition, the focus is more on the actual process (i.e. how many cameras are needed, where they need to be placed, what kind of patterns should be
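
A geometric step shared by the structured light and stereo photogrammetry systems discussed in this section is triangulation: once correspondences between two views (or between a projected pattern and a camera image) have been decoded, depth follows from disparity, focal length and baseline. The function below is a minimal sketch of this step for a rectified pair; pattern design and correspondence decoding, which this section covers, are assumed to have been carried out already.

```python
import numpy as np

def depth_from_disparity(disparity, focal_length_px, baseline_m):
    """Triangulate depth for a rectified stereo (or camera-projector) pair.

    disparity:        (H, W) pixel disparities from decoded correspondences
    focal_length_px:  focal length in pixels
    baseline_m:       distance between the two optical centres in metres
    Returns an (H, W) depth map in metres (infinite where disparity is zero).
    """
    d = np.asarray(disparity, dtype=float)
    depth = np.full(d.shape, np.inf)
    valid = d > 0
    depth[valid] = focal_length_px * baseline_m / d[valid]  # Z = f * B / d
    return depth
```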

Databases

During the past two decades a number of 3D face databases have been created in order to be used for face modelling and recognition. In this section we review the existing 3D databases, including not only those that have been especially created for expression recognition, but also those that contain expressive faces despite having been recorded for other purposes (e.g. face recognition), as long as they contain enough available samples for training and testing 3D static and dynamic facial

Static and dynamic 3D facial expression recognition

A wide range of 3D facial expression recognition methodologies have been developed in order to perform analysis on static faces and, more recently, on dynamic facial image sequences. Methods for 3D facial expression recognition generally consist of two main stages: feature extraction, followed by feature selection and classification. Dynamic systems may also employ temporal modelling of the expression as a further step. Here, we focus more on the techniques employed for 3D feature extraction. This is
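
As a minimal illustration of the generic two-stage pipeline described above, the sketch below extracts a simple geometric descriptor (pairwise Euclidean distances between 3D facial landmarks, an illustrative choice rather than a feature set drawn from any specific surveyed method) and trains a standard SVM classifier; the landmark sets and expression labels are assumed to come from one of the databases surveyed in Section 3.

```python
import numpy as np
from itertools import combinations
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def landmark_distance_features(landmarks):
    """landmarks: (N, 3) array of 3D facial landmark coordinates.
    Returns all pairwise Euclidean distances as a simple geometric
    descriptor (an illustrative choice, not a surveyed method)."""
    return np.array([np.linalg.norm(landmarks[i] - landmarks[j])
                     for i, j in combinations(range(len(landmarks)), 2)])

def train_expression_classifier(landmark_sets, labels):
    """landmark_sets: list of (N, 3) arrays, one per 3D face scan, with the
    same N landmarks in each; labels: expression labels (e.g. the six basic
    emotions). Both are assumed to come from a 3D expression database."""
    features = np.stack([landmark_distance_features(l) for l in landmark_sets])
    classifier = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
    classifier.fit(features, labels)
    return classifier
```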

Challenges and discussion

Research in 3D facial expression analysis is still in its infancy, with a large number of works expected in the near future as current technological advances allow the easy and affordable acquisition of high-quality 3D data. However, several issues remain unsolved in this field.

There are many databases that can be used for static 3D facial expression analysis. However, the current trend shows a shift in researchers' interest towards the analysis of facial

Conclusions

Several approaches have been followed in the field of 3D facial expression analysis. The development of 3D data acquisition methods has allowed the creation of several databases containing 3D static faces and facial image sequences demonstrating expressions. The public availability of these databases has facilitated research in this area, particularly in static analysis. Many methods have been developed for the tracking and alignment of 3D facial meshes, a crucial step before feature

References (127)

  • M. Pantic et al.

    Human-Centred Intelligent Human–Computer Interaction (HCI2): how far are we from attaining it?

    Int. J. Auton. Adapt. Commun. Syst.

    (2008)
  • N. Ambady et al.

    Thin slices of expressive behavior as predictors of interpersonal consequences: a meta-analysis

    Psychol. Bull.

    (1992)
  • F. De la Torre et al.

    Guide to visual analysis of humans: looking at people

    Ch. Facial Expression Analysis

    (2011)
  • H. Gunes et al.

    Automatic, dimensional and continuous emotion recognition

    Int. J. Synthet. Emot.

    (2010)
  • Z. Zeng et al.

    A survey of affect recognition methods: audio, visual, and spontaneous expressions

    IEEE Trans. Pattern Anal. Mach. Intell.

    (2009)
  • M. Nicolaou et al.

    Continuous prediction of spontaneous affect from multiple cues and modalities in valence-arousal space

    IEEE Trans. Affective Comput.

    (2011)
  • A. Vinciarelli et al.

    Bridging the gap between social animal and unsocial machine: a survey of social signal processing

    IEEE Trans. Affective Comput.

    (2012)
  • I. Kemelmacher-Shlizerman et al.

    3D face reconstruction from a single image using a single reference face shape

    IEEE Trans. Pattern Anal. Mach. Intell.

    (2011)
  • S. Wang et al.

    Reconstructing 3d face model with associated expression deformation from a single face image via constructing a low-dimensional expression deformation manifold

    IEEE Trans. Pattern Anal. Mach. Intell.

    (2011)
  • V. Blanz et al.

    A morphable model for the synthesis of 3D faces

  • V. Blanz et al.

    Face recognition based on fitting a 3D morphable model

    IEEE Trans. Pattern Anal. Mach. Intell.

    (2003)
  • S. Romdhani et al.

    Efficient, robust and accurate fitting of a 3D morphable model

  • P. Breuer et al.

    Automatic 3D face reconstruction from single images or video

  • A. Patel et al.

    3D morphable face models revisited

  • N. Faggian et al.

    3D morphable model fitting from multiple views

  • D. Sibbing et al.

    Markerless reconstruction of dynamic facial expressions

  • W. Brand

    Morphable 3D models from video

  • 3D morphable models

  • R. Jarvis

    Range sensing for computer vision

    (1993)
  • C. Beumier et al.

    3D facial surface acquisition by structured light

  • Y. Chang et al.

    Automatic 3D facial expression analysis in videos

    Anal. Model. Faces Gestures

    (2005)
  • M. Vieira et al.

    A camera–projector system for real-time 3D video

  • F. Tsalakanidou et al.

    Real-time acquisition of depth and color images using structured light and its application to 3D face recognition

    Real-Time Imaging

    (2005)
  • S. Zhang et al.

    High-resolution, real-time 3D shape acquisition

  • S. Zhang et al.

    High-resolution, real-time 3D absolute coordinate measurement based on a phase-shifting method

    Opt. Express

    (2006)
  • Y. Wang et al.

    High resolution acquisition, learning and transfer of dynamic 3-D facial expressions

  • P. Huang et al.

    High-speed 3-D shape measurement based on digital fringe projection

    Opt. Eng.

    (2003)
  • Z. Geng

    Rainbow three-dimensional camera: new concept of high-speed three-dimensional vision systems

    Opt. Eng.

    (1996)
  • C. Wust et al.

    Surface profile measurement using color fringe projection

    Mach. Vis. Appl.

    (1991)
  • P. Huang et al.

    Color-encoded digital fringe projection technique for high-speed three-dimensional surface contouring

    Opt. Eng.

    (1999)
  • R. Raskar et al.

    The office of the future: a unified approach to image-based modeling and spatially immersive displays

  • S. Rusinkiewicz et al.

    Real-time 3D model acquisition

  • O. Hall-Holt et al.

    Stripe boundary codes for real-time structured-light range scanning of moving objects

  • T. Weise et al.

    Fast 3D scanning with automatic motion compensation

  • Minolta Vivid 910

  • Inspeck Mega Capturor II Digitizer

  • Kinect

  • R. Woodham

    Photometric method for determining surface orientation from multiple images

    Opt. Eng.

    (1980)
  • G. Brostow et al.

    Video normals from colored lights

    IEEE Trans. Pattern Anal. Mach. Intell.

    (2011)
  • R. Frankot et al.

    A method for enforcing integrability in shape from shading algorithms

    IEEE Trans. Pattern Anal. Mach. Intell.

    (1988)
  • A. Agrawal et al.

    What is the range of surface reconstructions from a gradient field?

  • A. Agrawal et al.

    An algebraic approach to surface reconstruction from gradient fields

  • A. Georghiades et al.

    From few to many: illumination cone models for face recognition under variable lighting and pose

    IEEE Trans. Pattern Anal. Mach. Intell.

    (2001)
  • T. Simchony et al.

    Direct analytical methods for solving Poisson equations in computer vision problems

    IEEE Trans. Pattern Anal. Mach. Intell.

    (1990)
  • S. Zafeiriou et al.

    The photoface database

  • M. Hansen et al.

    3D face reconstructions from photometric stereo using near infrared and visible light

    Comput. Vision Image Underst.

    (2010)
  • S. Seitz et al.

    A comparison and evaluation of multi-view stereo reconstruction algorithms

  • T. Beeler et al.

    High-quality single-shot capture of facial geometry

    ACM Trans. on Graphics (Proc. SIGGRAPH)

    (2010)
  • DI4D — 4D Capture Systems

  • 3DMD 4D Capture


This paper has been recommended for acceptance by Jan-Michael Frahm, Dr.-Ing.
