Facial feature extraction by a cascade of model-based algorithms

https://doi.org/10.1016/j.image.2008.01.002

Abstract

In this paper, we propose a cascaded facial feature-extraction framework employing a set of model-based algorithms. In this framework, the algorithms are arranged in order of increasing model flexibility and extraction accuracy, such that the cascaded algorithm achieves optimal performance in both robustness and extraction accuracy. In particular, we propose a set of guidelines to analyze and jointly optimize the performance relations between the constituting algorithms, such that the constructed cascade gives the best overall performance. We then present an implementation of the cascaded framework employing three algorithms, namely, sparse-graph search, component-based texture fitting and component-based direct fitting. Special attention is paid to the search and optimization of the model parameters of each algorithm, such that the overall extraction performance is greatly improved with respect to both reliability and accuracy.

Introduction

Recent advances in image processing and computer vision enable automatic identification of human faces in visual scenes. One promising application area for face recognition is consumer home applications, where face-recognition systems facilitate intelligent and convenient services for daily life.

Within the framework of the ITEA project HomeNet2Run [10], we have developed a face-identification system [22], which actively identifies home users and notifies other home devices (e.g. a home server or television) to adapt their services accordingly. Facial feature extraction is an indispensable step in this processing chain: it forms the basis both for aligning faces to the database format and for the subsequent recognition stage. The robustness and accuracy of the facial feature extraction directly influence the success of the final identification.

However, an automatic facial feature-extraction system has to address a number of challenges.

  • The appearance of facial features and their geometric relationships are considerably influenced by individual variation and are further affected by facial expressions.

  • Especially for non-professional applications, the image-capturing environments are more variable than in professional settings. In particular, varying camera quality and illumination conditions pose difficulties for the feature-extraction procedure.

A good feature-extraction algorithm should provide accurate feature locations while being robust against large feature variances. Furthermore, for embedded/consumer applications, computation cost is also an important factor that should be taken into account during the algorithm design.

In the following, we present a brief literature review of current feature-extraction techniques and point out their limitations.

In the past decade, a number of techniques for automatic facial feature extraction have been proposed in the literature. Earlier techniques find the positions of facial features based on empirical knowledge about facial feature appearances (e.g. [12], [18]), such as eye symmetry and the varying color distributions over the features. A more recent paper [8] makes use of edge properties around facial landmarks and reports quite good results. Other techniques treat feature-component (e.g. eye and nose) localization as a two-class pattern-classification problem, where the classes are feature and non-feature. Typical examples include eigenfeature-based techniques [19] and AdaBoost-based techniques [5]. These techniques are relatively robust to large feature variances; however, they usually provide only coarse localization results.

Recently, model-based algorithms have received considerable attention in the literature. A face model (e.g. a graph model or patch model) is devised to represent the salient facial features, and its parameters are optimized to fit a new face. These techniques usually offer both an accurate description of the features and the flexibility to adapt to individual feature variances. Some earlier examples of model-based techniques are active contour models [1] and deformable geometric templates [20]. These techniques incorporate object-specific a priori knowledge about facial features into pre-defined energy functions. Although they may lead to quite accurate feature-extraction results, the fitting procedure may converge to incorrect local minima due to improper model initialization and large feature variances, resulting in limited convergence performance.

The well-known active shape model (ASM) [3] and active appearance model (AAM) [4] implicitly define facial feature properties by using statistics, deriving feature models automatically from a set of training samples. This statistics-based approach to feature extraction offers more flexibility and can easily be applied to other object structures. Many derivatives have since been proposed. The direct AAM [11] suggests that the shape can be predicted directly from the texture when the two are sufficiently correlated. In [17], a nonlinear extension is proposed to enhance the performance of the conventional ASM.

From the above, we can see that it is generally difficult for a single technique to achieve high performance in both robustness and accuracy. Classification-based algorithms are not influenced by initialization and are therefore more robust, whereas statistical point models can give fairly accurate results but are more sensitive to model initialization. Consequently, many recent proposals exploit different properties of facial features in order to benefit from several algorithms at once. For example, it is well known that ASM/AAM performs poorly if the model is not initialized properly. In [13], a rough estimate of the major feature components (e.g. eyes and mouth) based on color information provides a better model initialization for the ASM. In [6], the authors first apply a set of independent feature detectors for 17 feature points using the AdaBoost classification technique; a conventional AAM is then applied to refine the results.

In this paper, we focus on using multiple algorithms to improve facial feature-extraction performance with respect to both robustness and accuracy. Compared to previous approaches, our work differs in the following respects.

  • (1)

    We have clearly defined a number of performance metrics (e.g. the capture range, accuracy threshold and localization error) to characterize a model-based algorithm. These concepts have not been precisely defined previously in the literature. The performance metrics are used extensively later on in the implementation of a cascaded algorithm to guide the optimization of model parameters.

  • (2)

    In previous approaches, several techniques are combined heuristically, without quantitatively addressing their performance relations. In our approach, we quantitatively analyze the coupling between multiple algorithms and propose ways to optimize their coordination.

  • (3)

    We have proposed an implementation example using three cascaded algorithms. The proposed principles are applied to guide the optimization of the algorithm parameters such that the overall performance is enhanced.

The remainder of the paper is organized as follows. In Section 2, we first define several performance metrics to characterize model-based algorithms. Section 3 presents the cascaded feature-extraction framework, where two important principles are proposed to provide guidance for the subsequent algorithm design. Following that, Section 4 gives a brief overview of an implementation of the extraction cascade consisting of three component algorithms, namely, sparse-graph search (SGS), component-based texture fitting (CTF) and component-based direct fitting (CDF). Sections 5–7 present the three algorithms in detail. Section 8 analyzes the overall performance of the system and Section 9 concludes the paper.

Section snippets

Performance metrics for model-based algorithms

In this section, we introduce several performance metrics for model-based algorithms for facial feature extraction. These metrics and related concepts are used extensively in the sequel of this paper to characterize different algorithms.
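As an illustration of how such metrics could be measured in practice, the sketch below computes a normalized localization error and estimates a capture range by probing an algorithm with increasingly displaced initializations. The normalization by inter-eye distance, the function names and the probing scheme are all illustrative assumptions, not the paper's exact definitions.

```python
import numpy as np

def localization_error(predicted, ground_truth, inter_eye_dist):
    """Mean point-to-point distance between predicted and ground-truth
    landmarks, normalized by the inter-eye distance (a common choice;
    the paper's precise definition may differ)."""
    d = np.linalg.norm(predicted - ground_truth, axis=1)
    return d.mean() / inter_eye_dist

def capture_range(algorithm, image, ground_truth, offsets,
                  accuracy_threshold, inter_eye_dist):
    """Largest initial displacement from which the algorithm still
    converges to within the accuracy threshold.
    algorithm(image, init) -> refined landmark positions."""
    converged = []
    for off in offsets:  # candidate initial displacements
        result = algorithm(image, ground_truth + off)
        err = localization_error(result, ground_truth, inter_eye_dist)
        if err <= accuracy_threshold:
            converged.append(np.linalg.norm(off))
    return max(converged) if converged else 0.0
```

In this reading, an algorithm is characterized by the pair (capture range, localization error): how far off the initialization may be, and how close to ground truth the result lands.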

Design principles

In this section, we propose some basic design principles to guide the construction of a feature-extraction cascade, based on the performance measurements of individual algorithms. In this framework, each constituting algorithm in the cascade receives inputs from the preceding algorithm and refines the extraction.

In Fig. 2, suppose that algorithm A1 has a large capture range but low extraction accuracy, while algorithm A2 has a small capture range but high extraction accuracy; then the cascaded algorithm A
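The hand-off between stages can be sketched as a simple loop plus a compatibility check: each stage refines the previous estimate, and a hand-off is only safe if the preceding stage's typical localization error lies inside the next stage's capture range. This is an illustrative reading of the coupling principle; the names below are not from the paper.

```python
def cascade(algorithms, image, init):
    """Run the algorithms in order; each refines the previous estimate."""
    est = init
    for algo in algorithms:
        est = algo(image, est)
    return est

def is_compatible(err_prev, capture_next):
    """A stage pair is well coordinated when the previous stage's
    expected localization error falls within the next stage's
    capture range (illustrative formulation)."""
    return err_prev <= capture_next
```

Under this view, tuning the cascade means shaping each stage so that `is_compatible` holds at every hand-off while the final stage delivers the target accuracy.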

Overview of cascaded extraction algorithm

We propose three novel algorithms that constitute an extraction cascade, as depicted in Fig. 3. The three constituting algorithms are briefly summarized as follows:

  • (1)

    Sparse-graph search (SGS): SGS is the first algorithm in the cascaded framework and aims at finding the center locations of six facial features, namely, the eyes (left and right), eyebrows (left and right), nose and mouth. Here, extraction accuracy is traded off against a large capture range and reliability.

  • (2)

    Component-based texture

Sparse-graph search (SGS)

The aim of the SGS is to estimate facial feature locations at a coarse granularity with the minimum cost. More specifically, we define six facial features corresponding to six prominent feature regions, i.e. eyes (left and right), eyebrows (left and right), nose and mouth. Given a coarsely estimated face region, SGS aims at finding the rough locations of these features, which can be used as inputs for the subsequent algorithms working at finer granularities.
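One way to picture a sparse-graph search over six features is an exhaustive search over small per-feature candidate sets, scoring each configuration by per-node appearance scores plus a penalty for deviating from expected inter-feature geometry. This is only a conceptual sketch; the paper's actual graph model, scoring and search strategy differ in detail.

```python
import itertools
import numpy as np

def sgs_search(candidates, node_score, expected_edges, sigma=10.0):
    """Pick, for each feature, the candidate point such that the whole
    configuration best matches the appearance scores and the expected
    pairwise offsets (illustrative sparse-graph search).
    candidates: dict feature -> list of (x, y) candidate points
    node_score: dict (feature, point) -> appearance score
    expected_edges: dict (f1, f2) -> expected offset vector from f1 to f2
    """
    feats = list(candidates)
    best, best_score = None, -np.inf
    for combo in itertools.product(*(candidates[f] for f in feats)):
        pos = dict(zip(feats, combo))
        s = sum(node_score[(f, pos[f])] for f in feats)
        for (f1, f2), v in expected_edges.items():
            d = np.array(pos[f2]) - np.array(pos[f1]) - np.array(v)
            s -= np.dot(d, d) / (2 * sigma ** 2)  # geometric penalty
        if s > best_score:
            best, best_score = pos, s
    return best
```

Keeping the graph sparse (few features, few candidates per feature) is what makes such a search cheap enough to serve as a coarse, large-capture-range first stage.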

In order to achieve a large capture

Component-based texture fitting (CTF)

In this section, we propose a more flexible feature model (CTF), which is used as the second step in the cascaded extraction (refer to Fig. 3). Each feature component defined in the previous section, e.g. an eye or mouth, is now represented by a shape parameterized by location, scale and rotation. The key objective of CTF is to find the optimal shape parameter based on the current texture enclosed by the shape. This is accomplished by using direct parameter prediction based on a set of training
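Direct parameter prediction of this kind is often realized as a regressor, learned offline from synthetically perturbed training shapes, that maps a sampled texture vector to a shape-parameter correction. The least-squares formulation below is a minimal sketch under that assumption; the paper's actual sampling scheme and prediction model are more elaborate.

```python
import numpy as np

def train_predictor(textures, param_offsets):
    """Learn a linear map from texture to shape-parameter correction.
    textures:      (n, t) texture vectors sampled at perturbed shapes
    param_offsets: (n, p) shape-parameter perturbations that produced them
    Returns a matrix R such that texture -> predicted correction."""
    X = np.hstack([textures, np.ones((textures.shape[0], 1))])  # add bias
    R, *_ = np.linalg.lstsq(X, -param_offsets, rcond=None)      # undo offset
    return R

def predict_update(R, texture):
    """Predict the shape-parameter correction for one texture sample."""
    x = np.append(texture, 1.0)
    return x @ R
```

At run time, the predicted correction is applied to the current shape parameters and the texture is re-sampled, typically for a few iterations until the update becomes negligible.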

Component-based direct fitting (CDF)

In this section, we present the third-stage algorithm in the cascaded extraction. More specifically, we refine the feature-extraction results for each facial feature by using a more flexible appearance model, with the aim to achieve better adaptation for individual feature instances. In this appearance model, in addition to the shape parameter vector p as used in CTF, we employ an additional appearance parameter vector a to model the feature shape and texture deformations.

Differing from CTF in
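A direct-fitting stage of this kind can be pictured as minimizing the texture residual jointly over the shape parameters p and appearance parameters a: for any candidate p, the optimal a is the projection of the sampled texture onto the appearance basis, and p is then refined to shrink the remaining residual. The greedy coordinate probing below is an illustrative stand-in for the paper's actual optimization; all names are assumptions.

```python
import numpy as np

def fit_appearance(sample_texture, p0, mean_tex, A, steps=None, iters=20):
    """Refine shape parameters p by minimizing the texture residual
    left after explaining the sample with the appearance model.
    sample_texture(p): samples the image texture under shape params p
    mean_tex: mean texture vector; A: (t, k) orthonormal appearance basis
    """
    if steps is None:  # small probes along each shape parameter
        steps = [np.eye(len(p0))[i] * s
                 for i in range(len(p0)) for s in (0.5, -0.5)]
    p = np.asarray(p0, dtype=float)

    def residual(p):
        t = sample_texture(p)
        a = A.T @ (t - mean_tex)              # optimal appearance params
        return np.sum((t - mean_tex - A @ a) ** 2)

    r = residual(p)
    for _ in range(iters):
        improved = False
        for d in steps:
            if residual(p + d) < r:
                p, r = p + d, residual(p + d)
                improved = True
        if not improved:
            break
    return p, r
```

Because a is solved in closed form at every probe, the search effectively runs over the shape parameters alone, which keeps this final refinement stage stable despite its added model flexibility.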

Performance evaluation

In this section, we give a brief summary of the characteristics and performance of the cascaded feature extraction proposed in this paper, and quantitatively analyze the performance gain obtained by using the cascade. Table 3 summarizes the design motivations for the three component algorithms in the cascade. These algorithms capture various characteristics of facial features and give different extraction performance in terms of robustness (convergence) and accuracy. This is summarized as

Summary and conclusions

In this paper, we have first presented a cascaded facial feature-extraction framework. Within this framework, we have defined several metrics (capture range and average extraction accuracy) to measure the performance of a model-based algorithm. We have designed a new three-algorithm cascade for an incremental modeling of facial features. Our approach uses an incremental modeling of facial feature structures, with additional parameters incorporated at each stage, such that the model has more

References (22)

  • P. Phillips et al., The FERET database and evaluation procedure for face recognition algorithms, Image Vision Comput. (1998)
  • E. Saber et al., Frontal-view face detection and facial feature extraction using color, shape and symmetry based cost functions, Pattern Recognition Lett. (1998)
  • A. Blake et al., Active Contours (1998)
  • C.-C. Chang, C.-J. Lin, LIBSVM: a library for support vector machines, software available at, 2001...
  • T. Cootes, An Introduction to Active Shape Models, Image Processing and Analysis (2000)
  • T. Cootes et al., Active appearance models, IEEE Trans. Pattern Anal. Mach. Intell. (2001)
  • D. Cristinacce et al., Facial feature detection using AdaBoost with shape constraints
  • D. Cristinacce et al., A multistage approach to facial feature detection
  • B. Fröba et al., Real-time face detection using edge-orientation matching
  • Y. Gizatdinova et al., Feature-based detection of facial landmarks from neutral and expressive facial images, IEEE Trans. Pattern Anal. Mach. Intell. (2006)
  • GNU Scientific Library GSL...