Facial feature extraction by a cascade of model-based algorithms
Introduction
Recent advances in image processing and vision computing enable automatic identification of human faces in visual scenes. A promising application area for face recognition is the consumer home, where face-recognition systems facilitate intelligent and convenient services in daily life.
Within the framework of the ITEA project HomeNet2Run [10], we have developed a face-identification system [22], which actively identifies home users and notifies other home devices (e.g. home server/television) to adapt services accordingly. Facial feature extraction is an indispensable step in this processing, as it forms the basis for both face alignment to the database format and the subsequent recognition stage. The robustness and accuracy of the facial feature extraction directly influence the success of the final identification.
However, an automatic facial feature-extraction system must address a number of challenges.
- • The facial feature appearances and their geometric relationships vary considerably across individuals and are further affected by facial expressions.
- • Especially for non-professional applications, the image-capturing environment is more variable than in professional settings. In particular, camera quality and illumination conditions complicate the feature-extraction procedure.
In the following, we present a brief literature review of current feature-extraction techniques and point out their limitations.
In the past decade, a number of techniques for automatic facial feature extraction have been proposed in the literature. Earlier techniques find the positions of facial features based on empirical knowledge about feature appearance (e.g. [12], [18]), such as eye symmetry and the varying color distributions over the features. A more recent paper [8] exploits edge properties around facial landmarks and reports good results. Other techniques treat feature-component localization (e.g. eyes and nose) as a two-class pattern-classification problem, where the classes are feature and non-feature. Typical examples include eigenfeature-based techniques [19] and AdaBoost-based techniques [5]. These techniques are relatively robust to large feature variations, but they usually provide only coarse localization results.
Recently, model-based algorithms have received considerable attention in the literature. A face model (e.g. a graph model or patch model) is devised to represent the salient facial features, and its parameters are optimized to fit a new face. These techniques usually offer both an accurate description of the features and the flexibility to adapt to individual variations. Earlier examples of model-based techniques are active contour models [1] and deformable geometric templates [20], which incorporate object-specific a priori knowledge about facial features into pre-defined energy functions. Although these techniques can yield accurate extraction results, the fitting procedure may converge to incorrect local minima under improper model initialization and large feature variations, which limits convergence performance.
The well-known active shape model (ASM) [3] and active appearance model (AAM) [4] define facial feature properties implicitly through statistics, so that feature models are derived automatically from a set of training samples. This statistics-based approach to feature extraction offers more flexibility and can easily be applied to other object structures. Many derivatives have since been proposed. In direct AAM [11], it is suggested that the shape can be predicted directly from the texture when the two are sufficiently correlated. In [17], a nonlinear extension is proposed to enhance the performance of the conventional ASM.
From the above, we can see that it is generally difficult for a single technique to achieve high performance in both robustness and accuracy. Classification-based algorithms are not influenced by initialization and are therefore more robust. Statistical point models can give fairly accurate results, but they are more sensitive to model initialization. For this reason, many recent proposals exploit different properties of facial features in order to benefit from several algorithms. For example, it is well known that ASM/AAM performs poorly if the model is not initialized properly. In [13], a rough estimate of the major feature components (e.g. eyes and mouth) is obtained from color information, in order to provide a better model initialization for ASM. In [6], the authors first apply a set of independent feature detectors for 17 feature points using the AdaBoost classification technique; a conventional AAM is then applied to refine the results.
In this paper, we focus on using multiple algorithms to improve facial feature-extraction performance with respect to both robustness and accuracy. Compared to previous approaches, the following differences become apparent.
- (1) We have clearly defined a number of performance metrics (e.g. the capture range, accuracy threshold and localization error) to characterize a model-based algorithm. These concepts have not been precisely defined previously in the literature. The performance metrics are used extensively later on in the implementation of a cascaded algorithm to guide the optimization of model parameters.
- (2) In previous approaches, several techniques are combined heuristically, without quantitatively analyzing how their performances relate. In our approach, we quantitatively analyze the coupling between multiple algorithms and propose ways to optimize the coordination between them.
- (3) We have proposed an implementation example using three cascaded algorithms. The proposed principles are applied to guide the optimization of the algorithm parameters such that the overall performance is enhanced.
Performance metrics for model-based algorithms
In this section, we introduce several performance metrics for model-based algorithms for facial feature extraction. These metrics and related concepts are used extensively in the sequel of this paper to characterize different algorithms.
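To make the two central metrics concrete, they can be measured empirically: the localization error as a point-to-point distance between extracted and ground-truth landmarks, and the capture range as the largest initial displacement from which fitting still converges to within an accuracy threshold. The sketch below is a minimal illustration under assumed conventions (inter-ocular normalization, horizontal displacement only); the paper's exact definitions may differ.

```python
import numpy as np

def localization_error(pred, truth, inter_ocular):
    """Mean Euclidean distance between predicted and ground-truth
    landmarks, normalized by the inter-ocular distance (a common
    convention; assumed here for illustration)."""
    return np.mean(np.linalg.norm(pred - truth, axis=1)) / inter_ocular

def capture_range(fit, truth, inter_ocular, accuracy_threshold=0.1,
                  max_offset=50, step=5):
    """Largest initial displacement (in pixels, horizontal only here)
    from which `fit` still converges to within the accuracy threshold."""
    largest = 0
    for d in range(0, max_offset + 1, step):
        pred = fit(truth + np.array([d, 0]))  # displaced initialization
        if localization_error(pred, truth, inter_ocular) <= accuracy_threshold:
            largest = d
    return largest
```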
Design principles
In this section, we propose some basic design principles to guide the construction of a feature-extraction cascade, based on the performance measurements of the individual algorithms. In this framework, each constituent algorithm in the cascade receives input from the preceding algorithm and refines the extraction.
In Fig. 2, suppose the first algorithm has a large capture range but low extraction accuracy, while the second algorithm has a small capture range but high extraction accuracy; the cascaded algorithm
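This design principle can be illustrated with a toy one-dimensional fit. In the sketch below (illustrative Python, not the paper's actual algorithms), a robust but imprecise stage brings the estimate into the small capture range of a precise stage that would fail on its own:

```python
def cascade(stages, init):
    """Run each stage on the output of the previous one. `stages` is
    ordered from large capture range / low accuracy to small capture
    range / high accuracy, per the design principle above."""
    estimate = init
    for stage in stages:
        estimate = stage(estimate)
    return estimate

target = 100.0
coarse = lambda x: x + 0.5 * (target - x)           # robust, imprecise
fine = lambda x: target if abs(x - target) < 30 else x  # precise, narrow
```

Starting at 150, the fine stage alone fails (the error of 50 exceeds its capture range of 30), but the cascade first halves the error and then converges exactly.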
Overview of cascaded extraction algorithm
We propose three novel algorithms to constitute the extraction cascade, as depicted in Fig. 3. The three constituent algorithms are briefly summarized as follows:
- (1) Sparse-graph search (SGS): SGS is the first algorithm in the cascaded framework and aims at finding the center locations of six facial features, namely the eyes (left and right), eyebrows (left and right), nose and mouth. Here, extraction accuracy is traded off against a large capture range and reliability.
- (2) Component-based texture fitting (CTF)
Sparse-graph search (SGS)
The aim of SGS is to estimate facial feature locations at a coarse granularity and at minimal cost. More specifically, we define six facial features corresponding to six prominent feature regions, i.e. the eyes (left and right), eyebrows (left and right), nose and mouth. Given a coarsely estimated face region, SGS finds the rough locations of these features, which serve as inputs for the subsequent algorithms working at finer granularities.
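As a loose illustration of coarse six-feature localization, the sketch below partitions a detected face box into regions where each feature is expected and takes the darkest point in each (eyes, brows and mouth are typically darker than skin). The region fractions and the darkness heuristic are our assumptions for illustration only; the actual SGS performs a search over a sparse graph of feature nodes.

```python
import numpy as np

# (row slice, col slice) as fractions of the face box: assumed layout.
REGIONS = {
    "left_brow":  ((0.15, 0.30), (0.10, 0.45)),
    "right_brow": ((0.15, 0.30), (0.55, 0.90)),
    "left_eye":   ((0.30, 0.50), (0.10, 0.45)),
    "right_eye":  ((0.30, 0.50), (0.55, 0.90)),
    "nose":       ((0.50, 0.70), (0.30, 0.70)),
    "mouth":      ((0.70, 0.90), (0.25, 0.75)),
}

def coarse_feature_centers(face):
    """Return a dict of coarse (row, col) centers in a grayscale face box."""
    h, w = face.shape
    centers = {}
    for name, ((r0, r1), (c0, c1)) in REGIONS.items():
        sub = face[int(r0 * h):int(r1 * h), int(c0 * w):int(c1 * w)]
        r, c = np.unravel_index(np.argmin(sub), sub.shape)  # darkest pixel
        centers[name] = (r + int(r0 * h), c + int(c0 * w))
    return centers
```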
In order to achieve a large capture
Component-based texture fitting (CTF)
In this section, we propose a more flexible feature model (CTF), which is used as the second step in the cascaded extraction (refer to Fig. 3). Each feature component defined in the previous section, e.g. an eye or mouth, is now represented by a shape parameterized by location, scale and rotation. The key objective of CTF is to find the optimal shape parameter based on the current texture enclosed by the shape. This is accomplished by using direct parameter prediction based on a set of training
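Direct parameter prediction of this kind can be sketched as a regressor trained to map sampled texture vectors to shape-parameter updates. The least-squares formulation below is an assumed, simplified stand-in for the paper's training procedure, not its actual scheme:

```python
import numpy as np

def train_predictor(textures, param_displacements):
    """Least-squares regressor from texture vectors to parameter
    updates, in the spirit of direct prediction from texture
    (e.g. direct AAM [11]); the exact training scheme may differ."""
    X = np.hstack([textures, np.ones((len(textures), 1))])  # bias column
    R, *_ = np.linalg.lstsq(X, param_displacements, rcond=None)
    return R

def predict_update(R, texture):
    """Predict the shape-parameter update for one texture sample."""
    return np.append(texture, 1.0) @ R
```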
Component-based direct fitting (CDF)
In this section, we present the third-stage algorithm in the cascaded extraction. More specifically, we refine the feature-extraction results for each facial feature by using a more flexible appearance model, with the aim to achieve better adaptation for individual feature instances. In this appearance model, in addition to the shape parameter vector as used in CTF, we employ an additional appearance parameter vector to model the feature shape and texture deformations.
Differing from CTF in
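The idea of an appearance parameter vector on top of the pose/shape parameters can be sketched with a small PCA model of feature deformation. The SVD-based construction below is an illustrative assumption, not the paper's exact CDF model:

```python
import numpy as np

def build_appearance_model(samples, n_modes=4):
    """PCA appearance model: mean plus principal deformation modes.
    `samples` holds vectorized feature instances, one per row."""
    mean = samples.mean(axis=0)
    U, s, Vt = np.linalg.svd(samples - mean, full_matrices=False)
    return mean, Vt[:n_modes]  # modes ordered by explained variance

def instantiate(mean, modes, b):
    """Generate a feature instance from appearance parameters b."""
    return mean + b @ modes
```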
Performance evaluation
In this section, we give a brief summary of the characteristics and performance of the cascaded feature extraction proposed in this paper. We also quantitatively analyze the performance gain by using the cascade. In Table 3, we summarize the design motivations for the three component algorithms in the cascade. These algorithms capture various characteristics of facial features and give different extraction performance in terms of robustness (convergence) and accuracy. This is summarized as
Summary and conclusions
In this paper, we have presented a cascaded facial feature-extraction framework. Within this framework, we have defined several metrics (capture range and average extraction accuracy) to measure the performance of a model-based algorithm. We have designed a new three-algorithm cascade that models facial feature structures incrementally, with additional parameters incorporated at each stage, such that the model has more
References (22)
- et al., The FERET database and evaluation procedure for face recognition algorithms, Image Vision Comput. (1998)
- et al., Frontal-view face detection and facial feature extraction using color, shape and symmetry based cost functions, Pattern Recognition Lett. (1998)
- et al., Active Contours (1998)
- C.-C. Chang, C.-J. Lin, LIBSVM: a library for support vector machines, software available at, 2001
- An Introduction to Active Shape Models, Image Processing and Analysis (2000)
- et al., Active appearance models, IEEE Trans. Pattern Anal. Mach. Intell. (2001)
- et al., Facial feature detection using AdaBoost with shape constraints
- et al., A multistage approach to facial feature detection
- et al., Real-time face detection using edge-orientation matching
- et al., Feature-based detection of facial landmarks from neutral and expressive facial images, IEEE Trans. Pattern Anal. Mach. Intell. (2006)