Regression-based Active Appearance Model initialization for facial feature tracking with missing frames☆
Introduction
The Active Appearance Model (AAM) (Cootes et al., 2001) has received significant attention from the computer vision community for the registration of deformable visual objects. Its applications include dynamic head pose and gaze estimation for real-time user interfaces, expression recognition, and lip reading.
AAMs generally treat registration as an optimization problem solved by local minimization methods. However, the gradient-descent-based fitting scheme inherently depends on a good initialization. Poorly initialized model parameters easily cause the optimization to become trapped in a local minimum or to diverge from the target. This problem is prominent when registering images with considerable shape variations, which often occur in lossy multimedia mobile networks, where some frames are unavailable because of unstable wireless connections and narrow bandwidth. Robust initialization for AAM tracking is therefore highly desirable for dealing with frame loss in such environments.
The simplest way to initialize the model is brute-force search, which iteratively tests every possible configuration. However, this method is extremely time-consuming because of the huge number of possible initializations. Stegmann (2000) suggested performing AAM searches in parallel with different initialization parameters, i.e., perturbed pose and model parameters. This method is less time-consuming than brute-force search but is still far from efficient, especially for real-time applications.
In most AAM fitting cases that target facial images, the shape model parameters are roughly estimated from the detected face and facial features such as eyes, mouth centers, and nose tips (Rara et al., 2009, Rabie et al., 2008, Wong and Chung, 2010). After these facial features are detected, the AAM base mesh is warped to these points for AAM initialization. The initialization accuracy increases with more detailed facial features. However, the performance of these methods depends heavily on the accuracy of feature detection, which decreases in the presence of varying facial contexts and complex backgrounds. Wimmer (2008) used a learned Active Shape Model (ASM) fitting for AAM initialization because it provides stable results for the entire set of experiments, even in cases of poor initial parameter estimates as determined by a face detector.
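As an illustration of warping a base mesh to detected feature points, the following is a minimal sketch that assumes the warp is a global 2D similarity transform (scale, rotation, translation) estimated by least squares. The cited methods may use a different warp. The function names `fit_similarity` and `init_mesh_from_features` and the index list `feature_idx` are hypothetical.

```python
import numpy as np

def fit_similarity(src, dst):
    """Least-squares 2D similarity transform mapping src points onto dst points.

    src, dst: (n, 2) arrays of corresponding points.
    Returns (A, t) such that dst is approximately src @ A.T + t,
    where A = [[a, -b], [b, a]] encodes scale and rotation.
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    s0, d0 = src - mu_s, dst - mu_d              # centered point sets
    denom = (s0 ** 2).sum()
    a = (s0 * d0).sum() / denom                  # scale * cos(theta)
    b = (s0[:, 0] * d0[:, 1] - s0[:, 1] * d0[:, 0]).sum() / denom  # scale * sin(theta)
    A = np.array([[a, -b], [b, a]])
    t = mu_d - A @ mu_s
    return A, t

def init_mesh_from_features(base_mesh, feature_idx, detected_pts):
    """Place the whole base mesh using only a few detected feature points.

    base_mesh:    (m, 2) mean-shape vertices (assumed input).
    feature_idx:  indices of the mesh vertices that correspond to detected_pts.
    detected_pts: (k, 2) detected feature locations in the image.
    """
    base_mesh = np.asarray(base_mesh, dtype=float)
    A, t = fit_similarity(base_mesh[feature_idx], detected_pts)
    return base_mesh @ A.T + t
```

Because the transform is estimated only from the detected points, any detection error propagates to every vertex of the initialized mesh, which is the dependence on detector accuracy noted above.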
In fitting an AAM to video sequences, conventional methods directly fit the AAM to each frame using the fitting results, i.e., shape and appearance parameters, of the previous frame as the initialization of the current frame (Ionita et al., 2011, Liu, 2010, Cristinacce and Cootes, 2008, Saragih et al., 2011, Sung and Kim, 2009). However, this method is only suitable for small movements between frames. Cui and Jin (2012) used the Lucas–Kanade optical flow to track several salient feature points, which were then used to constrain AAM shape initialization. The method considered inter-frame correspondences. However, its performance relied on salient feature tracking results, and only the similarity preservation of the two frames was considered. This approach limited the initialization performance because internal and external changes may disable salient feature tracking, and the transformation between two frames is usually more complicated.
One possible solution is to locate a sparse set of local points on each image and use them to conduct the initialization. In Feng et al. (2011), local feature matching between neighboring frames was adopted to predict the initial three-dimensional (3D) AAM parameters, wherein 3D pose estimation was conducted using a 3D-shaped model constraint. This method is also time-consuming and is designed for 3D-based face tracking.
A regression-based approach is proposed in this paper for efficient two-dimensional AAM initialization. Instead of looking for salient features such as the eyes and mouth, we use sparse, scattered local feature correspondences to initialize the AAM. The relationship between the local features and the global shape is learned from the training data. By establishing an inter-frame relationship that combines local and global facial features, the approach becomes more robust than previous work to external facial context changes (illumination, viewpoint, etc.) and internal changes (expressions, wearing glasses, etc.). This improvement makes the approach more suitable for lossy networks, where the variation between neighboring frames may be large.
Fig. 1 shows a diagram of the proposed method. The landmarks in the first frame are manually annotated to initialize tracking. For the remaining frames, the AAM is initialized by a local-to-landmark (L2L) mapping based on Kernel Ridge Regression (KRR) (Trevor Hastie and Friedman, 2009). This method exploits the spatial relationship between scattered local invariant features and structured facial annotation points. To improve initialization accuracy, an improved local feature correspondence strategy called dual-threshold scale invariant feature transform (SIFT) matching is also presented as one of the supporting strategies.
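To make the idea of a regression from local features to landmarks concrete, here is a minimal NumPy sketch of kernel ridge regression with multiple outputs. It is not the authors' implementation: the RBF kernel, the hyperparameters `lam` and `gamma`, the class name `KernelRidgeL2L`, and the assumption that each training input is a fixed-length vector built from local feature correspondences are all illustrative choices not stated in the text.

```python
import numpy as np

def rbf_kernel(A, B, gamma):
    """Gaussian (RBF) kernel matrix between the rows of A and the rows of B."""
    d2 = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(d2, 0.0))  # clip tiny negatives from round-off

class KernelRidgeL2L:
    """Kernel ridge regression from local-feature vectors to landmark vectors.

    Assumed data layout (hypothetical):
      X: (n, d) one row per training pair, e.g. stacked feature displacements.
      Y: (n, 2 * n_landmarks) corresponding landmark coordinates or displacements.
    """

    def __init__(self, lam=1e-3, gamma=0.5):
        self.lam = lam      # ridge regularization weight
        self.gamma = gamma  # RBF kernel width parameter

    def fit(self, X, Y):
        self.X_train = np.asarray(X, dtype=float)
        K = rbf_kernel(self.X_train, self.X_train, self.gamma)
        # Closed-form dual solution: alpha = (K + lam * I)^{-1} Y
        self.alpha = np.linalg.solve(K + self.lam * np.eye(K.shape[0]),
                                     np.asarray(Y, dtype=float))
        return self

    def predict(self, X):
        K_new = rbf_kernel(np.asarray(X, dtype=float), self.X_train, self.gamma)
        return K_new @ self.alpha
```

In a tracking loop under these assumptions, the prediction for a new frame would serve as the initial landmark estimate that the AAM fitting then refines.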
The proposed AAM initialization framework makes two main contributions: (1) a data-driven approach that identifies the shape correspondence between sequential face images from their scattered local feature matches; and (2) an accurate matching strategy for local feature correspondences between consecutive frames, which improves the accuracy of the tracking results. The proposed initialization method helps AAMs converge and accurately localize facial features during tracking. It outperforms other AAM initializations in terms of convergence rate and tracking accuracy, especially in lossy networks where some frames are missing.
The remainder of this paper is organized as follows: Section 2 describes the proposed regression-based AAM initialization approach in detail. Section 3 introduces some supporting strategies for performance improvement. Section 4 and Section 5 present the experimental results and the conclusions respectively.
Regression based AAM initialization
We briefly introduce KRR in this section before investigating the initialization process through a map obtained from the scattered/sparse local feature correspondence space to the structured landmark space.
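For reference, the standard textbook form of KRR can be written as follows. The notation is assumed here for illustration and may differ from the paper's: $x_i$ denotes a training input, $y_i$ its target vector, $k(\cdot,\cdot)$ a kernel function, and $\lambda$ the regularization weight.

```latex
% Training pairs (x_i, y_i), i = 1, ..., n; kernel matrix K_{ij} = k(x_i, x_j).
\min_{f \in \mathcal{H}} \; \sum_{i=1}^{n} \lVert y_i - f(x_i) \rVert^2
  + \lambda \lVert f \rVert_{\mathcal{H}}^2
% Closed-form solution, with Y the matrix whose i-th row is y_i^\top:
\alpha = (K + \lambda I)^{-1} Y, \qquad
f(x)^\top = k(x)^\top \alpha, \qquad
k(x) = \bigl[ k(x_1, x), \dots, k(x_n, x) \bigr]^\top
```

The regularization term keeps the linear system well conditioned even when training inputs are nearly duplicated, at the cost of a small bias in the fitted map.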
Assistant strategies
Some strategies for performance improvement are introduced in this section.
Experiments and discussion
This section demonstrates the effectiveness of the proposed approach in fitting facial video sequences. The proposed initialization method is compared with the following approaches: (1) The general AAM initialization which takes the previous frame as the initialization of the current frame (Ionita et al., 2011, Liu, 2010, Cristinacce and Cootes, 2008, Saragih et al., 2011, Sung and Kim, 2009); (2) The recently proposed initialization method which used the Lucas-Kanade optical flow to track some
Conclusion
An approach for automatic AAM initialization during facial feature tracking is presented. By establishing a spatial relationship between local feature points and landmark points, the approach helps improve the accuracy and efficiency of AAM trackers, especially in lossy networks where some frames may be unavailable and the variation between consecutive frames is unstable. The proposed framework is validated by tracking facial features in image sequences using different training data.
Acknowledgment
This work was supported by the National Natural Science Foundation of China (61104213), Natural Science Foundation of Jiangsu Province (BK2011146), and Opening Fund of Key Laboratory of System Control and Information Processing (Ministry of Education) at Shanghai Jiaotong University (SCIP2011008).
References (19)
- Aran, O., Ari, I., Guvensan, A., Haberdar, H., Kurr, Z., Turkmen, I., Uyar, A., Akarun, L., 2007. A database of...
- Cootes et al., 2001. Active appearance models. IEEE Trans. Pattern Anal. Mach. Intell.
- Cristinacce and Cootes, 2008. Automatic feature localisation with constrained local models. Pattern Recogn.
- Cui and Jin, 2012. Facial feature points tracking based on AAM with optical flow constrained initialization. J. Pattern Recogn. Res.
- Feng, X., Shen, X., Zhou, M., Zhang, H., Kim, J., 2011. Robust facial expression tracking based on composite...
- FGNet, 2004. Fgnet talking face video....
- Gross, R., Matthews, I., Cohn, J., Kanade, T., Baker, S., 2007. Guide to the cmu multi-pie database, Technical report,...
- Liu, 2010. Video-based face model fitting using adaptive active appearance model. Image Vision Comput.
- Sung and Kim, 2009. Adaptive active appearance model with incremental learning. Pattern Recogn. Lett.
☆ This paper has been recommended for acceptance by C. Luengo.