Adaptive active appearance model with incremental learning
Introduction
Active appearance models (AAMs) are flexible linear models that can represent the shape and appearance variations of non-rigid objects (Cootes et al., 2001, Matthews and Baker, 2004). They are particularly popular for tracking facial feature points because the human face exhibits a wide variety of shape and appearance variations across different people. Matthews and Baker (2004) modeled the feature points of the human face using the AAM and proposed a fast fitting algorithm. Xiao et al. (2004) introduced a 3D shape model that limited the variation of the 2D shape in order to construct a more robust AAM. Sung and Kim (2006) proposed a stereo AAM that could fit to a stereo image and developed a facial expression system by tracking the facial feature points.
However, AAMs are extremely sensitive to several factors such as pose, expression, and illumination changes. In particular, the fitting performance of an AAM degrades drastically when the illumination characteristics of an input image deviate from those of the training images. Many researchers have tried to overcome the effect of illumination. Cao et al. (2003) used histogram equalization and gamma correction. Wang et al. (2003) introduced a ratio image that transformed the illumination of the input image into the illumination of a reference image; however, the ratio image is not appropriate for real-time applications due to its long computation time. Blanz and Vetter (1999) and Wang et al. (2003) built parametric 3D face models that allowed for variations of the 3D shape and texture, and solved the illumination problem by estimating the illumination effects from 3D surface normals. Hager and Belhumeur (1998) removed the illumination effects by finding the illumination basis from the normal vectors using the SVD method. However, the data points in these 3D face models are so dense that they cannot be handled in real time.
One simple way to solve this problem is to train the AAM with a large number of training images covering all possible illumination conditions. However, it is difficult to collect training images under all possible illumination conditions, and no matter how rich the training set, the trained AAM cannot represent every possible illumination change.
In this paper, we propose an adaptive AAM that updates its linear appearance model to represent the given data well through the incremental PCA technique (Artac et al., 2002, Hall et al., 1998, Li, 2004). The proposed adaptive AAM can handle an input image with an unknown illumination condition because it incorporates the new illumination condition into the updated appearance basis vectors using the incremental PCA technique. However, an erroneous update caused by an ill-fitted input image can degrade the fitting performance. Therefore, we need to update the appearance basis vectors selectively, only when the AAM fitting is good, which requires a quantitative measure that evaluates the goodness of the AAM fitting.
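The incremental PCA update of the appearance basis can be sketched roughly as follows, in the spirit of Hall et al. (1998): the new sample's residual extends the basis, a small eigenproblem is solved, and the basis is rotated and truncated. The function and parameter names are ours for illustration, not the authors' implementation.

```python
import numpy as np

def ipca_update(mean, U, evals, n, x, max_basis=20):
    """One incremental PCA step (sketch after Hall et al., 1998).

    mean  : (d,)   current sample mean
    U     : (d, k) current orthonormal basis vectors (columns)
    evals : (k,)   current eigenvalues
    n     : int    number of samples seen so far
    x     : (d,)   new appearance sample (e.g. a warped face patch)
    """
    xc = x - mean
    a = U.T @ xc                       # coefficients in the current subspace
    r = xc - U @ a                     # residual orthogonal to the subspace
    rnorm = np.linalg.norm(r)
    r_hat = r / rnorm if rnorm > 1e-8 else np.zeros_like(r)

    # Small (k+1)x(k+1) problem mixing the old eigenvalues with the
    # new sample's coefficients in the extended basis [U, r_hat].
    k = U.shape[1]
    D = np.zeros((k + 1, k + 1))
    D[:k, :k] = (n / (n + 1.0)) * np.diag(evals)
    q = np.append(a, rnorm)
    D += (n / (n + 1.0) ** 2) * np.outer(q, q)

    new_evals, R = np.linalg.eigh(D)           # ascending eigenvalues
    order = np.argsort(new_evals)[::-1]        # sort descending
    new_evals, R = new_evals[order], R[:, order]

    U_new = np.hstack([U, r_hat[:, None]]) @ R  # rotate extended basis
    mean_new = (n * mean + x) / (n + 1.0)

    m = min(max_basis, U_new.shape[1])          # keep the basis bounded
    return mean_new, U_new[:, :m], new_evals[:m], n + 1
```

Truncating back to a fixed number of basis vectors after each update keeps the per-frame cost constant, which matters for real-time tracking.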
Generally, the AAM fitting error is used as a measure of the goodness of AAM fitting. However, computing the AAM fitting error requires the ground truth of the shape vector, which is unavailable in the test phase. To solve this problem, we propose to take the number of outlier pixels as an alternative measure of the goodness of AAM fitting, because there is a strong correlation between ill-fitting and outlier pixels in that an ill-fitted pixel is treated as an outlier pixel. Gross et al. (2004) introduced the idea of outlier pixels, using robust statistics to improve the AAM fitting. However, this method is not effective when the illumination of the input face image is changing. To overcome this limitation, we adopt the adaptive appearance model (McKenna et al., 1997), which makes the computation of outlier pixels robust to illumination changes.
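As a minimal sketch of the outlier-based quality measure: a pixel whose residual exceeds a few standard deviations of the (adaptively maintained) per-pixel model is treated as an outlier, and the fitting is judged good when the outlier count stays below a threshold. The thresholds and names below are illustrative assumptions, not the paper's actual values.

```python
import numpy as np

def count_outlier_pixels(warped, model_mean, model_var, c=2.5):
    """Count pixels whose residual exceeds c standard deviations.

    warped     : flattened warped input patch
    model_mean : per-pixel mean of the (online) appearance model
    model_var  : per-pixel variance of the (online) appearance model
    """
    residual = np.abs(warped - model_mean)
    return int(np.sum(residual > c * np.sqrt(model_var)))

def fitting_is_good(warped, model_mean, model_var, max_outlier_ratio=0.1):
    """Declare the fitting good when few pixels are outliers."""
    n_out = count_outlier_pixels(warped, model_mean, model_var)
    return n_out <= max_outlier_ratio * warped.size
```

Because the per-pixel mean and variance are themselves adapted over time, a global brightness shift that the model has already absorbed does not inflate the outlier count, unlike a raw fitting-error threshold.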
Fleet et al. (2003) proposed an online appearance model (OAM) for robust visual tracking. Their approach involves a mixture of Gaussians with three components: the stable component S for slowly varying image structure, the lost component L for outliers, and the wandering component W for two-frame variations. Zhou et al. (2004) modified this online appearance model (Fleet et al., 2003) by using a fixed template F instead of the lost component L for effective occlusion handling, and obtained the outlier pixels by means of robust statistics. We propose a novel online appearance model that modifies the existing OAM by taking the reconstructed appearance image R instead of the fixed template F, where the reconstructed appearance image adapts to the input image better than the fixed template F. In this work, the OAM parameters such as the means, variances, and mixing densities of the three components S, L, and R are updated by EM learning only when the AAM fitting is good, as determined by the number of outlier pixels.
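One online EM step for such a per-pixel mixture can be sketched as below, with a Gaussian stable component S, a Gaussian component R centered at the current model reconstruction, and a uniform lost component L for outliers. Everything here (names, forgetting rate, the uniform density) is an illustrative assumption, not the authors' exact update.

```python
import numpy as np

def oam_em_update(obs, recon, state, alpha=0.05, uniform_p=1.0 / 256):
    """One online EM step for a per-pixel 3-component mixture:
    S ~ N(mu_s, var_s), R ~ N(recon, var_r), L ~ uniform.

    obs   : observed warped patch (flattened, values in [0, 1])
    recon : reconstruction by the current appearance model
    state : (mu_s, var_s, var_r, m) with mixing weights m (3,)
    """
    mu_s, var_s, var_r, m = state
    # E-step: weighted component likelihoods per pixel.
    p_s = m[0] * np.exp(-0.5 * (obs - mu_s) ** 2 / var_s) / np.sqrt(2 * np.pi * var_s)
    p_r = m[1] * np.exp(-0.5 * (obs - recon) ** 2 / var_r) / np.sqrt(2 * np.pi * var_r)
    p_l = m[2] * uniform_p * np.ones_like(obs)
    total = p_s + p_r + p_l
    o_s, o_r, o_l = p_s / total, p_r / total, p_l / total  # ownerships

    # M-step with exponential forgetting (rate alpha).
    mu_s = mu_s + alpha * o_s * (obs - mu_s)
    var_s = var_s + alpha * o_s * ((obs - mu_s) ** 2 - var_s)
    m_new = (1 - alpha) * m + alpha * np.array(
        [o_s.mean(), o_r.mean(), o_l.mean()])
    m_new /= m_new.sum()

    # Pixels mostly owned by L are the outliers used to judge the fit.
    return (mu_s, var_s, var_r, m_new), o_l
```

Running this update only on well-fitted frames keeps a bad fit from corrupting the per-pixel statistics, which is exactly the selectivity the paper argues for.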
With the help of the online appearance model, we can maintain a good AAM fitting as long as the illumination condition remains steady. However, when the illumination condition of the input image changes at a certain point in time, the AAM fitting becomes unstable because the current appearance basis vectors are no longer appropriate for representing the changed input image, which causes an increase in the reconstruction error. To avoid this situation, we update the current appearance basis vectors of the AAM with the changed input image using the incremental PCA technique. This selective update of the appearance basis vectors provides illumination adaptiveness to the AAM with a minimal computational load, performing only the necessary updates.
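The selective-update logic amounts to a two-level gate: a frame must first be judged well fitted (few outlier pixels) before any model is touched, and the more expensive basis update runs only when the reconstruction error indicates the basis no longer covers the input. A compact sketch, with purely illustrative thresholds:

```python
def selective_update(n_outliers, recon_error,
                     outlier_thresh=50, recon_thresh=0.1):
    """Decide which updates to run after fitting one frame.

    Returns (update_oam, update_basis). Thresholds are illustrative
    placeholders, not the paper's actual values.
    """
    if n_outliers > outlier_thresh:
        # Ill-fitted frame: updating anything would corrupt the models.
        return False, False
    # Good fit: always refresh the online appearance model; refresh the
    # AAM appearance basis only when the current basis reconstructs the
    # input poorly (e.g. the lighting has changed).
    return True, recon_error > recon_thresh
```

This ordering matters: gating on the outlier count first prevents a drift-inducing basis update from an ill-fitted frame, while gating the incremental PCA on the reconstruction error avoids needless updates under stable illumination.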
The organization of this paper is as follows. Section 2 reviews related studies: the active appearance model, incremental principal component analysis, and the online appearance model. Section 3 explains the proposed adaptive AAM algorithm. Section 4 presents experimental results and discussion. Finally, Section 5 draws our conclusions.
Active appearance models
In AAMs (Cootes et al., 2001), the 2D shape is represented by a triangulated mesh with $l$ vertices, which correspond to salient points of an object. Mathematically, the shape vector $\mathbf{s}$ is defined by the 2D coordinates of the $l$ vertices that make up the mesh as
$$\mathbf{s} = (x_1, y_1, x_2, y_2, \ldots, x_l, y_l)^T,$$
and shape variation is expressed by a linear combination of a mean shape $\mathbf{s}_0$ and $n$ shape basis vectors $\mathbf{s}_i$ as
$$\mathbf{s} = \mathbf{s}_0 + \sum_{i=1}^{n} p_i \, \mathbf{s}_i,$$
where $\mathbf{p} = (p_1, \ldots, p_n)^T$ is a shape parameter vector. The shape basis vectors are normally obtained by applying PCA to a set of aligned training shapes: $\mathbf{s}_0$ is the mean shape and the $\mathbf{s}_i$ are the $n$ eigenvectors corresponding to the largest eigenvalues.
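As a rough illustration of the shape model above (not the authors' code), the basis can be built with PCA via an SVD of the centered training shapes, and a shape instance composed from the parameters:

```python
import numpy as np

def build_shape_model(shapes, n_basis):
    """Build (s0, S) from aligned training shapes.

    shapes : (N, 2l) array, one Procrustes-aligned shape per row
    Returns the mean shape s0 (2l,) and basis S (2l, n_basis).
    """
    s0 = shapes.mean(axis=0)
    # PCA via SVD of the centered data; rows of Vt are the eigenvectors
    # of the sample covariance, ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(shapes - s0, full_matrices=False)
    return s0, Vt[:n_basis].T

def shape_instance(s0, S, p):
    """Compose a shape from the linear model s = s0 + sum_i p_i * s_i."""
    return s0 + S @ p
```

Projecting a shape onto the basis and re-composing it recovers the shape exactly whenever the basis spans the training variation, which is the property the AAM fitting relies on.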
AAM fitting quality measurement
As mentioned earlier, the AAM fitting error, defined as the difference between the warped patch image and the model appearance image, is not appropriate for measuring the AAM fitting quality, i.e., for determining whether the AAM fitting is good or bad. The reason is that we cannot tell whether a high AAM fitting error comes from an ill-fitted result or from an illumination change.
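For concreteness, the fitting error in question can be sketched as the residual between the warped patch and its closest reconstruction in the appearance subspace; the names below are illustrative. Note that this residual grows both for a misplaced mesh and for an unmodeled lighting change, which is precisely the ambiguity discussed above.

```python
import numpy as np

def aam_fitting_error(warped, mean_app, A):
    """RMS residual between the warped input patch and its
    reconstruction by the appearance model.

    warped   : (d,)   shape-normalized input patch
    mean_app : (d,)   mean appearance
    A        : (d, m) appearance basis vectors as columns (orthonormal)
    """
    coeffs = A.T @ (warped - mean_app)       # project into the subspace
    recon = mean_app + A @ coeffs            # best subspace reconstruction
    return np.sqrt(np.mean((warped - recon) ** 2))
```

Any appearance that lies inside the subspace yields zero error regardless of how it arose, while any component outside it, whether misalignment or new lighting, inflates the error identically.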
Fig. 1 shows a case in which the AAM fitting error is misleading as an evaluation of the goodness of the AAM fitting.
Data set
To demonstrate the effectiveness of the proposed adaptive AAM algorithm, we performed several experiments using two face image sequences that were captured under different lighting conditions; the images were saved as 320 × 240 grayscale images.
The first image sequence consists of 100 face images that were captured under a normal lighting condition. Fig. 5a shows some face images from the first image sequence, in which the facial expression varies over the sequence. The second image sequence consists of
Conclusion
Because the AAM uses a subspace appearance model learned from a set of training images, it can fit to images that are similar to the training images. However, it is so sensitive to illumination change, especially when the illumination condition of the input images differs from that of the training images, that it produces unstable fitting performance.
To alleviate this problem, we proposed an adaptive AAM that updates the appearance bases incrementally to adapt itself to the changing illumination conditions of the input images.
Acknowledgement
This work was supported by the Korea Science and Engineering Foundation (KOSEF) through the Biometrics Engineering Research Center (BERC) at Yonsei University (R112002105070030(2008)) and also was supported by the Intelligent Robotics Development Program, one of the 21st Century Frontier R&D Programs funded by the Ministry of Commerce, Industry and Energy (MOCIE).
References (20)
- Artac, M., Jogan, M., Leonardis, A., 2002. Incremental PCA for on-line visual learning and recognition. In: Proc. ...
- Baker, S., Matthews, I., 2004. Lucas–Kanade 20 years on: A unifying framework. Internat. J. Comput. Vision.
- Blanz, V., Vetter, T., 1999. A morphable model for the synthesis of 3D faces. In: Proc. Computer Graphics, Annual Conf. ...
- Cao, B., Shan, S., Gao, W., Zhao, D., 2003. Illumination normalization for robust face recognition against varying ...
- Cootes, T.F., et al., 2001. Active appearance models. IEEE Trans. Pattern Anal. Machine Intell.
- Fleet, D.J., et al., 2003. Robust online appearance models for visual tracking. IEEE Trans. Pattern Anal. Machine Intell.
- Gross, R., Matthews, I., Baker, S., 2003. Lucas–Kanade 20 years on: A unifying framework: Part 3. CMU-RI-TR-03-05, ...
- Gross, R., Matthews, I., Baker, S., 2004. Constructing and fitting active appearance models with occlusion. In: Proc. ...
- Hager, G.D., Belhumeur, P.N., 1998. Efficient region tracking with parametric models of geometry and illumination. IEEE Trans. Pattern Anal. Machine Intell.
- Li, Y., 2004. On incremental and robust subspace learning. Pattern Recognition.