International Journal of Computer Vision

Published: 01-04-2014

Face Alignment by Explicit Shape Regression

Authors: Xudong Cao, Yichen Wei, Fang Wen, Jian Sun

Published in: International Journal of Computer Vision | Issue 2/2014

Abstract

We present a very efficient, highly accurate “Explicit Shape Regression” approach for face alignment. Unlike previous regression-based approaches, we directly learn a vectorial regression function to infer the whole facial shape (a set of facial landmarks) from the image and explicitly minimize the alignment errors over the training data. The inherent shape constraint is naturally encoded into the regressor in a cascaded learning framework and applied from coarse to fine during testing, without using a fixed parametric shape model as in most previous methods. To make the regression more effective and efficient, we design a two-level boosted regression, shape-indexed features, and a correlation-based feature selection method. This combination enables us to learn accurate models from large training data in a short time (20 min for 2,000 training images) and to run regression extremely fast at test time (15 ms for an 87-landmark shape). Experiments on challenging data show that our approach significantly outperforms the state of the art in terms of both accuracy and efficiency.

Footnotes

The mean shape is defined as the average of the normalized training shapes. Although this sounds like a circular definition, the mean shape can still be computed iteratively. Readers are referred to the Active Shape Model method (Cootes et al. 1995) for details.
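The iterative computation mentioned in this footnote can be sketched as follows. This is a minimal illustration in the spirit of Procrustes-style alignment, not the authors' code: it assumes that "normalized" means aligned by a similarity transform (translation, scale, rotation), and the function names `align_similarity`, `normalize`, and `mean_shape` are hypothetical.

```python
import numpy as np

def align_similarity(shape, target):
    """Least-squares similarity alignment of `shape` to `target` (both (N, 2) arrays)."""
    mu_s, mu_t = shape.mean(axis=0), target.mean(axis=0)
    s0, t0 = shape - mu_s, target - mu_t
    # Represent 2D points as complex numbers: a scaled rotation is one complex factor.
    zs = s0[:, 0] + 1j * s0[:, 1]
    zt = t0[:, 0] + 1j * t0[:, 1]
    a = np.vdot(zs, zt) / np.vdot(zs, zs)  # optimal scale and rotation
    z = a * zs
    return np.column_stack([z.real, z.imag]) + mu_t

def normalize(shape):
    """Center a shape at the origin and scale it to unit Frobenius norm."""
    centered = shape - shape.mean(axis=0)
    return centered / np.linalg.norm(centered)

def mean_shape(shapes, n_iters=100, tol=1e-10):
    """Alternate between aligning all shapes to the current mean and re-averaging."""
    mean = normalize(shapes[0])  # assumption: start from the first training shape
    for _ in range(n_iters):
        aligned = np.stack([align_similarity(s, mean) for s in shapes])
        new_mean = normalize(aligned.mean(axis=0))
        if np.linalg.norm(new_mean - mean) < tol:
            return new_mean
        mean = new_mean
    return mean
```

The loop resolves the apparent circularity: each pass normalizes the shapes against the current estimate, and the estimate is then replaced by their average until it stops changing.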

Otherwise this degenerates to a one-level boosted regression.

According to the aforementioned definition, the global coordinates are computed via \(M_{S}^{-1} \circ (\pi _{l} \circ M_{S}^{-1} \circ S + \Delta ^{l})\). Simplifying this formula gives Eq. (9).

Provided the range of the pixel-difference feature is \([-c, c]\), the range of the uniform distribution is \([-0.2c, 0.2c]\).
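As a small illustration of the stated range, the sketch below draws values uniformly from \([-0.2c, 0.2c]\). This excerpt does not say what the sampled values are used for, so the function name `sample_uniform_value` is hypothetical, and the example value `c = 255` (8-bit intensities) is an assumption.

```python
import numpy as np

def sample_uniform_value(c, rng, size=None):
    """Draw from the uniform distribution on [-0.2c, 0.2c]."""
    return rng.uniform(-0.2 * c, 0.2 * c, size=size)

rng = np.random.default_rng(0)  # fixed seed only for reproducibility
values = sample_uniform_value(255.0, rng, size=5)  # c = 255 is an assumed example
```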

We use random sampling for basis construction because it is simple and effective. We also tried the more sophisticated K-SVD method (Elad and Aharon 2006) for learning the basis; it yields performance similar to random sampling.

The median operation is performed on the x and y coordinates of all landmarks individually. Although this may violate the shape constraint mentioned before, the resulting median shape is mostly correct because in most cases the multiple results are tightly clustered. We found that such a simple median-based fusion is comparable to more sophisticated strategies such as a weighted combination of the input shapes.
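The coordinate-wise median described in this footnote can be sketched in a few lines. The array layout (K results, N landmarks, 2 coordinates) and the function name `fuse_shapes_by_median` are assumptions made for illustration.

```python
import numpy as np

def fuse_shapes_by_median(shapes):
    """Fuse K predicted shapes into one.

    shapes: array-like of shape (K, N, 2), K results with N (x, y) landmarks each.
    Returns an (N, 2) array where every x and every y is the median over the K results.
    """
    shapes = np.asarray(shapes, dtype=float)
    return np.median(shapes, axis=0)
```

Because each coordinate is handled on its own, a single outlying result has little effect on the fused shape, but nothing forces the output to be a valid face shape. That matches the caveat in the footnote.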

The relative improvement is the ratio between the error reduction and the original error.
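Written out, this definition is (original error − new error) / original error. A minimal sketch follows; the function name `relative_improvement` is hypothetical.

```python
def relative_improvement(original_error, new_error):
    """Ratio between the error reduction and the original error."""
    if original_error <= 0:
        raise ValueError("original_error must be positive")
    return (original_error - new_error) / original_error
```

For example, with hypothetical errors of 0.05 before and 0.04 after, the error reduction is 0.01 and the relative improvement is 0.01 / 0.05, or about 20%.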

Belhumeur et al. (2011) state in their work: “The localizer requires less than 1 s per fiducial on an Intel Core i7 3.06GHz machine”. We conjecture that it takes more than 10 s to locate 29 landmarks.

References

Belhumeur, P., Jacobs, D., Kriegman, D., & Kumar, N. (2011). Localizing parts of faces using a consensus of exemplars. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

Bingham, E., & Mannila, H. (2001). Random projection in dimensionality reduction: Applications to image and text data. In ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD).

Cootes, T., Edwards, G., & Taylor, C. (2001). Active appearance models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 23(6), 681–685.

Cootes, T., Taylor, C., Cooper, D., Graham, J., et al. (1995). Active shape models - their training and application. Computer Vision and Image Understanding, 61(1), 38–59.

Cristinacce, D., & Cootes, T. (2006). Feature detection and tracking with constrained local models. In British Machine Vision Conference (BMVC).

Cristinacce, D., & Cootes, T. (2007). Boosted regression active shape models. In British Machine Vision Conference (BMVC).

Dollar, P., Welinder, P., & Perona, P. (2010). Cascaded pose regression. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

Duffy, N., & Helmbold, D. P. (2002). Boosting methods for regression. Machine Learning, 47(2–3), 153–200.

Elad, M., & Aharon, M. (2006). Image denoising via sparse and redundant representations over learned dictionaries. IEEE Transactions on Image Processing, 15(12), 3736–3745.

Friedman, J. H. (2001). Greedy function approximation: A gradient boosting machine. The Annals of Statistics, 29(5), 1189–1232.

Huang, G., Mattar, M., Berg, T., Learned-Miller, E., et al. (2008). Labeled faces in the wild: A database for studying face recognition in unconstrained environments. In Workshop on Faces in 'Real-Life' Images: Detection, Alignment, and Recognition.

Jesorsky, O., Kirchberg, K. J., & Frischholz, R. W. (2001). Robust face detection using the Hausdorff distance (pp. 90–95). New York: Springer.

Jolliffe, I. (2005). Principal component analysis. Wiley Online Library.

Le, V., Brandt, J., Lin, Z., Bourdev, L., & Huang, T. (2012). Interactive facial feature localization. In European Conference on Computer Vision.

Liang, L., Xiao, R., Wen, F., & Sun, J. (2008). Face alignment via component-based discriminative search. In European Conference on Computer Vision (ECCV).

Matthews, I., & Baker, S. (2004). Active appearance models revisited. International Journal of Computer Vision, 60(2), 135–164.

Milborrow, S., & Nicolls, F. (2008). Locating facial features with an extended active shape model. In European Conference on Computer Vision (ECCV).

Ozuysal, M., Calonder, M., Lepetit, V., & Fua, P. (2010). Fast keypoint recognition using random ferns. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(3), 448–461.

Saragih, J., & Goecke, R. (2007). A nonlinear discriminative approach to AAM fitting. In International Conference on Computer Vision (ICCV).

Sauer, P., Cootes, T., & Taylor, C. (2011). Accurate regression procedures for active appearance models. In British Machine Vision Conference (BMVC).

Shotton, J., Fitzgibbon, A., Cook, M., Sharp, T., Finocchio, M., Moore, R., et al. (2011). Real-time human pose recognition in parts from single depth images. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

Sun, Y., Wang, X., & Tang, X. (2013). Deep convolutional network cascade for facial point detection. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

Tropp, J., & Gilbert, A. (2007). Signal recovery from random measurements via orthogonal matching pursuit. IEEE Transactions on Information Theory, 53(12), 4655–4666.

Valstar, M., Martinez, B., Binefa, X., & Pantic, M. (2010). Facial point detection using boosted regression and graph models. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

Vukadinovic, D., & Pantic, M. (2005). Fully automatic facial feature point detection using Gabor feature based boosted classifiers. International Conference on Systems, Man and Cybernetics, 2, 1692–1698.

Xiong, X., & De la Torre, F. (2013). Supervised descent method and its applications to face alignment. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

Zhou, S. K., & Comaniciu, D. (2007). Shape regression machine. In Information Processing in Medical Imaging, (pp. 13–25). Heidelberg: Springer.

Title: Face Alignment by Explicit Shape Regression
Authors: Xudong Cao, Yichen Wei, Fang Wen, Jian Sun
Publication date: 01-04-2014
Publisher: Springer US
Published in: International Journal of Computer Vision / Issue 2/2014
Print ISSN: 0920-5691
Electronic ISSN: 1573-1405
DOI: https://doi.org/10.1007/s11263-013-0667-3
