Published in: International Journal of Computer Vision 2/2014

01-04-2014

Face Alignment by Explicit Shape Regression

Authors: Xudong Cao, Yichen Wei, Fang Wen, Jian Sun

Abstract

We present a very efficient, highly accurate, “Explicit Shape Regression” approach for face alignment. Unlike previous regression-based approaches, we directly learn a vectorial regression function to infer the whole facial shape (a set of facial landmarks) from the image and explicitly minimize the alignment errors over the training data. The inherent shape constraint is naturally encoded into the regressor in a cascaded learning framework and applied from coarse to fine at test time, without using a fixed parametric shape model as in most previous methods. To make the regression more effective and efficient, we design a two-level boosted regression, shape-indexed features and a correlation-based feature selection method. This combination enables us to learn accurate models from large training data in a short time (20 min for 2,000 training images) and to run regression extremely fast at test time (15 ms for an 87-landmark shape). Experiments on challenging data show that our approach significantly outperforms the state of the art in terms of both accuracy and efficiency.
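
The coarse-to-fine cascade mentioned in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes an additive update in which each stage predicts a shape increment from the image and the current shape estimate, and it does not model the two-level boosting or the shape-indexed features. The names `run_cascade` and `make_toy_stage` are hypothetical, and the toy stage simply moves part of the way toward a known target in place of a learned regressor.

```python
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]
Shape = List[Point]
# A stage regressor maps (image, current shape) to a per-landmark increment.
StageRegressor = Callable[[object, Shape], Shape]


def run_cascade(image: object, initial_shape: Shape,
                stages: Sequence[StageRegressor]) -> Shape:
    """Refine a shape estimate stage by stage (assumed additive update)."""
    shape = list(initial_shape)
    for regress in stages:
        delta = regress(image, shape)
        shape = [(x + dx, y + dy) for (x, y), (dx, dy) in zip(shape, delta)]
    return shape


def make_toy_stage(target: Shape, step: float) -> StageRegressor:
    """Hypothetical stand-in for a learned stage: step a fraction toward a target."""
    def regress(_image: object, shape: Shape) -> Shape:
        return [((tx - x) * step, (ty - y) * step)
                for (x, y), (tx, ty) in zip(shape, target)]
    return regress


# Toy run: three landmarks, ten stages, each halving the remaining error.
target = [(10.0, 20.0), (30.0, 20.0), (20.0, 35.0)]
start = [(0.0, 0.0), (0.0, 0.0), (0.0, 0.0)]
stages = [make_toy_stage(target, 0.5) for _ in range(10)]
estimate = run_cascade(None, start, stages)
```

Early stages make large corrections and later stages make small ones, which is the coarse-to-fine behaviour the abstract refers to.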

Footnotes
1
The mean shape is defined as the average of the normalized training shapes. Although this sounds like a circular definition (normalization itself requires a mean shape), the mean shape can still be computed iteratively. Readers are referred to the Active Shape Model method (Cootes et al. 1995) for details.
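
One common way to resolve this circularity is an alternating procedure: align all shapes to the current mean estimate, average them, and repeat. The sketch below is an illustration under assumptions, not the procedure from the paper or from Cootes et al.: it stores each 2D landmark as a complex number, uses a least-squares similarity alignment (scale and rotation after centering), starts from the first training shape, and rescales the mean to unit norm each round so it cannot shrink. All function names are hypothetical.

```python
import cmath
from typing import List

Shape = List[complex]  # landmark (x, y) stored as x + 1j*y


def center(shape: Shape) -> Shape:
    """Translate a shape so that its centroid is at the origin."""
    c = sum(shape) / len(shape)
    return [z - c for z in shape]


def unit_scale(shape: Shape) -> Shape:
    """Rescale a centered shape to unit Frobenius norm."""
    norm = sum(abs(z) ** 2 for z in shape) ** 0.5
    return [z / norm for z in shape]


def align_to(shape: Shape, reference: Shape) -> Shape:
    """Least-squares scale+rotation alignment of a centered shape to a reference."""
    num = sum(r * z.conjugate() for z, r in zip(shape, reference))
    den = sum(abs(z) ** 2 for z in shape)
    a = num / den  # complex factor encoding scale and rotation
    return [a * z for z in shape]


def mean_shape(shapes: List[Shape], iterations: int = 10) -> Shape:
    """Alternate between aligning shapes to the mean and re-averaging."""
    centered = [center(s) for s in shapes]
    mean = unit_scale(centered[0])  # initial guess: the first shape
    for _ in range(iterations):
        aligned = [align_to(s, mean) for s in centered]
        mean = [sum(col) / len(aligned) for col in zip(*aligned)]
        mean = unit_scale(mean)  # fix the scale so the mean does not shrink
    return mean


# Toy data: one rectangle seen under three different similarity transforms.
base = [0 + 0j, 2 + 0j, 2 + 1j, 0 + 1j]
shapes = [
    base,
    [3.0 * cmath.exp(0.5j) * z + (5 - 2j) for z in base],
    [0.5 * cmath.exp(-1.2j) * z + 1j for z in base],
]
estimated_mean = mean_shape(shapes)
```

Because the toy shapes are exact similarity transforms of one rectangle, the estimated mean coincides with the centered, unit-norm rectangle; on real annotations the iterations would average out the per-face variation instead.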
 
2
Otherwise this degenerates to a one level boosted regression.
 
3
According to the aforementioned definition, the global coordinates are computed via \(M_{S}^{-1} \circ (\pi _{l} \circ M_{S}^{-1} \circ S + \Delta ^{l})\). Simplifying this formula gives Eq. (9).
 
4
Provided the range of the pixel-difference feature is \([-c, c]\), the range of the uniform distribution is \([-0.2c, 0.2c]\).
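
As a small illustration of this range, the snippet below draws values uniformly from \([-0.2c, 0.2c]\). It assumes that the distribution is used to sample a random threshold for a feature and that \(c = 255\) for differences of 8-bit intensities; neither detail is stated in this footnote, and `sample_threshold` is a hypothetical helper.

```python
import random


def sample_threshold(c: float, rng: random.Random) -> float:
    """Draw a value uniformly from [-0.2c, 0.2c]."""
    return rng.uniform(-0.2 * c, 0.2 * c)


rng = random.Random(0)  # fixed seed so the run is repeatable
c = 255.0  # assumed feature range for 8-bit pixel differences
thresholds = [sample_threshold(c, rng) for _ in range(1000)]
```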
 
5
We use random sampling for basis construction because of its simplicity and effectiveness. We also tried the more sophisticated K-SVD method (Elad and Aharon 2006) for learning the basis; it yields performance similar to random sampling.
 
6
The median operation is performed on the x and y coordinates of all landmarks individually. Although this may violate the shape constraint mentioned before, the resulting median shape is mostly correct because in most cases the multiple results are tightly clustered. We found that such simple median-based fusion is comparable to more sophisticated strategies such as a weighted combination of the input shapes.
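
The coordinate-wise median described here can be sketched in a few lines. The function name `median_shape` and the sample numbers are hypothetical; the sketch assumes every estimate lists the same landmarks in the same order.

```python
from statistics import median
from typing import List, Tuple

Shape = List[Tuple[float, float]]


def median_shape(shapes: List[Shape]) -> Shape:
    """Fuse shape estimates by taking the median of x and of y per landmark."""
    fused = []
    for landmark_estimates in zip(*shapes):  # all estimates of one landmark
        xs = [p[0] for p in landmark_estimates]
        ys = [p[1] for p in landmark_estimates]
        fused.append((median(xs), median(ys)))
    return fused


# Three estimates of a two-landmark shape; the third contains outliers.
estimates = [
    [(10.0, 20.0), (30.0, 21.0)],
    [(11.0, 19.0), (29.0, 20.0)],
    [(50.0, 22.0), (31.0, 60.0)],
]
fused = median_shape(estimates)
```

The outlying values 50.0 and 60.0 do not affect the fused result, which is why a plain median works well when most estimates are tightly clustered.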
 
7
The relative improvement is the ratio between the error reduction and the original error.
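
Written as a function, this definition reads as follows. The helper name and the error values in the example are hypothetical and are not results from the paper.

```python
def relative_improvement(original_error: float, new_error: float) -> float:
    """Error reduction divided by the original error."""
    if original_error <= 0:
        raise ValueError("original_error must be positive")
    return (original_error - new_error) / original_error


# Hypothetical errors: reducing 0.05 to 0.04 is a 20% relative improvement.
gain = relative_improvement(0.05, 0.04)
```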
 
8
Belhumeur et al. (2011) discussed in their work: “The localizer requires less than 1 s per fiducial on an Intel Core i7 3.06GHz machine”. We conjecture that it takes more than 10 s to locate 29 landmarks.
 
Literature
Belhumeur, P., Jacobs, D., Kriegman, D., & Kumar, N. (2011). Localizing parts of faces using a consensus of exemplars. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
Bingham, E., & Mannila, H. (2001). Random projection in dimensionality reduction: Applications to image and text data. In ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD).
Cootes, T., Edwards, G., & Taylor, C. (2001). Active appearance models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 23(6), 681–685.
Cootes, T., Taylor, C., Cooper, D., Graham, J., et al. (1995). Active shape models-their training and application. Computer Vision and Image Understanding, 61(1), 38–59.
Cristinacce, D., & Cootes, T. (2006). Feature detection and tracking with constrained local models. In British Machine Vision Conference (BMVC).
Cristinacce, D., & Cootes, T. (2007). Boosted regression active shape models. In British Machine Vision Conference (BMVC).
Dollar, P., Welinder, P., & Perona, P. (2010). Cascaded pose regression. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
Duffy, N., & Helmbold, D. P. (2002). Boosting methods for regression. Machine Learning, 47(2–3), 153–200.
Elad, M., & Aharon, M. (2006). Image denoising via sparse and redundant representations over learned dictionaries. IEEE Transactions on Image Processing, 15(12), 3736–3745.
Huang, G., Mattar, M., Berg, T., Learned-Miller, E., et al. (2008). Labeled faces in the wild: A database for studying face recognition in unconstrained environments. In Workshop on Faces in 'Real-Life' Images: Detection, Alignment, and Recognition.
Jesorsky, O., Kirchberg, K. J., & Frischholz, R. W. (2001). Robust face detection using the Hausdorff distance (pp. 90–95). New York: Springer.
Jolliffe, I. (2005). Principal component analysis. Wiley Online Library.
Le, V., Brandt, J., Lin, Z., Bourdev, L., & Huang, T. (2012). Interactive facial feature localization. In European Conference on Computer Vision (ECCV).
Liang, L., Xiao, R., Wen, F., & Sun, J. (2008). Face alignment via component-based discriminative search. In European Conference on Computer Vision (ECCV).
Matthews, I., & Baker, S. (2004). Active appearance models revisited. International Journal of Computer Vision, 60(2), 135–164.
Milborrow, S., & Nicolls, F. (2008). Locating facial features with an extended active shape model. In European Conference on Computer Vision (ECCV).
Ozuysal, M., Calonder, M., Lepetit, V., & Fua, P. (2010). Fast keypoint recognition using random ferns. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(3), 448–461.
Saragih, J., & Goecke, R. (2007). A nonlinear discriminative approach to AAM fitting. In International Conference on Computer Vision (ICCV).
Sauer, P., & Cootes, C. T. T. (2011). Accurate regression procedures for active appearance models. In British Machine Vision Conference (BMVC).
Shotton, J., Fitzgibbon, A., Cook, M., Sharp, T., Finocchio, M., Moore, R., et al. (2011). Real-time human pose recognition in parts from single depth images. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
Sun, Y., Wang, X., & Tang, X. (2013). Deep convolutional network cascade for facial point detection. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
Tropp, J., & Gilbert, A. (2007). Signal recovery from random measurements via orthogonal matching pursuit. IEEE Transactions on Information Theory, 53(12), 4655–4666.
Valstar, M., Martinez, B., Binefa, X., & Pantic, M. (2010). Facial point detection using boosted regression and graph models. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
Vukadinovic, D., & Pantic, M. (2005). Fully automatic facial feature point detection using Gabor feature based boosted classifiers. International Conference on Systems, Man and Cybernetics, 2, 1692–1698.
Xiong, X., & De la Torre, F. (2013). Supervised descent method and its applications to face alignment. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
Zhou, S. K., & Comaniciu, D. (2007). Shape regression machine. In Information Processing in Medical Imaging (pp. 13–25). Heidelberg: Springer.
Metadata
Title
Face Alignment by Explicit Shape Regression
Authors
Xudong Cao
Yichen Wei
Fang Wen
Jian Sun
Publication date
01-04-2014
Publisher
Springer US
Published in
International Journal of Computer Vision / Issue 2/2014
Print ISSN: 0920-5691
Electronic ISSN: 1573-1405
DOI
https://doi.org/10.1007/s11263-013-0667-3
