Published in: International Journal of Computer Vision 2/2014

1 April 2014

Face Alignment by Explicit Shape Regression

Authors: Xudong Cao, Yichen Wei, Fang Wen, Jian Sun

Abstract

We present a very efficient, highly accurate “Explicit Shape Regression” approach for face alignment. Unlike previous regression-based approaches, we directly learn a vectorial regression function to infer the whole facial shape (a set of facial landmarks) from the image and explicitly minimize the alignment errors over the training data. The inherent shape constraint is naturally encoded into the regressor in a cascaded learning framework and applied from coarse to fine during testing, without using a fixed parametric shape model as in most previous methods. To make the regression more effective and efficient, we design a two-level boosted regression, shape-indexed features, and a correlation-based feature selection method. This combination enables us to learn accurate models from large training data in a short time (20 min for 2,000 training images) and to run regression extremely fast at test time (15 ms for an 87-landmark shape). Experiments on challenging data show that our approach significantly outperforms the state of the art in terms of both accuracy and efficiency.
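
As a rough illustration of the coarse-to-fine cascade mentioned in the abstract, the sketch below applies a sequence of stage regressors, each of which adds a shape increment to the current estimate. The additive update, the `StageRegressor` type, the function name `run_cascade`, and the toy regressor `halfway` are assumptions made for illustration only; they are not the authors' implementation, and a real stage regressor would be learned from image features rather than given the target.

```python
from typing import Callable, Sequence

import numpy as np

# Hypothetical type: a stage regressor maps (image, current shape) to a shape increment.
StageRegressor = Callable[[np.ndarray, np.ndarray], np.ndarray]


def run_cascade(
    image: np.ndarray,
    initial_shape: np.ndarray,
    stages: Sequence[StageRegressor],
) -> np.ndarray:
    """Refine a (num_landmarks, 2) shape by adding each stage's increment in turn."""
    shape = np.asarray(initial_shape, dtype=float).copy()
    for regress in stages:
        # Assumed additive update: each stage sees the current estimate.
        shape = shape + regress(image, shape)
    return shape


# Toy demo only: this stand-in regressor knows the target and moves halfway
# toward it, which mimics large early corrections and small late ones.
target = np.array([[10.0, 20.0], [30.0, 40.0]])


def halfway(image: np.ndarray, shape: np.ndarray) -> np.ndarray:
    return 0.5 * (target - shape)


image = np.zeros((64, 64))
estimate = run_cascade(image, np.zeros((2, 2)), [halfway] * 10)
```

In this toy run the remaining error halves at every stage, so early stages make coarse corrections and later stages make fine ones.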

Footnotes
1
The mean shape is defined as the average of the normalized training shapes. Although this sounds like a circular definition, the mean shape can still be computed iteratively. Readers are referred to the Active Shape Model method (Cootes et al. 1995) for details.
 
2
Otherwise, this degenerates to a one-level boosted regression.
 
3
According to the aforementioned definition, the global coordinates are computed via \(M_{S}^{-1} \circ (\pi _{l} \circ M_{S}^{-1} \circ S + \Delta ^{l})\). Simplifying this formula gives Eq. (9).
 
4
Given that the pixel-difference feature ranges over \([-c, c]\), the uniform distribution ranges over \([-0.2c, 0.2c]\).
 
5
We use random sampling for basis construction because of its simplicity and effectiveness. We also tried the more sophisticated K-SVD method (Elad and Aharon 2006) for learning the basis; it yields performance similar to random sampling.
 
6
The median operation is performed on the x and y coordinates of all landmarks individually. Although this may violate the shape constraint mentioned before, the resulting median shape is mostly correct because, in most cases, the multiple results are tightly clustered. We found that such a simple median-based fusion is comparable to more sophisticated strategies such as a weighted combination of input shapes.
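
The coordinate-wise median fusion described in this footnote can be sketched as follows. The function name `median_fuse`, the array layout (results × landmarks × 2), and the sample coordinates are assumptions chosen for illustration; they are not taken from the paper.

```python
import numpy as np


def median_fuse(shapes: np.ndarray) -> np.ndarray:
    """Fuse several shape estimates by taking the median of each x and each y.

    `shapes` is assumed to have layout (num_results, num_landmarks, 2).
    """
    stacked = np.asarray(shapes, dtype=float)
    if stacked.ndim != 3 or stacked.shape[2] != 2:
        raise ValueError("expected an array of shape (num_results, num_landmarks, 2)")
    # The median runs over the results axis, independently per landmark and per axis.
    return np.median(stacked, axis=0)


# Three hypothetical results for a two-landmark shape; the third result has an
# outlying x coordinate on the first landmark.
results = np.array([
    [[10.0, 20.0], [30.0, 40.0]],
    [[11.0, 19.0], [31.0, 41.0]],
    [[50.0, 21.0], [29.0, 39.0]],
])
fused = median_fuse(results)
```

Because each coordinate is treated separately, the single outlying x value of 50.0 does not pull the fused landmark away from the cluster, which is why the fused shape stays reasonable when the results are tightly clustered.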
 
7
The relative improvement is the ratio between the error reduction and the original error.
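
Written as a formula, with \(e_{\text{orig}}\) for the original error and \(e_{\text{new}}\) for the error after the change (both symbol names are assumptions introduced here, not notation from the paper), the definition reads:

```latex
\[
  \text{relative improvement}
  = \frac{e_{\text{orig}} - e_{\text{new}}}{e_{\text{orig}}}
\]
```

For instance, with purely illustrative numbers, reducing an error from 5.0 to 4.0 gives \((5.0 - 4.0) / 5.0 = 0.2\), a relative improvement of 20%.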
 
8
Belhumeur et al. (2011) state in their work: “The localizer requires less than 1 s per fiducial on an Intel Core i7 3.06GHz machine”. We conjecture that it takes more than 10 s to locate 29 landmarks.
 
References
Belhumeur, P., Jacobs, D., Kriegman, D., & Kumar, N. (2011). Localizing parts of faces using a consensus of exemplars. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
Bingham, E., & Mannila, H. (2001). Random projection in dimensionality reduction: Applications to image and text data. In ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD).
Cootes, T., Edwards, G., & Taylor, C. (2001). Active appearance models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 23(6), 681–685.
Cootes, T., Taylor, C., Cooper, D., Graham, J., et al. (1995). Active shape models-their training and application. Computer Vision and Image Understanding, 61(1), 38–59.
Cristinacce, D., & Cootes, T. (2006). Feature detection and tracking with constrained local models. In British Machine Vision Conference (BMVC).
Cristinacce, D., & Cootes, T. (2007). Boosted regression active shape models. In British Machine Vision Conference (BMVC).
Dollar, P., Welinder, P., & Perona, P. (2010). Cascaded pose regression. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
Duffy, N., & Helmbold, D. P. (2002). Boosting methods for regression. Machine Learning, 47(2–3), 153–200.
Elad, M., & Aharon, M. (2006). Image denoising via sparse and redundant representations over learned dictionaries. IEEE Transactions on Image Processing, 15(12), 3736–3745.
Friedman, J. H. (2001). Greedy function approximation: A gradient boosting machine. The Annals of Statistics, 29(5), 1189–1232.
Huang, G., Mattar, M., Berg, T., Learned-Miller, E., et al. (2008). Labeled faces in the wild: A database for studying face recognition in unconstrained environments. In Workshop on Faces in 'Real-Life' Images: Detection, Alignment, and Recognition.
Jesorsky, O., Kirchberg, K. J., & Frischholz, R. W. (2001). Robust face detection using the Hausdorff distance (pp. 90–95). New York: Springer.
Jolliffe, I. (2005). Principal component analysis. Wiley Online Library.
Le, V., Brandt, J., Lin, Z., Bourdev, L., & Huang, T. (2012). Interactive facial feature localization. In European Conference on Computer Vision (ECCV).
Liang, L., Xiao, R., Wen, F., & Sun, J. (2008). Face alignment via component-based discriminative search. In European Conference on Computer Vision (ECCV).
Matthews, I., & Baker, S. (2004). Active appearance models revisited. International Journal of Computer Vision, 60(2), 135–164.
Milborrow, S., & Nicolls, F. (2008). Locating facial features with an extended active shape model. In European Conference on Computer Vision (ECCV).
Ozuysal, M., Calonder, M., Lepetit, V., & Fua, P. (2010). Fast keypoint recognition using random ferns. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(3), 448–461.
Saragih, J., & Goecke, R. (2007). A nonlinear discriminative approach to AAM fitting. In International Conference on Computer Vision (ICCV).
Sauer, P., Cootes, T., & Taylor, C. (2011). Accurate regression procedures for active appearance models. In British Machine Vision Conference (BMVC).
Shotton, J., Fitzgibbon, A., Cook, M., Sharp, T., Finocchio, M., Moore, R., et al. (2011). Real-time human pose recognition in parts from single depth images. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
Sun, Y., Wang, X., & Tang, X. (2013). Deep convolutional network cascade for facial point detection. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
Tropp, J., & Gilbert, A. (2007). Signal recovery from random measurements via orthogonal matching pursuit. IEEE Transactions on Information Theory, 53(12), 4655–4666.
Valstar, M., Martinez, B., Binefa, X., & Pantic, M. (2010). Facial point detection using boosted regression and graph models. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
Vukadinovic, D., & Pantic, M. (2005). Fully automatic facial feature point detection using Gabor feature based boosted classifiers. International Conference on Systems, Man and Cybernetics, 2, 1692–1698.
Xiong, X., & De la Torre, F. (2013). Supervised descent method and its applications to face alignment. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
Zhou, S. K., & Comaniciu, D. (2007). Shape regression machine. In Information Processing in Medical Imaging (pp. 13–25). Heidelberg: Springer.
Metadata
Title
Face Alignment by Explicit Shape Regression
Authors
Xudong Cao
Yichen Wei
Fang Wen
Jian Sun
Publication date
1 April 2014
Publisher
Springer US
Published in
International Journal of Computer Vision, Issue 2/2014
Print ISSN: 0920-5691
Electronic ISSN: 1573-1405
DOI
https://doi.org/10.1007/s11263-013-0667-3
