2016 | Original Paper | Book Chapter

Robust Facial Landmark Detection via Recurrent Attentive-Refinement Networks

Authors: Shengtao Xiao, Jiashi Feng, Junliang Xing, Hanjiang Lai, Shuicheng Yan, Ashraf Kassim

Published in: Computer Vision – ECCV 2016

Publisher: Springer International Publishing

Abstract

In this work, we introduce a novel Recurrent Attentive-Refinement (RAR) network for facial landmark detection under unconstrained conditions, where facial occlusions and/or pose variations are common challenges. RAR follows the cascaded regression pipeline, which refines landmark locations progressively. However, instead of updating all landmark locations together, RAR refines them sequentially at each recurrent stage. In this way, more reliable landmark points are refined earlier and help to infer the locations of more challenging landmarks that may be occluded or appear under extreme poses. RAR can thus effectively control detection errors from those challenging landmarks and improve overall performance even in the presence of heavy occlusions and/or extreme conditions. To determine the sequence of landmarks, RAR employs an attentive-refinement mechanism built from two models: an attention LSTM (A-LSTM) and a refinement LSTM (R-LSTM). At each recurrent stage, A-LSTM implicitly identifies a reliable landmark as the attention center. Following the sequence of attention centers, R-LSTM sequentially refines the landmarks near or correlated with the attention centers and produces the final detection results. To further enhance robustness, instead of using the mean shape for initialization, RAR adaptively determines the initialization by selecting from a pool of shape centers clustered from all training shapes. As an end-to-end trainable model, RAR demonstrates superior performance in detecting challenging landmarks in comprehensive experiments, and it establishes a new state of the art on the 300-W, COFW and AFLW benchmark datasets.
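The pool of shape centers mentioned in the abstract can be illustrated with a small sketch. The abstract does not say which clustering algorithm is used or how RAR picks a center for a given face, so the code below is only an assumption-laden illustration: it uses plain k-means over flattened landmark coordinates, the function name `cluster_shape_centers` and the toy "frontal" and "shifted" shapes are hypothetical, and the selection step is left out because the abstract does not specify it.

```python
import numpy as np


def cluster_shape_centers(shapes, k, n_iter=50, seed=0):
    """Build a pool of k shape centers with plain k-means (an assumed choice).

    shapes: array of shape (N, L, 2), i.e. N training shapes with L landmarks.
    Returns (centers, labels): centers is (k, L, 2), labels is (N,).
    """
    rng = np.random.default_rng(seed)
    n, n_landmarks, _ = shapes.shape
    flat = shapes.reshape(n, -1).astype(float)

    def assign(centers):
        # Squared Euclidean distance from every shape to every center.
        dists = ((flat[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        return dists.argmin(axis=1)

    # Initialise the centers from k distinct training shapes.
    centers = flat[rng.choice(n, size=k, replace=False)].copy()
    for _ in range(n_iter):
        labels = assign(centers)
        new_centers = centers.copy()
        for j in range(k):
            members = flat[labels == j]
            if len(members) > 0:  # keep the old center if a cluster is empty
                new_centers[j] = members.mean(axis=0)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    # Final assignment against the returned centers.
    labels = assign(centers)
    return centers.reshape(k, n_landmarks, 2), labels


# Toy data (hypothetical): two groups of 5-landmark shapes, the second
# group shifted horizontally to mimic a different pose.
rng = np.random.default_rng(1)
base = rng.uniform(0.0, 1.0, size=(5, 2))
frontal = base + 0.01 * rng.standard_normal((20, 5, 2))
shifted = base + np.array([0.5, 0.0]) + 0.01 * rng.standard_normal((20, 5, 2))
shapes = np.concatenate([frontal, shifted])

centers, labels = cluster_shape_centers(shapes, k=2)
print(centers.shape)  # one candidate initial shape per cluster
```

With a single cluster (`k=1`) the pool reduces to the mean shape, which is the conventional initialization that the abstract contrasts against.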

Footnotes
1
The face shape depicts the global spatial configuration of all the landmark points of a face. Throughout the paper, we use shape to denote the collection of all the landmarks.

2
This name is inspired by the way humans annotate facial landmarks manually: one prefers to annotate the clearest and most reliable landmark points first and then infer the positions of the other landmark points from the overall face shape.
 
Metadata
Title
Robust Facial Landmark Detection via Recurrent Attentive-Refinement Networks
Authors
Shengtao Xiao
Jiashi Feng
Junliang Xing
Hanjiang Lai
Shuicheng Yan
Ashraf Kassim
Copyright year
2016
DOI
https://doi.org/10.1007/978-3-319-46448-0_4