2019 | OriginalPaper | Book chapter

Real-Time Face Features Localization with Recurrent Refined Dense CNN Architectures

Written by: Nicolas Livet

Published in: Advances in Visual Computing

Publisher: Springer International Publishing


Abstract

Based on an innovative, efficient recurrent deep learning architecture, we present a highly stable and robust technique for localizing face features in still images and in recorded and live video sequences. This dense (fully convolutional) CNN architecture, referred to as the Refined Dense Mobilenet (RDM), is composed of (1) a main encoder-decoder block, which approximates the face feature locations, and (2) a sequence of refiners, which converge robustly to the vicinity of the features. For video sequences, the architecture is adapted into a Recurrent RDM (R-RDM), in which a shape prior is re-injected in the form of temporal heatmaps obtained from inference on the previous frame.
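The re-injection of temporal heatmaps described above can be illustrated with a minimal, pure-Python sketch. It is not the author's implementation: the function names, the Gaussian heatmap encoding, the 32x32 resolution and the argmax decoding are all assumptions made for illustration, and `toy_localize` is a hypothetical stand-in for the network that simply renders heatmaps at known positions and ignores the prior it receives.

```python
import math


def render_heatmap(x, y, size=32, sigma=1.5):
    """Render a size x size Gaussian heatmap peaking at landmark (x, y)."""
    return [[math.exp(-((col - x) ** 2 + (row - y) ** 2) / (2.0 * sigma ** 2))
             for col in range(size)]
            for row in range(size)]


def decode_heatmap(heatmap):
    """Return the (x, y) position of the strongest response in a heatmap."""
    best_x, best_y, best_val = 0, 0, float("-inf")
    for row, values in enumerate(heatmap):
        for col, val in enumerate(values):
            if val > best_val:
                best_x, best_y, best_val = col, row, val
    return best_x, best_y


def track(frames, localize, landmarks_count, size=32):
    """Run a per-frame localizer, feeding back the previous frame's heatmaps."""
    # The first frame has no history, so the prior starts as all-zero heatmaps.
    prior = [[[0.0] * size for _ in range(size)] for _ in range(landmarks_count)]
    results = []
    for frame in frames:
        heatmaps = localize(frame, prior)  # hypothetical network call
        results.append([decode_heatmap(h) for h in heatmaps])
        prior = heatmaps  # becomes the temporal prior for the next frame
    return results


def toy_localize(frame, prior):
    """Stand-in for the network: 'frame' is just a list of true (x, y) points.

    A real model would take the image plus 'prior' as extra input channels;
    this toy version only uses 'prior' to pick the output resolution.
    """
    size = len(prior[0])
    return [render_heatmap(x, y, size=size) for x, y in frame]


if __name__ == "__main__":
    frames = [[(5, 6), (20, 9)], [(6, 6), (21, 10)]]
    print(track(frames, toy_localize, landmarks_count=2))
```

The point of the sketch is the data flow in `track`: the heatmaps produced for one frame are carried over unchanged and passed to the localizer on the next frame, which is how a shape prior can be supplied without a separate tracking stage.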
The accuracy and stability of the RDM/R-RDM architectures are compared with state-of-the-art Random Forest and CNN-based approaches. Combining a holistic feature localizer, which exploits large receptive fields to avoid large errors, with refiners, which work at higher resolution to converge to the vicinity of each feature, yields high accuracy in localizing face features. We demonstrate that the RDM/R-RDM architectures improve localization scores on the 300W and AFLW datasets. Moreover, by relying on modern, efficient convolutional blocks and on our recurrent architecture, we deliver the first stable and accurate real-time implementation of face feature localization on low-end mobile devices.
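The combination of a holistic localizer and higher-resolution refiners follows a coarse-to-fine pattern, which the following pure-Python sketch illustrates on a single response map. It is an analogy only, not the RDM network: average pooling stands in for the low-resolution holistic stage, a local window search stands in for a refiner, and every name and parameter (`factor`, `radius`, `gaussian_grid`) is an assumption introduced here.

```python
import math


def gaussian_grid(x, y, size=32, sigma=2.0):
    """Build a toy size x size response map with a single peak at (x, y)."""
    return [[math.exp(-((c - x) ** 2 + (r - y) ** 2) / (2.0 * sigma ** 2))
             for c in range(size)]
            for r in range(size)]


def argmax2d(grid, rows, cols):
    """Return (x, y) of the largest grid value within the given row/col ranges."""
    best = None
    for r in rows:
        for c in cols:
            if best is None or grid[r][c] > grid[best[1]][best[0]]:
                best = (c, r)
    return best


def downsample(grid, factor):
    """Average-pool a square grid by an integer factor."""
    n = len(grid) // factor
    return [[sum(grid[r * factor + i][c * factor + j]
                 for i in range(factor) for j in range(factor)) / factor ** 2
             for c in range(n)]
            for r in range(n)]


def coarse_to_fine(fine_grid, factor=4, radius=4):
    """Locate the peak coarsely at low resolution, then refine it locally."""
    size = len(fine_grid)
    coarse = downsample(fine_grid, factor)
    cx, cy = argmax2d(coarse, range(len(coarse)), range(len(coarse)))
    # Map the centre of the winning coarse cell back to fine coordinates.
    gx, gy = cx * factor + factor // 2, cy * factor + factor // 2
    # Refinement only searches a small window around the coarse estimate.
    rows = range(max(0, gy - radius), min(size, gy + radius + 1))
    cols = range(max(0, gx - radius), min(size, gx + radius + 1))
    return argmax2d(fine_grid, rows, cols)


if __name__ == "__main__":
    print(coarse_to_fine(gaussian_grid(13, 22)))
```

The coarse stage sees the whole map and so cannot be far off, while the refinement stage only searches a small neighbourhood at full resolution, which mirrors the division of labour the abstract describes.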


Footnotes
1
Accompanying video available at https://vimeo.com/348063383.
 
Metadata
Title
Real-Time Face Features Localization with Recurrent Refined Dense CNN Architectures
Written by
Nicolas Livet
Copyright year
2019
DOI
https://doi.org/10.1007/978-3-030-33720-9_35
