Published in: International Journal of Computer Vision 2/2017

19-04-2017

Pose-Invariant Face Alignment via CNN-Based Dense 3D Model Fitting

Authors: Amin Jourabloo, Xiaoming Liu

Abstract

Pose-invariant face alignment is a challenging problem in computer vision and a prerequisite for many facial analysis tasks, e.g., face recognition, expression recognition, and 3D face reconstruction. There have been a few recent attempts to tackle this problem, but more research is needed to achieve higher accuracy. In this paper, we propose a face alignment method that handles face images of arbitrary pose by combining cascaded CNN regressors, a 3D Morphable Model (3DMM), and a mirrorability constraint. The core of the proposed method is a novel 3DMM fitting algorithm in which the camera projection matrix parameters and the 3D shape parameters are estimated by a cascade of CNN-based regressors. Furthermore, we impose the mirrorability constraint during CNN learning by employing a novel loss function inside a siamese network. The dense 3D shape enables us to design pose-invariant appearance features for effective CNN learning. Extensive experiments are conducted on challenging large-pose face databases (AFLW and AFW), with comparison to the state of the art.
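To make the quantities named in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes a linear 3DMM (mean shape plus weighted basis shapes), a 2x4 camera projection matrix applied to homogeneous 3D vertices, and a simple mean-distance form of the mirrorability error. The function names, the toy two-landmark face, and the error form are all hypothetical; the abstract does not specify the paper's actual parameterization or loss.

```python
from typing import List, Sequence, Tuple

Vec3 = Sequence[float]
Pt2 = Tuple[float, float]


def morph_shape(mean: Sequence[Vec3], bases: Sequence[Sequence[Vec3]],
                coeffs: Sequence[float]) -> List[List[float]]:
    """Assumed linear 3DMM: S = S0 + sum_i p_i * S_i, applied per vertex."""
    shape = [list(v) for v in mean]
    for p, basis in zip(coeffs, bases):
        for v, b in zip(shape, basis):
            for k in range(3):
                v[k] += p * b[k]
    return shape


def project(m: Sequence[Sequence[float]], shape: Sequence[Vec3]) -> List[Pt2]:
    """Apply an assumed 2x4 projection matrix to each vertex (x, y, z, 1)."""
    out = []
    for x, y, z in shape:
        h = (x, y, z, 1.0)
        u = sum(m[0][k] * h[k] for k in range(4))
        v = sum(m[1][k] * h[k] for k in range(4))
        out.append((u, v))
    return out


def mirror_back(pts: Sequence[Pt2], width: int, swap: Sequence[int]) -> List[Pt2]:
    """Map landmarks found on the horizontally flipped image back to the
    original frame: flip x, then swap left/right landmark indices."""
    flipped = [(width - 1 - x, y) for x, y in pts]
    return [flipped[swap[i]] for i in range(len(pts))]


def mirror_error(pts: Sequence[Pt2], pts_on_flipped: Sequence[Pt2],
                 width: int, swap: Sequence[int]) -> float:
    """Mean distance between the estimate on the original image and the
    mirrored-back estimate on the flipped image (assumed error form)."""
    back = mirror_back(pts_on_flipped, width, swap)
    dists = [((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2) ** 0.5
             for a, b in zip(pts, back)]
    return sum(dists) / len(dists)


# Toy example: two symmetric landmarks and one "face width" basis shape.
mean = [(-1.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
bases = [[(-1.0, 0.0, 0.0), (1.0, 0.0, 0.0)]]
m = [[10.0, 0.0, 0.0, 50.0],
     [0.0, 10.0, 0.0, 40.0]]

landmarks = project(m, morph_shape(mean, bases, [0.5]))
print(landmarks)
# A perfectly mirrorable estimator gives the same values on the flipped image
# for this symmetric toy face, so the error is zero.
print(mirror_error(landmarks, landmarks, width=101, swap=[1, 0]))
```

In a cascaded setting, each regressor stage would output an update to `m` and to the shape coefficients, and the projected landmarks would be recomputed after every stage. A mirrorability term of this kind penalizes disagreement between the two branches of a siamese network fed an image and its mirror.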


Metadata
Title: Pose-Invariant Face Alignment via CNN-Based Dense 3D Model Fitting
Authors: Amin Jourabloo, Xiaoming Liu
Publication date: 19-04-2017
Publisher: Springer US
Published in: International Journal of Computer Vision, Issue 2/2017
Print ISSN: 0920-5691
Electronic ISSN: 1573-1405
DOI: https://doi.org/10.1007/s11263-017-1012-z
