Toward Personalized Modeling: Incremental and Ensemble Alignment for Sequential Faces in the Wild

Authors: Xi Peng, Shaoting Zhang, Yang Yu, Dimitris N. Metaxas

Published in: International Journal of Computer Vision | Issue 2-4/2018

Published: 15 February 2017


Abstract

Fitting facial landmarks on unconstrained videos is a challenging task with broad applications. Both generic and joint alignment methods have been proposed, with varying degrees of success. However, many generic methods are highly sensitive to initialization and usually rely on offline-trained static models, which limits their performance on sequential images with extensive variations. Joint methods, on the other hand, are restricted to offline applications, since they require all frames to be available for batch alignment. To address these limitations, we propose to exploit incremental learning for personalized ensemble alignment. We sample multiple initial shapes to achieve image congealing within a single frame, which enables us to conduct ensemble alignment incrementally by group-sparse regularized rank minimization. At the same time, incremental subspace adaptation is performed to achieve personalized modeling in a unified framework. To alleviate drift, we use a very efficient fitting evaluation network to select well-aligned faces for robust incremental learning. Extensive experiments on both controlled and unconstrained datasets validate different aspects of our approach and demonstrate its superior fitting accuracy and efficiency compared with state-of-the-art methods.
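The abstract does not spell out the group-sparse regularizer, so the following is only a minimal sketch of one common choice, not the authors' formulation. It assumes the regularizer is an l2,1 norm over the columns of an error matrix, whose proximal step shrinks each column as a whole. The function name `group_soft_threshold`, the threshold `tau`, and the sample `errors` data are hypothetical and chosen purely for illustration.

```python
import math


def group_soft_threshold(columns, tau):
    """Column-wise shrinkage: the proximal operator of tau * sum_j ||column_j||_2.

    Assumption: each column is one group. A column whose Euclidean norm is
    at most tau is set to zero entirely; larger columns are scaled toward
    zero while keeping their direction.
    """
    shrunk = []
    for col in columns:
        norm = math.sqrt(sum(v * v for v in col))
        # Zero-norm columns stay zero; this also avoids dividing by zero.
        scale = max(0.0, 1.0 - tau / norm) if norm > 0.0 else 0.0
        shrunk.append([scale * v for v in col])
    return shrunk


# Hypothetical error matrix, stored column by column.
errors = [[3.0, 4.0], [0.3, 0.4], [0.0, 0.0]]
result = group_soft_threshold(errors, tau=1.0)
```

In this toy input the first column has norm 5 and is scaled by 0.8, while the second column has norm 0.5, which is below `tau`, so it is zeroed as a group. Removing whole columns rather than individual entries is what distinguishes a group-sparse penalty from an elementwise l1 penalty.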


(1) 0292_02_002_angelina_jolie (2) 0502_01_005_bruce_willis (3) 1198_01_012_julia_roberts (4) 1621_02_017_ronald_reagan (5) 1786_02_006_sylvester_stallone (6) 1847_01_005_victoria_beckham.


Title: Toward Personalized Modeling: Incremental and Ensemble Alignment for Sequential Faces in the Wild
Authors: Xi Peng, Shaoting Zhang, Yang Yu, Dimitris N. Metaxas
Publication date: 15 February 2017
Publisher: Springer US
Published in: International Journal of Computer Vision / Issue 2-4/2018
Print ISSN: 0920-5691
Electronic ISSN: 1573-1405
DOI: https://doi.org/10.1007/s11263-017-0996-8
