Abstract
In facial animation, the accurate shape and motion of a virtual human's lips are of paramount importance, since subtle nuances in mouth expression strongly influence the interpretation of speech and the conveyed emotion. Unfortunately, passive photometric reconstruction of expressive lip motions, such as a kiss or rolling lips, is fundamentally hard even for multi-view methods in controlled studios. To alleviate this problem, we present a novel approach for fully automatic reconstruction of detailed and expressive lip shapes, along with the dense geometry of the entire face, from just monocular RGB video. To this end, we learn the difference between the inaccurate lip shapes found by a state-of-the-art monocular facial performance capture approach and the true 3D lip shapes reconstructed by a high-quality multi-view system, using applied lip tattoos that are easy to track. A robust gradient-domain regressor is trained to infer accurate lip shapes from coarse monocular reconstructions, with the additional help of automatically extracted inner and outer 2D lip contours. We show, quantitatively and qualitatively, that our monocular approach reconstructs higher-quality lip shapes than previous monocular approaches, even for complex shapes like a kiss or lip rolling. Furthermore, we compare the performance of person-specific and multi-person generic regression strategies and show that our approach generalizes to new individuals and general scenes, enabling high-fidelity reconstruction even from commodity video footage.
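The core learning step described above, regressing from coarse monocular lip reconstructions to the corrective offsets that bring them onto the multi-view ground truth, can be illustrated with a minimal sketch. This is not the paper's actual gradient-domain formulation; it substitutes a plain closed-form ridge regression on flattened feature vectors (coarse lip vertex positions plus 2D contour points) to show the train/apply structure, and all function and variable names are hypothetical.

```python
import numpy as np

def train_corrective_regressor(X, Y, lam=1e-2):
    """Fit a ridge regressor mapping coarse lip features to corrective offsets.

    X : (n_frames, d_in)  -- per-frame features: flattened coarse monocular lip
                             vertices and detected 2D inner/outer contour points
    Y : (n_frames, d_out) -- ground-truth minus coarse lip vertex positions,
                             i.e. the 3D corrections the regressor should predict
    """
    # Append a bias column so the regressor can also learn a constant offset.
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    d = Xb.shape[1]
    # Closed-form ridge solution: W = (Xb^T Xb + lam * I)^{-1} Xb^T Y
    W = np.linalg.solve(Xb.T @ Xb + lam * np.eye(d), Xb.T @ Y)
    return W

def apply_corrective_regressor(W, x):
    """Predict the corrective offsets for one frame's feature vector x."""
    xb = np.append(x, 1.0)
    return xb @ W
```

The corrected lip shape for a frame is then the coarse monocular reconstruction plus the predicted offsets. Training two variants of `W`, one on a single person's data and one on a pooled multi-person set, mirrors the person-specific versus generic regression strategies compared in the paper.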
Supplemental Material
Supplemental file available for download.
Index Terms
- Corrective 3D reconstruction of lips from monocular video