Research Article

Corrective 3D reconstruction of lips from monocular video

Published: 05 December 2016

Abstract

In facial animation, the accurate shape and motion of the lips of virtual humans are of paramount importance, since subtle nuances in mouth expression strongly influence the interpretation of speech and the conveyed emotion. Unfortunately, passive photometric reconstruction of expressive lip motions, such as a kiss or rolling lips, is fundamentally hard even with multi-view methods in controlled studios. To alleviate this problem, we present a novel approach for fully automatic reconstruction of detailed and expressive lip shapes, along with the dense geometry of the entire face, from just monocular RGB video. To this end, we learn the difference between the inaccurate lip shapes found by a state-of-the-art monocular facial performance capture approach and the true 3D lip shapes reconstructed using a high-quality multi-view system in combination with applied lip tattoos that are easy to track. A robust gradient domain regressor is trained to infer accurate lip shapes from coarse monocular reconstructions, with the additional help of automatically extracted inner and outer 2D lip contours. We quantitatively and qualitatively show that our monocular approach reconstructs higher quality lip shapes, even for complex shapes like a kiss or lip rolling, than previous monocular approaches. Furthermore, we compare the performance of person-specific and multi-person generic regression strategies and show that our approach generalizes to new individuals and general scenes, enabling high-fidelity reconstruction even from commodity video footage.
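The core idea of the abstract, training a regressor to map coarse monocular lip reconstructions (plus 2D contour features) to the corrections observed against multi-view ground truth, can be illustrated with a minimal sketch. This is not the paper's actual method: the paper uses a robust gradient domain regressor operating on deformation gradients, whereas the sketch below substitutes plain ridge regression on vertex offsets, and all dimensions and data are hypothetical stand-ins.

```python
import numpy as np

# Hypothetical dimensions: F training frames, V lip vertices, C 2D contour points.
F, V, C = 200, 150, 32
rng = np.random.default_rng(0)

# Stand-in training data: coarse monocular lip vertex coordinates (F x 3V),
# 2D inner/outer contour features (F x 2C), and per-vertex corrective offsets
# (F x 3V) that a multi-view ground-truth system would supply.
X = np.hstack([rng.normal(size=(F, 3 * V)), rng.normal(size=(F, 2 * C))])
Y = rng.normal(size=(F, 3 * V))

# Ridge regression: W = (X^T X + lambda I)^{-1} X^T Y
lam = 1e-2
W = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ Y)

# At test time, refine a coarse lip reconstruction by adding predicted offsets.
x_test = np.hstack([rng.normal(size=3 * V), rng.normal(size=2 * C)])
correction = x_test @ W
refined = x_test[: 3 * V] + correction
```

The same train-once, apply-per-frame structure would support both the person-specific and multi-person generic regression strategies the abstract compares; only the composition of the training set changes.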


Supplemental Material



Published in

ACM Transactions on Graphics, Volume 35, Issue 6
November 2016, 1045 pages
ISSN: 0730-0301
EISSN: 1557-7368
DOI: 10.1145/2980179

          Copyright © 2016 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery, New York, NY, United States

