Abstract
With the rising interest in personalized VR and gaming experiences comes the need to create high-quality 3D avatars that are both low-cost and varied. As a result, building dynamic avatars from a single unconstrained input image is becoming a popular application. While previous techniques that attempt this require multiple input images or rely on transferring dynamic facial appearance from a source actor, we do so using only one 2D input image, without any form of transfer from a source image. We achieve this using a new conditional Generative Adversarial Network design that allows fine-scale manipulation of any facial input image into a new expression while preserving its identity. Our photoreal avatar GAN (paGAN) can also synthesize the unseen mouth interior and control the eye-gaze direction of the output, as well as produce the final image from a novel viewpoint. The method is even capable of generating fully-controllable, temporally stable video sequences, despite not using temporal information during training. After training, we can use our network to produce dynamic image-based avatars that are controllable on mobile devices in real time. To do this, we compute a fixed set of output images that correspond to key blendshapes, from which we extract textures in UV space. Using a subject's expression blendshape coefficients at run-time, we can linearly blend these key textures together to achieve the desired appearance. Furthermore, we can use the mouth interior and eye textures produced by our network to synthesize on-the-fly avatar animations for those regions. Our work produces state-of-the-art quality image and video synthesis, and is the first to our knowledge that is able to generate a dynamically textured avatar with a mouth interior, all from a single image.
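The runtime stage described above amounts to a weighted linear combination of precomputed UV textures. A minimal NumPy sketch of that blending step follows; the function name, array shapes, and weight normalization are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def blend_key_textures(key_textures: np.ndarray, weights) -> np.ndarray:
    """Linearly blend K precomputed key-expression textures (K, H, W, 3)
    in UV space using K per-frame blendshape weights, returning the
    blended (H, W, 3) texture for the current frame."""
    w = np.asarray(weights, dtype=np.float32)
    w = w / max(float(w.sum()), 1e-8)  # assumed: normalize so colors stay in range
    # Contract the weight vector against the texture stack's first axis.
    return np.tensordot(w, key_textures.astype(np.float32), axes=1)

# Example: blend two tiny 2x2 RGB key textures with equal weights.
keys = np.stack([np.zeros((2, 2, 3)), np.ones((2, 2, 3))])
frame = blend_key_textures(keys, [1.0, 1.0])  # mid-gray everywhere
```

In practice the weights would come from a face tracker's per-frame blendshape coefficients, so the blend reduces to one small matrix-vector product per frame, which is what makes the mobile real-time constraint feasible.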