
2025 | OriginalPaper | Chapter

Bridging the Gap: Studio-Like Avatar Creation from a Monocular Phone Capture

Authors: ShahRukh Athar, Shunsuke Saito, Zhengyu Yang, Stanislav Pidhorskyi, Chen Cao

Published in: Computer Vision – ECCV 2024

Publisher: Springer Nature Switzerland


Abstract

Creating photorealistic avatars for individuals traditionally involves extensive capture sessions with complex and expensive studio devices like the LightStage system. While recent strides in neural representations have enabled the generation of photorealistic and animatable 3D avatars from quick phone scans, these avatars have the capture-time lighting baked in, lack facial details, and have missing regions in areas such as the back of the ears. Thus, they lag in quality behind studio-captured avatars. In this paper, we propose a method that bridges this gap by generating studio-like illuminated texture maps from short, monocular phone captures. We do this by parameterizing the phone texture maps using the \(W^+\) space of a StyleGAN2, enabling near-perfect reconstruction. Then, we fine-tune the StyleGAN2 by sampling in the \(W^+\)-parameterized space, using a very small set of studio-captured textures as an adversarial training signal. To further enhance the realism and accuracy of facial details, we super-resolve the output of the StyleGAN2 using a carefully designed diffusion model that is guided by image gradients of the phone-captured texture map. Once trained, our method excels at producing studio-like facial texture maps from casual monocular smartphone videos. Demonstrating its capabilities, we showcase the generation of photorealistic, uniformly lit, complete avatars from monocular phone captures.
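To make the first step concrete, the sketch below illustrates per-layer \(W^+\) inversion by direct latent optimization, the standard technique the abstract alludes to. It is a minimal, hedged example, not the authors' implementation: `Synthesis` is a tiny stand-in for a pretrained StyleGAN2 synthesis network, and the dimensions (`NUM_WS`, `W_DIM`, `RES`), the function name `invert_texture`, and the plain MSE objective are illustrative assumptions.

```python
# Minimal sketch of W+ texture-map inversion (not the authors' code).
import torch
import torch.nn.functional as F

NUM_WS, W_DIM, RES = 18, 512, 1024  # typical StyleGAN2 sizes at 1024x1024

class Synthesis(torch.nn.Module):
    """Placeholder for a pretrained StyleGAN2 synthesis network.
    In practice you would load real weights and initialize w_plus
    at the mean latent produced by the mapping network."""
    def __init__(self):
        super().__init__()
        self.fc = torch.nn.Linear(NUM_WS * W_DIM, 3 * 64 * 64)

    def forward(self, w_plus):                        # (B, NUM_WS, W_DIM)
        x = self.fc(w_plus.flatten(1)).view(-1, 3, 64, 64)
        return F.interpolate(x, size=(RES, RES), mode="bilinear",
                             align_corners=False)

def invert_texture(G, target, steps=500, lr=0.05):
    """Optimize a per-layer W+ latent so G(w_plus) matches the
    phone-captured texture map `target` (B, 3, RES, RES) in [-1, 1]."""
    w_plus = torch.zeros(target.shape[0], NUM_WS, W_DIM,
                         requires_grad=True)          # ideally the mean w
    opt = torch.optim.Adam([w_plus], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        recon = G(w_plus)
        # Pixel loss only; near-perfect reconstruction in practice
        # usually also needs a perceptual (e.g. LPIPS) term.
        loss = F.mse_loss(recon, target)
        loss.backward()
        opt.step()
    return w_plus.detach()

if __name__ == "__main__":
    G = Synthesis()
    phone_texture = torch.rand(1, 3, RES, RES) * 2 - 1
    w = invert_texture(G, phone_texture, steps=10)
    print(w.shape)  # torch.Size([1, 18, 512])
```

In the paper, the inverted \(W^+\) codes then serve as the sampling space in which the generator is fine-tuned against a small set of studio-captured textures; the sketch above covers only the inversion step.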


Footnotes
1
We use the network from here.
 
Metadata
Title
Bridging the Gap: Studio-Like Avatar Creation from a Monocular Phone Capture
Authors
ShahRukh Athar
Shunsuke Saito
Zhengyu Yang
Stanislav Pidhorskyi
Chen Cao
Copyright Year
2025
DOI
https://doi.org/10.1007/978-3-031-73254-6_5
