2022 | OriginalPaper | Book chapter

Generating High-Resolution 3D Faces Using VQ-VAE-2 with PixelSNAIL Networks

Written by: Alessio Gallucci, Dmitry Znamenskiy, Nicola Pezzotti, Milan Petkovic

Published in: Image Analysis and Processing. ICIAP 2022 Workshops

Publisher: Springer International Publishing

Abstract

The realistic generation of synthetic 3D faces is an open challenge due to the complexity of the geometry and the lack of large and diverse publicly available datasets. Generative models based on convolutional neural networks (CNNs) have recently demonstrated a strong ability to produce novel synthetic high-resolution images that an expert human observer cannot distinguish from original pictures. However, applying them to non-grid-like data such as 3D meshes presents many challenges. In our work, we overcome these challenges by first reducing the face mesh to a regular 2D image representation and then exploiting one prominent state-of-the-art generative approach. The approach uses a Vector Quantized Variational Autoencoder (VQ-VAE-2) to learn a discrete latent representation of the 2D images. The 3D synthesis is then achieved by fitting the latent space with an autoregressive model, PixelSNAIL, and sampling from it. The quantitative and qualitative evaluations demonstrate that synthetic faces generated with our method are statistically closer to the real faces than those produced by a classical synthesis approach based on Principal Component Analysis (PCA).
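As a rough illustration of what a discrete latent representation means in a vector-quantized autoencoder, the sketch below maps continuous latent vectors to the index of their nearest codebook entry and back. It is not the authors' implementation: the function names, the tiny codebook, and the latent values are hypothetical, and a real VQ-VAE-2 learns the codebook jointly with convolutional encoders and decoders at several resolutions.

```python
from typing import List, Sequence


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(latents: List[List[float]], codebook: List[List[float]]) -> List[int]:
    """Map each continuous latent vector to the index of its nearest codebook entry."""
    return [
        min(range(len(codebook)), key=lambda k: squared_distance(z, codebook[k]))
        for z in latents
    ]


def lookup(indices: List[int], codebook: List[List[float]]) -> List[List[float]]:
    """Replace each discrete index by its codebook vector (what a decoder would consume)."""
    return [codebook[i] for i in indices]


# Toy, hypothetical values: a 3-entry codebook of 2-D vectors
# and three encoder outputs that each lie near one entry.
codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, 2.0]]
latents = [[0.1, -0.2], [0.9, 1.2], [-0.8, 1.7]]

indices = quantize(latents, codebook)    # discrete codes
quantized = lookup(indices, codebook)    # quantized latents
print(indices)
```

In a pipeline of the kind the abstract describes, an autoregressive prior such as PixelSNAIL would be trained on grids of these integer indices, and new samples would be drawn index by index before being passed through `lookup` and the decoder.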

Metadata
Title
Generating High-Resolution 3D Faces Using VQ-VAE-2 with PixelSNAIL Networks
Written by
Alessio Gallucci
Dmitry Znamenskiy
Nicola Pezzotti
Milan Petkovic
Copyright year
2022
DOI
https://doi.org/10.1007/978-3-031-13324-4_20