Published in: International Journal of Computer Vision 3/2020

24.08.2019

GANimation: One-Shot Anatomically Consistent Facial Animation

Authors: Albert Pumarola, Antonio Agudo, Aleix M. Martinez, Alberto Sanfeliu, Francesc Moreno-Noguer

Abstract

Recent advances in generative adversarial networks (GANs) have shown impressive results for the task of facial expression synthesis. The most successful architecture is StarGAN (Choi et al. in CVPR, 2018), which conditions the GAN's generation process on images of a specific domain, namely a set of images of people sharing the same expression. While effective, this approach can only generate a discrete number of expressions, determined by the content and granularity of the dataset. To address this limitation, in this paper we introduce a novel GAN conditioning scheme based on action unit (AU) annotations, which describe, in a continuous manifold, the anatomical facial movements that define a human expression. Our approach allows controlling the magnitude of activation of each AU and combining several of them. Additionally, we propose a weakly supervised strategy to train the model that only requires images annotated with their activated AUs, and we exploit a novel self-learned attention mechanism that makes our network robust to changing backgrounds, lighting conditions and occlusions. Extensive evaluation shows that our approach surpasses competing conditional generators both in its capability to synthesize a much wider range of expressions ruled by anatomically feasible muscle movements and in its capacity to deal with images in the wild. The code of this work is publicly available at https://github.com/albertpumarola/GANimation.
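
The abstract's key mechanism is a generator conditioned on a continuous vector of AU activations, whose self-learned attention mask decides, per pixel, whether to keep the input image or use newly synthesized content. Below is a minimal PyTorch sketch of that conditioning-and-composition idea; the single-block encoder, layer sizes, AU count, and the AU index in the usage example are illustrative assumptions, not the authors' published architecture.

```python
import torch
import torch.nn as nn

class AUConditionedGenerator(nn.Module):
    """Toy generator conditioned on a continuous AU activation vector.

    It regresses a color map C and an attention mask A, then composes
    the output as A * input + (1 - A) * C, in the spirit of the
    attention mechanism described in the paper (sizes are illustrative).
    """

    def __init__(self, num_aus: int = 17, base_channels: int = 64):
        super().__init__()
        # The AU vector is broadcast to per-pixel maps and concatenated
        # with the RGB input, so conditioning is spatially uniform.
        self.encoder = nn.Sequential(
            nn.Conv2d(3 + num_aus, base_channels, kernel_size=7, padding=3),
            nn.InstanceNorm2d(base_channels),
            nn.ReLU(inplace=True),
        )
        # Color head: synthesized expression pixels in [-1, 1].
        self.color_head = nn.Sequential(
            nn.Conv2d(base_channels, 3, kernel_size=7, padding=3),
            nn.Tanh(),
        )
        # Attention head: per-pixel blending weights in [0, 1].
        self.attention_head = nn.Sequential(
            nn.Conv2d(base_channels, 1, kernel_size=7, padding=3),
            nn.Sigmoid(),
        )

    def forward(self, image: torch.Tensor, au_target: torch.Tensor) -> torch.Tensor:
        b, _, h, w = image.shape
        # (b, num_aus) -> (b, num_aus, h, w) constant feature maps.
        au_maps = au_target.view(b, -1, 1, 1).expand(-1, -1, h, w)
        features = self.encoder(torch.cat([image, au_maps], dim=1))
        color = self.color_head(features)          # C: new pixels
        attention = self.attention_head(features)  # A: where to keep input
        # High attention keeps the original (background, hair, occlusions);
        # low attention paints in the synthesized expression.
        return attention * image + (1.0 - attention) * color

# Usage: request a half-intensity activation of a single AU.
generator = AUConditionedGenerator(num_aus=17)
face = torch.rand(1, 3, 128, 128) * 2 - 1  # image normalized to [-1, 1]
target_aus = torch.zeros(1, 17)
target_aus[0, 11] = 0.5  # hypothetical index for AU12 (lip corner puller)
animated = generator(face, target_aus)  # (1, 3, 128, 128)
```

Because the condition is a continuous vector rather than a discrete domain label, intermediate activations (e.g., 0.5 instead of 0 or 1) and combinations of several AUs fall out of the same forward pass, which is exactly what StarGAN-style discrete conditioning cannot express.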


Footnotes
1
The dataset was re-annotated with Baltrušaitis et al. (2015) to obtain continuous activation annotations.
 
References
Baltrušaitis, T., Mahmoud, M., & Robinson, P. (2015). Cross-dataset learning and person-specific normalisation for automatic action unit detection. In FG.
Benitez-Quiroz, C. F., Srinivasan, R., & Martinez, A. M. (2016). EmotioNet: An accurate, real-time algorithm for the automatic annotation of a million facial expressions in the wild. In CVPR.
Blanz, V., & Vetter, T. (1999). A morphable model for the synthesis of 3D faces. In Proceedings of the 26th annual conference on computer graphics and interactive techniques (pp. 187–194). New York: ACM Press.
Choi, Y., Choi, M., Kim, M., Ha, J. W., Kim, S., & Choo, J. (2018). StarGAN: Unified generative adversarial networks for multi-domain image-to-image translation. In CVPR.
Dolhansky, B., & Canton Ferrer, C. (2018). Eye in-painting with exemplar generative adversarial networks. In CVPR.
Du, S., Tao, Y., & Martinez, A. M. (2014). Compound facial expressions of emotion. Proceedings of the National Academy of Sciences, 111(15), E1454–E1462.
Ekman, P., & Friesen, W. (1978). Facial action coding system: A technique for the measurement of facial movement. Palo Alto: Consulting Psychologists Press.
Fischler, M. A., & Elschlager, R. A. (1973). The representation and matching of pictorial structures. IEEE Transactions on Computers, 22(1), 67–92.
Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., et al. (2014). Generative adversarial nets. In NIPS.
Gulrajani, I., Ahmed, F., Arjovsky, M., Dumoulin, V., & Courville, A. C. (2017). Improved training of Wasserstein GANs. In NIPS.
Isola, P., Zhu, J. Y., Zhou, T., & Efros, A. A. (2017). Image-to-image translation with conditional adversarial networks. In CVPR.
Johnson, J., Alahi, A., & Fei-Fei, L. (2016). Perceptual losses for real-time style transfer and super-resolution. In ECCV.
Karras, T., Aila, T., Laine, S., & Lehtinen, J. (2018). Progressive growing of GANs for improved quality, stability, and variation. In ICLR.
Kim, T., Cha, M., Kim, H., Lee, J., & Kim, J. (2017). Learning to discover cross-domain relations with generative adversarial networks. In ICML.
Kim, H., Garrido, P., Tewari, A., Xu, W., Thies, J., Nießner, M., et al. (2018). Deep video portraits. ACM Transactions on Graphics, 37, 163.
Kingma, D., & Ba, J. (2015). Adam: A method for stochastic optimization. In ICLR.
Kingma, D. P., & Welling, M. (2014). Auto-encoding variational Bayes. In ICLR.
Langner, O., Dotsch, R., Bijlstra, G., Wigboldus, D. H., Hawk, S. T., & Van Knippenberg, A. (2010). Presentation and validation of the Radboud Faces Database. Cognition and Emotion, 24(8), 1377–1388.
Larsen, A. B. L., Sønderby, S. K., Larochelle, H., & Winther, O. (2016). Autoencoding beyond pixels using a learned similarity metric. In ICML.
Ledig, C., Theis, L., Huszár, F., Caballero, J., Cunningham, A., Acosta, A., et al. (2017). Photo-realistic single image super-resolution using a generative adversarial network. In CVPR.
Li, C., & Wand, M. (2016). Precomputed real-time texture synthesis with Markovian generative adversarial networks. In ECCV.
Liu, M. Y., Breuel, T., & Kautz, J. (2017). Unsupervised image-to-image translation networks. In NIPS.
Liu, Z., Luo, P., Wang, X., & Tang, X. (2015). Deep learning face attributes in the wild. In ICCV.
Mathieu, M., Couprie, C., & LeCun, Y. (2016). Deep multi-scale video prediction beyond mean square error. In ICLR.
Nagano, K., Seo, J., Xing, J., Wei, L., Li, Z., Saito, S., et al. (2018). paGAN: Real-time avatars using dynamic textures. ACM Transactions on Graphics, 37(6), 258:1–258:12.
Nam, S., Ma, C., Chai, M., Brendel, W., Xu, N., & Joo Kim, S. (2019). End-to-end time-lapse video synthesis from a single outdoor image. In CVPR (pp. 1409–1418).
Odena, A., Olah, C., & Shlens, J. (2017). Conditional image synthesis with auxiliary classifier GANs. In ICML.
Pathak, D., Krahenbuhl, P., Donahue, J., Darrell, T., & Efros, A. A. (2016). Context encoders: Feature learning by inpainting. In CVPR.
Perarnau, G., van de Weijer, J., Raducanu, B., & Álvarez, J. M. (2016). Invertible conditional GANs for image editing. arXiv preprint arXiv:1611.06355.
Pumarola, A., Agudo, A., Martinez, A. M., Sanfeliu, A., & Moreno-Noguer, F. (2018). GANimation: Anatomically-aware facial animation from a single image. In ECCV.
Pumarola, A., Agudo, A., Sanfeliu, A., & Moreno-Noguer, F. (2018). Unsupervised person image synthesis in arbitrary poses. In CVPR.
Radford, A., Metz, L., & Chintala, S. (2016). Unsupervised representation learning with deep convolutional generative adversarial networks. In ICLR.
Reed, S., Akata, Z., Yan, X., Logeswaran, L., Schiele, B., & Lee, H. (2016). Generative adversarial text to image synthesis. In ICML.
Salimans, T., Goodfellow, I., Zaremba, W., Cheung, V., Radford, A., & Chen, X. (2016). Improved techniques for training GANs. In NIPS.
Scherer, K. R. (1982). Emotion as a process: Function, origin and regulation. Social Science Information, 21, 555–570.
Shen, W., & Liu, R. (2017). Learning residual images for face attribute manipulation. In CVPR.
Shrivastava, A., Pfister, T., Tuzel, O., Susskind, J., Wang, W., & Webb, R. (2017). Learning from simulated and unsupervised images through adversarial training. In CVPR.
Song, Y., Zhu, J., Wang, X., & Qi, H. (2018). Talking face generation by conditional recurrent adversarial network. arXiv preprint arXiv:1804.04786.
Susskind, J. M., Hinton, G. E., Movellan, J. R., & Anderson, A. K. (2008). Generating facial expressions with deep belief nets. In Affective computing. IntechOpen.
Suwajanakorn, S., Seitz, S. M., & Kemelmacher-Shlizerman, I. (2017). Synthesizing Obama: Learning lip sync from audio. ACM Transactions on Graphics, 36(4), 95.
Thies, J., Zollhofer, M., Stamminger, M., Theobalt, C., & Nießner, M. (2016). Face2Face: Real-time face capture and reenactment of RGB videos. In CVPR.
Tulyakov, S., Liu, M. Y., Yang, X., & Kautz, J. (2018). MoCoGAN: Decomposing motion and content for video generation. In CVPR.
Vondrick, C., Pirsiavash, H., & Torralba, A. (2016). Generating videos with scene dynamics. In NIPS.
Vougioukas, K., Petridis, S., & Pantic, M. (2018). End-to-end speech-driven facial animation with temporal GANs. In BMVC.
Wang, X., & Gupta, A. (2016). Generative image modeling using style and structure adversarial networks. In ECCV.
Wang, Z., Liu, D., Yang, J., Han, W., & Huang, T. (2015). Deep networks for image super-resolution with sparse prior. In ICCV.
Yu, H., Garrod, O. G., & Schyns, P. G. (2012). Perception-driven facial expression synthesis. Computers and Graphics, 36(3), 152–162.
Zafeiriou, S., Trigeorgis, G., Chrysos, G., Deng, J., & Shen, J. (2017). The Menpo facial landmark localisation challenge: A step towards the solution. In CVPRW.
Zhang, H., Xu, T., Li, H., Zhang, S., Huang, X., Wang, X., & Metaxas, D. (2017). StackGAN: Text to photo-realistic image synthesis with stacked generative adversarial networks. In ICCV.
Zhou, H., Liu, Y., Liu, Z., Luo, P., & Wang, X. (2019). Talking face generation by adversarially disentangled audio–visual representation. In AAAI.
Zhu, J. Y., Park, T., Isola, P., & Efros, A. A. (2017a). Unpaired image-to-image translation using cycle-consistent adversarial networks. In ICCV.
Zhu, S., Fidler, S., Urtasun, R., Lin, D., & Loy, C. C. (2017b). Be your own Prada: Fashion synthesis with structural coherence. In ICCV.
Metadata
Title
GANimation: One-Shot Anatomically Consistent Facial Animation
Authors
Albert Pumarola
Antonio Agudo
Aleix M. Martinez
Alberto Sanfeliu
Francesc Moreno-Noguer
Publication date
24.08.2019
Publisher
Springer US
Published in
International Journal of Computer Vision / Issue 3/2020
Print ISSN: 0920-5691
Electronic ISSN: 1573-1405
DOI
https://doi.org/10.1007/s11263-019-01210-3
