Abstract
High-quality, diverse, and photorealistic images can now be generated by unconditional GANs (e.g., StyleGAN). However, limited options exist to control the generation process using (semantic) attributes while still
preserving the quality of the output. Further, due to the entangled nature of the GAN latent space, performing edits along one attribute can easily result in unwanted changes along other attributes. In this article, in the context of conditional exploration of entangled latent spaces, we investigate the two sub-problems of attribute-conditioned sampling and attribute-controlled editing. We present StyleFlow as a simple, effective, and robust solution to both sub-problems by formulating conditional exploration as an instance of conditional continuous normalizing flows in the GAN latent space conditioned by attribute features. We evaluate our method using the face and car latent spaces of StyleGAN, and demonstrate fine-grained disentangled edits along various attributes on both real photographs and StyleGAN-generated images. For example, for faces, we vary camera pose, illumination, expression, facial hair, gender, and age. Finally, via extensive qualitative and quantitative comparisons, we demonstrate the superiority of StyleFlow over prior and several concurrent works. Project Page and Video: https://rameenabdal.github.io/StyleFlow.
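The core idea of conditioning a continuous normalizing flow on attributes can be sketched as follows. This is a minimal toy illustration, not the paper's actual architecture: the dynamics network, dimensions, and integration scheme below are all illustrative assumptions (StyleFlow operates on StyleGAN's 512-dimensional latent codes with trained FFJORD-style dynamics), but it shows how a single ODE, conditioned on an attribute vector, gives an invertible map between latent codes.

```python
import numpy as np

# Toy conditional CNF sketch (illustrative only). The latent code is
# transported along an ODE dw/dt = f(w, a, t) whose dynamics depend on
# attribute features a; integrating in reverse time inverts the map.
rng = np.random.default_rng(0)
DIM_W, DIM_A, HIDDEN = 8, 3, 16  # toy sizes; StyleGAN's w is 512-d

# Random weights stand in for a trained dynamics network.
W1 = rng.normal(scale=0.1, size=(DIM_W + DIM_A + 1, HIDDEN))
W2 = rng.normal(scale=0.1, size=(HIDDEN, DIM_W))

def dynamics(w, a, t):
    """dw/dt = f(w, a, t): a small MLP conditioned on attributes a."""
    x = np.concatenate([w, a, [t]])
    return np.tanh(x @ W1) @ W2

def integrate(w0, a, t0, t1, steps=200):
    """Euler integration of the ODE from t0 to t1."""
    w, dt = w0.copy(), (t1 - t0) / steps
    for i in range(steps):
        w = w + dt * dynamics(w, a, t0 + i * dt)
    return w

# Map a sample z forward under attributes a, then invert by reversing time.
z = rng.normal(size=DIM_W)
a = np.array([1.0, 0.0, 0.5])       # e.g. encoded pose / light / age
w = integrate(z, a, 0.0, 1.0)       # attribute-conditioned forward map
z_back = integrate(w, a, 1.0, 0.0)  # reverse-time integration inverts it
err = np.max(np.abs(z - z_back))    # small up to Euler discretization error
```

Editing then amounts to re-integrating the same `w` under a modified attribute vector `a`: because a single conditional flow is shared across attributes, a change along one attribute need not perturb the others.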
Supplemental Material
Supplemental movie, appendix, image, and software files for "StyleFlow: Attribute-conditioned Exploration of StyleGAN-Generated Images using Conditional Continuous Normalizing Flows."