
StyleFlow: Attribute-conditioned Exploration of StyleGAN-Generated Images using Conditional Continuous Normalizing Flows

Published: 05 May 2021

Abstract

High-quality, diverse, and photorealistic images can now be generated by unconditional GANs (e.g., StyleGAN). However, limited options exist to control the generation process using (semantic) attributes while still preserving the quality of the output. Further, due to the entangled nature of the GAN latent space, performing edits along one attribute can easily result in unwanted changes along other attributes. In this article, in the context of conditional exploration of entangled latent spaces, we investigate the two sub-problems of attribute-conditioned sampling and attribute-controlled editing. We present StyleFlow as a simple, effective, and robust solution to both sub-problems by formulating conditional exploration as an instance of conditional continuous normalizing flows in the GAN latent space conditioned by attribute features. We evaluate our method using the face and the car latent space of StyleGAN, and demonstrate fine-grained disentangled edits along various attributes on both real photographs and StyleGAN-generated images. For example, for faces, we vary camera pose, illumination, expression, facial hair, gender, and age. Finally, via extensive qualitative and quantitative comparisons, we demonstrate the superiority of StyleFlow over prior and several concurrent works. Project Page and Video: https://rameenabdal.github.io/StyleFlow.
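To make the core idea concrete, below is a minimal sketch of attribute-conditioned latent editing with a conditional continuous normalizing flow. It is not the authors' released implementation: the module name CondODEFunc, the attribute dimensionality, the fixed-step Euler integrator, and the edit workflow are illustrative assumptions; the paper's method likewise learns a conditional CNF over StyleGAN's latent space and edits by re-integrating the flow with modified attributes.

```python
# Hedged sketch (not the paper's code): a conditional CNF over a StyleGAN-style
# w latent. The velocity field is conditioned on an attribute vector a and time t;
# an edit maps w to a base variable and back with changed attributes.
import torch
import torch.nn as nn

class CondODEFunc(nn.Module):
    """dw/dt = f(w, a, t): attribute- and time-conditioned velocity field."""
    def __init__(self, w_dim=512, a_dim=8, hidden=512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(w_dim + a_dim + 1, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
            nn.Linear(hidden, w_dim),
        )

    def forward(self, w, a, t):
        t_feat = t.expand(w.shape[0], 1)  # broadcast scalar time over the batch
        return self.net(torch.cat([w, a, t_feat], dim=1))

def integrate(func, w, a, t0, t1, steps=20):
    """Fixed-step Euler integration of the flow; approximately invertible by
    swapping t0 and t1 (exactly so in the continuous limit)."""
    dt = (t1 - t0) / steps
    t = torch.full((1, 1), t0)
    for _ in range(steps):
        w = w + dt * func(w, a, t)
        t = t + dt
    return w

# Illustrative edit: shapes and attribute indices are assumptions, not the paper's.
func = CondODEFunc()
w = torch.randn(1, 512)                    # latent of a sampled or embedded image
a_src = torch.zeros(1, 8)                  # current attributes (pose, age, light, ...)
a_tgt = a_src.clone(); a_tgt[0, 3] = 1.0   # change one attribute, hold the rest fixed

z = integrate(func, w, a_src, t0=0.0, t1=1.0)        # reverse: w -> base variable z
w_edit = integrate(func, z, a_tgt, t0=1.0, t1=0.0)   # forward with target attributes
# w_edit is then fed to the StyleGAN generator to render the edited image.
```

Conditioning every integration step on the attribute vector, rather than shifting w along a single fixed direction, is what allows one attribute to change while the flow keeps the remaining attributes (and thus the identity) close to the original.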



Published in

  ACM Transactions on Graphics, Volume 40, Issue 3
  June 2021, 264 pages
  ISSN: 0730-0301
  EISSN: 1557-7368
  DOI: 10.1145/3463476

      Copyright © 2021 Association for Computing Machinery.

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 5 May 2021
      • Accepted: 1 January 2021
      • Revised: 1 December 2020
      • Received: 1 September 2020
Published in TOG Volume 40, Issue 3


      Qualifiers

      • research-article
      • Refereed
