
Network Bending: Expressive Manipulation of Deep Generative Models

Authors: Terence Broad, Frederic Fol Leymarie, Mick Grierson

Published in: Artificial Intelligence in Music, Sound, Art and Design

Publisher: Springer International Publishing


Abstract

We introduce a new framework for manipulating and interacting with deep generative models that we call network bending. We present a comprehensive set of deterministic transformations that can be inserted as distinct layers into the computational graph of a trained generative neural network and applied during inference. In addition, we present a novel algorithm for analysing a deep generative model and clustering its features based on their spatial activation maps, grouping features by spatial similarity in an unsupervised fashion. This enables the meaningful manipulation of sets of features that correspond to a broad array of semantically significant aspects of the generated images. We outline this framework and demonstrate our results on state-of-the-art deep generative models trained on several image datasets, showing how it allows for the direct manipulation of semantically meaningful aspects of the generative process as well as a broad range of expressive outcomes.
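To make the two ideas in the abstract concrete, the PyTorch sketch below illustrates them under stated assumptions; it is not the authors' implementation (their official code is linked in footnote 1). The hook-based insertion mechanism, the `MultiplyLayer` scaling transform, the layer name `"convs.4"`, and the use of scikit-learn's k-means in place of the paper's clustering algorithm are all illustrative assumptions.

```python
# Minimal sketch of "network bending": a deterministic transformation inserted
# as a distinct layer into a trained generator and applied during inference,
# plus unsupervised clustering of feature channels by their spatial activation
# maps so that whole groups of related features can be manipulated together.
import numpy as np
import torch
import torch.nn as nn
from sklearn.cluster import KMeans


class MultiplyLayer(nn.Module):
    """Deterministic transform: scale a chosen set of feature channels."""

    def __init__(self, channel_indices, factor):
        super().__init__()
        self.channel_indices = channel_indices  # e.g. one cluster of channels
        self.factor = factor

    def forward(self, x):
        # x: (batch, channels, height, width) activations from the host layer.
        x = x.clone()
        x[:, self.channel_indices] *= self.factor
        return x


def bend(generator, layer_name, transform):
    """Insert `transform` after the named layer via a forward hook,
    leaving the trained weights untouched. Returns the hook handle."""
    layer = dict(generator.named_modules())[layer_name]
    return layer.register_forward_hook(lambda mod, inp, out: transform(out))


def cluster_channels(activations, n_clusters=8):
    """Group channels by spatial similarity of their activation maps.
    activations: (channels, height, width) tensor from one forward pass.
    Returns a list of channel-index lists, one per cluster."""
    maps = activations.detach().cpu().numpy().reshape(activations.shape[0], -1)
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(maps)
    return [np.where(labels == k)[0].tolist() for k in range(n_clusters)]


# Hypothetical usage with an assumed pretrained generator:
# generator = load_pretrained_stylegan()                     # assumption
# clusters = cluster_channels(recorded_activations)          # assumption
# handle = bend(generator, "convs.4", MultiplyLayer(clusters[0], 2.0))
# image = generator(torch.randn(1, 512))                     # bent inference
# handle.remove()                                            # restore normal inference
```

A forward hook that returns a value replaces the layer's output, which is one simple way to realise "inserting a layer into the computational graph" without modifying the trained model itself.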
Footnotes
1. Our implementation and the datasets we have used for training the clustering models are publicly available at: https://github.com/terrybroad/network-bending.
Metadata
Title
Network Bending: Expressive Manipulation of Deep Generative Models
Authors
Terence Broad
Frederic Fol Leymarie
Mick Grierson
Copyright Year
2021
DOI
https://doi.org/10.1007/978-3-030-72914-1_2