2023 | OriginalPaper | Chapter

Diffusion-Adapter: Text Guided Image Manipulation with Frozen Diffusion Models

Authors: Rongting Wei, Chunxiao Fan, Yuexin Wu

Published in: Artificial Neural Networks and Machine Learning – ICANN 2023

Publisher: Springer Nature Switzerland

Abstract

Research on vision-language models has developed rapidly, enabling natural-language-driven image generation and manipulation. Existing text-driven image manipulation is typically implemented via GAN inversion or by fine-tuning diffusion models. The former is limited by the inversion capability of GANs, which often fails to reconstruct images with novel poses and perspectives. The latter requires expensive optimization for each input, and fine-tuning remains a complex process. To mitigate these problems, we propose a novel approach, dubbed Diffusion-Adapter, which performs text-driven image manipulation using frozen pre-trained diffusion models. In this work, we design an Adapter architecture that modifies the target attributes without fine-tuning the pre-trained models. Our approach can be applied to diffusion models in any domain, and only a few examples are needed to train an Adapter that can successfully edit images from unseen data. Compared with previous work, Diffusion-Adapter preserves a maximal amount of detail from the original image without unintended changes to the input content. Extensive experiments demonstrate the advantages of our approach over competing baselines, marking a novel attempt at text-driven image manipulation with frozen diffusion models.
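The core idea described above, keeping the pre-trained diffusion model frozen and training only a lightweight adapter on a few examples, can be sketched as follows. This is an illustrative NumPy sketch, not the paper's actual architecture: the class names, dimensions, the bottleneck shape, and the additive residual design are assumptions for demonstration, with the frozen network stood in by a fixed linear map.

```python
import numpy as np

rng = np.random.default_rng(0)

class FrozenDenoiser:
    """Stand-in for a pre-trained diffusion denoiser; its weights are never updated."""
    def __init__(self, dim):
        self.W = rng.standard_normal((dim, dim)) / np.sqrt(dim)

    def __call__(self, x):
        return x @ self.W

class Adapter:
    """Small trainable bottleneck module applied to the frozen model's features."""
    def __init__(self, dim, bottleneck=4):
        self.down = rng.standard_normal((dim, bottleneck)) / np.sqrt(dim)
        self.up = np.zeros((bottleneck, dim))  # zero-init: adapter starts as identity

    def __call__(self, h):
        # Residual connection: frozen features plus a learned correction.
        return h + np.maximum(h @ self.down, 0.0) @ self.up

dim = 8
frozen = FrozenDenoiser(dim)
adapter = Adapter(dim)

x = rng.standard_normal((2, dim))        # a "few examples" of inputs
target = rng.standard_normal((2, dim))   # e.g. text-conditioned editing targets

W_before = frozen.W.copy()
lr, losses = 0.05, []
for _ in range(200):
    h = frozen(x)                              # frozen forward pass
    z = np.maximum(h @ adapter.down, 0.0)      # ReLU bottleneck activation
    out = h + z @ adapter.up
    losses.append(float(((out - target) ** 2).mean()))
    # Backpropagate through the adapter only; the frozen weights get no gradient.
    grad_out = 2.0 * (out - target) / out.size
    grad_up = z.T @ grad_out
    grad_z = (grad_out @ adapter.up.T) * (z > 0.0)
    grad_down = h.T @ grad_z
    adapter.up -= lr * grad_up
    adapter.down -= lr * grad_down

assert np.array_equal(W_before, frozen.W)  # the diffusion model stayed frozen
```

Because only the small `down`/`up` matrices are optimized, the expensive pre-trained weights are shared across all editing tasks, which is the property the abstract contrasts with per-input fine-tuning.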

Metadata
Title
Diffusion-Adapter: Text Guided Image Manipulation with Frozen Diffusion Models
Authors
Rongting Wei
Chunxiao Fan
Yuexin Wu
Copyright Year
2023
DOI
https://doi.org/10.1007/978-3-031-44210-0_18
