
20-06-2024

A Comprehensive Survey of Image Generation Models Based on Deep Learning

Authors: Jun Li, Chenyang Zhang, Wei Zhu, Yawei Ren

Published in: Annals of Data Science


Abstract

In recent years, generative artificial intelligence has developed rapidly. In the image domain, image generation models based on deep learning have achieved remarkable results. Early image generation frameworks were dominated by generative adversarial networks (GANs) and variational autoencoders (VAEs). Today, large-scale generative models based on diffusion models have become mainstream, and the quality of the images they generate has improved significantly. We review the research and development of image generation models and examine the significant progress made in the field in recent years. First, we revisit the development of traditional image generation models such as GANs and VAEs, emphasizing their contributions and challenges. We then introduce diffusion models, which have received much attention in image generation owing to their distinctive generative process and excellent generative performance. Next, we turn to large vision models, with the Segment Anything Model (SAM) as the focal point. We also pay special attention to large-scale generative models such as Stable Diffusion, which have demonstrated unprecedented capabilities in high-quality image generation. Finally, we explore target models and their respective fine-tuning methods for domain-oriented image generation tasks, predict future directions in image generation, and propose potential research focuses and challenges.
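The "distinctive generative process" of diffusion models starts from a fixed forward process that gradually adds Gaussian noise to an image. The sketch below is an illustration only, not the authors' code. It assumes the DDPM formulation of Ho et al. (2020) with a linear variance schedule from beta_1 = 1e-4 to beta_T = 0.02 over T = 1000 steps, and uses the closed form x_t = sqrt(alpha_bar_t) x_0 + sqrt(1 - alpha_bar_t) eps, where alpha_bar_t is the running product of (1 - beta_s). The function names `linear_beta_schedule`, `cumulative_alpha_bars`, and `q_sample` are hypothetical, and a flat list of floats stands in for an image tensor.

```python
import math
import random
from typing import List


def linear_beta_schedule(T: int = 1000, beta_start: float = 1e-4,
                         beta_end: float = 0.02) -> List[float]:
    """Assumed DDPM-style schedule: beta_1..beta_T linearly spaced."""
    if T == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (T - 1)
    return [beta_start + i * step for i in range(T)]


def cumulative_alpha_bars(betas: List[float]) -> List[float]:
    """alpha_bar_t = product over s <= t of (1 - beta_s)."""
    alpha_bars = []
    running = 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def q_sample(x0: List[float], t: int, alpha_bars: List[float],
             rng: random.Random) -> List[float]:
    """Draw x_t ~ q(x_t | x_0) = N(sqrt(alpha_bar_t) x_0, (1 - alpha_bar_t) I).

    t is 1-indexed (1..T) to match the usual DDPM notation.
    """
    a_bar = alpha_bars[t - 1]
    signal_scale = math.sqrt(a_bar)        # how much of x_0 survives
    noise_scale = math.sqrt(1.0 - a_bar)   # std of the injected noise
    return [signal_scale * x + noise_scale * rng.gauss(0.0, 1.0) for x in x0]


if __name__ == "__main__":
    betas = linear_beta_schedule()
    alpha_bars = cumulative_alpha_bars(betas)
    rng = random.Random(0)
    x0 = [0.5, -0.25, 1.0]  # toy "image": three pixel values in [-1, 1]
    for t in (1, 250, 1000):
        print(t, round(alpha_bars[t - 1], 4), q_sample(x0, t, alpha_bars, rng))
```

Running the script shows alpha_bar_t shrinking toward zero as t grows, so x_T is close to pure Gaussian noise. A generative model is then trained to reverse this process step by step; the closed form lets training jump directly to any step t instead of simulating every intermediate step.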


Metadata
Publisher
Springer Berlin Heidelberg
Print ISSN: 2198-5804
Electronic ISSN: 2198-5812
DOI
https://doi.org/10.1007/s40745-024-00544-1
