
20.06.2024

A Comprehensive Survey of Image Generation Models Based on Deep Learning

Authors: Jun Li, Chenyang Zhang, Wei Zhu, Yawei Ren

Published in: Annals of Data Science | Issue 1/2025

Abstract

In recent years, generative artificial intelligence has developed rapidly, and image generation models based on deep learning have made remarkable progress. Early frameworks for image generation were dominated by generative adversarial networks (GANs) and variational autoencoders (VAEs). Today, large-scale generative models based on diffusion models have become mainstream, and the quality of their generated images has improved significantly. We review the research and development of image generation models and examine the significant progress made in the field in recent years. First, we revisit the development of traditional image generation models such as GANs and VAEs, emphasizing their contributions and challenges. We also introduce diffusion models, which have received much attention in image generation due to their unique generative process and excellent generative performance. Next, we discuss large vision models, with SAM as the focal point. We also pay special attention to large-scale generative models such as Stable Diffusion, which have demonstrated unprecedented capabilities in high-quality image generation tasks. Finally, we explore target models and their respective fine-tuning methods for domain-oriented image generation tasks, predict future directions in image generation, and propose potential research focuses and challenges.

Metadata
Title
A Comprehensive Survey of Image Generation Models Based on Deep Learning
Authors
Jun Li
Chenyang Zhang
Wei Zhu
Yawei Ren
Publication date
20.06.2024
Publisher
Springer Berlin Heidelberg
Published in
Annals of Data Science / Issue 1/2025
Print ISSN: 2198-5804
Electronic ISSN: 2198-5812
DOI
https://doi.org/10.1007/s40745-024-00544-1