2025 | Original Paper | Book Chapter

A Comprehensive Exploration on Detecting Fake Images Generated by Stable Diffusion

Authors: Jingyi Chen, Xiaolong Wang, Zhijian He, Xiaojiang Peng

Published in: Pattern Recognition and Computer Vision

Publisher: Springer Nature Singapore

Abstract

Diffusion models, particularly Stable Diffusion Models (SDMs), have recently become a focal point of generative artificial intelligence, acclaimed for their visual fidelity and versatility. Despite their rising prominence, the problem of detecting SDM-generated images has been somewhat overlooked, raising concerns over the potential misuse of such images. This paper examines how to differentiate authentic images from those generated by SDMs and makes three contributions. First, we introduce a varied synthetic image dataset named SDM-Fakes, which consists of six subsets generated with txt2img, img2img, and inpainting techniques. Second, we develop both CNN-based and Transformer-based detection models to identify artificial images and assess a range of state-of-the-art models. Third, we evaluate the generalization capabilities of these detection models across different generation schemes. We also explore the impact of unknown perturbations on the detectors. Through comprehensive testing, we demonstrate that while current models are adept at recognizing SDM-generated images, there is a significant need to improve both their ability to generalize across schemes and their robustness to unknown perturbations.
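
The cross-scheme evaluation mentioned in the abstract can be pictured as an accuracy matrix: each detector is trained on images from one generation scheme and then tested on every scheme. The following is a minimal sketch of that bookkeeping only, not the authors' method. The scheme names come from the abstract; the threshold "detectors", the scalar features, and all sample values are hypothetical toy stand-ins chosen for illustration and are not results from the paper.

```python
from typing import Callable, Dict, List, Tuple

# Toy sample: (scalar feature, label), where label 1 = generated, 0 = real.
# A real detector would consume images; a scalar keeps the sketch self-contained.
Sample = Tuple[float, int]
Detector = Callable[[float], int]


def make_threshold_detector(threshold: float) -> Detector:
    """Hypothetical stand-in for a trained CNN or Transformer detector."""
    return lambda x: 1 if x > threshold else 0


def accuracy(detector: Detector, samples: List[Sample]) -> float:
    """Fraction of samples whose predicted label matches the true label."""
    correct = sum(1 for x, y in samples if detector(x) == y)
    return correct / len(samples)


def cross_scheme_matrix(
    detectors: Dict[str, Detector],
    test_sets: Dict[str, List[Sample]],
) -> Dict[str, Dict[str, float]]:
    """Rows: scheme a detector was trained on. Columns: scheme it is tested on."""
    return {
        train: {test: accuracy(det, samples) for test, samples in test_sets.items()}
        for train, det in detectors.items()
    }


# Invented toy data: fakes from each scheme sit at a different feature level.
test_sets = {
    "txt2img": [(0.9, 1), (0.8, 1), (0.1, 0), (0.2, 0)],
    "img2img": [(0.6, 1), (0.55, 1), (0.1, 0), (0.2, 0)],
    "inpainting": [(0.35, 1), (0.3, 1), (0.1, 0), (0.2, 0)],
}

# One toy detector per training scheme, each with an invented threshold.
detectors = {
    "txt2img": make_threshold_detector(0.7),
    "img2img": make_threshold_detector(0.5),
    "inpainting": make_threshold_detector(0.25),
}

matrix = cross_scheme_matrix(detectors, test_sets)

for train, row in matrix.items():
    print(train, row)
```

In this toy setup the diagonal entries (same training and test scheme) are perfect while some off-diagonal entries drop, which is the kind of gap a cross-scheme evaluation is designed to expose.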

Metadata
Title
A Comprehensive Exploration on Detecting Fake Images Generated by Stable Diffusion
Authors
Jingyi Chen
Xiaolong Wang
Zhijian He
Xiaojiang Peng
Copyright Year
2025
Publisher
Springer Nature Singapore
DOI
https://doi.org/10.1007/978-981-97-8487-5_32