Published in: Neural Processing Letters 6/2022

14.05.2022

PBGN: Phased Bidirectional Generation Network in Text-to-Image Synthesis

Authors: Jianwei Zhu, Zhixin Li, Jiahui Wei, Huifang Ma


Abstract

Text-to-image synthesis methods are evaluated mainly on two criteria: the quality and diversity of the generated images, and the semantic consistency between the generated images and the input sentences. To improve semantic consistency during image generation, we propose the Phased Bidirectional Generation Network (PBGN). Building on a multi-level generative adversarial network, PBGN employs a bidirectional generative mechanism: the image generated at each level is redescribed as text, and a reconstruction loss between this text and the input sentence constrains the generated images. We also investigate the self-attention mechanism and spectral normalization as means of improving the performance of the generative network. Furthermore, we propose an efficient boundary augmentation strategy that improves performance on small-scale datasets. Our method achieves Inception Scores of 4.71, 5.13, and 32.42, and R-precision scores of 92.55, 87.72, and 92.29 on the Oxford-102, CUB-200, and MS-COCO datasets, respectively.
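
The abstract bundles three technical ideas: a bidirectional cycle in which each generated image is redescribed as text and penalized for drifting from the input sentence, plus self-attention and spectral normalization inside the adversarial network. The sketch below illustrates how such components are commonly assembled in PyTorch. It is a minimal illustration under assumed interfaces (SAGAN-style spatial attention, a token-level captioning module producing vocabulary logits); the module and function names are invented for this example and do not come from the authors' code.

```python
# Minimal PyTorch sketch of two mechanisms named in the abstract.
# Assumptions: SAGAN-style spatial self-attention; a captioning module
# (not shown) that redescribes a generated image as vocabulary logits.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.nn.utils import spectral_norm


class SelfAttention2d(nn.Module):
    """Self-attention over the spatial positions of a feature map."""

    def __init__(self, channels: int):
        super().__init__()
        # 1x1 convolutions produce query/key/value maps; spectral
        # normalization bounds each layer's spectral norm to stabilize
        # adversarial training.
        self.query = spectral_norm(nn.Conv2d(channels, channels // 8, 1))
        self.key = spectral_norm(nn.Conv2d(channels, channels // 8, 1))
        self.value = spectral_norm(nn.Conv2d(channels, channels, 1))
        self.gamma = nn.Parameter(torch.zeros(1))  # learned residual weight

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        q = self.query(x).flatten(2).transpose(1, 2)  # (b, h*w, c//8)
        k = self.key(x).flatten(2)                    # (b, c//8, h*w)
        v = self.value(x).flatten(2)                  # (b, c, h*w)
        attn = torch.softmax(q @ k, dim=-1)           # (b, h*w, h*w)
        out = (v @ attn.transpose(1, 2)).view(b, c, h, w)
        return x + self.gamma * out                   # residual connection


def text_reconstruction_loss(caption_logits: torch.Tensor,
                             input_ids: torch.Tensor) -> torch.Tensor:
    """Cycle-consistency constraint: cross-entropy between the words
    redescribed from a generated image and the input sentence.

    caption_logits: (batch, seq_len, vocab) scores from a captioning module
    input_ids:      (batch, seq_len) token ids of the input sentence
    """
    return F.cross_entropy(caption_logits.flatten(0, 1), input_ids.flatten())
```

For reference, the Inception Score quoted above is the standard metric of Salimans et al. (2016): IS = exp( E_{x ~ p_G} [ D_KL( p(y|x) || p(y) ) ] ), where p(y|x) is the label distribution a pretrained Inception-v3 classifier assigns to a generated image and p(y) is its marginal over the generated set. R-precision, introduced with AttnGAN, measures text-image consistency as the rate at which the ground-truth sentence is retrieved for a generated image from a pool of mismatched candidates under a pretrained image-text similarity model.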

Metadata
Title
PBGN: Phased Bidirectional Generation Network in Text-to-Image Synthesis
Authors
Jianwei Zhu
Zhixin Li
Jiahui Wei
Huifang Ma
Publication date
14.05.2022
Publisher
Springer US
Published in
Neural Processing Letters / Issue 6/2022
Print ISSN: 1370-4621
Electronic ISSN: 1573-773X
DOI
https://doi.org/10.1007/s11063-022-10866-x
