Published in: Neural Processing Letters 6/2022

14.05.2022

PBGN: Phased Bidirectional Generation Network in Text-to-Image Synthesis

Authors: Jianwei Zhu, Zhixin Li, Jiahui Wei, Huifang Ma


Abstract

Text-to-image synthesis methods are evaluated mainly on two criteria: the quality and diversity of the generated images, and the semantic consistency between the generated images and the input sentences. To improve semantic consistency during image generation, we propose the Phased Bidirectional Generation Network (PBGN). Building on a multi-level generative adversarial network, PBGN employs a bidirectional generative mechanism: the image generated at each level is redescribed as text, and a reconstruction loss between this text and the input sentence constrains the generated images. We also investigate the self-attention mechanism and spectral normalization as means of improving the performance of the generative network. Furthermore, we propose an efficient boundary augmentation strategy that improves performance on small-scale datasets. Our method achieves Inception Scores of 4.71, 5.13, and 32.42, and R-precision scores of 92.55, 87.72, and 92.29 on the Oxford-102, CUB-200, and MS-COCO datasets, respectively.
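
The abstract bundles three technical ideas: a bidirectional cycle in which each generated image is redescribed as text and penalized for drifting from the input sentence, plus self-attention and spectral normalization inside the adversarial network. The sketch below illustrates how such components are commonly assembled in PyTorch. It is a minimal illustration under assumed interfaces (SAGAN-style spatial attention, a token-level captioning module producing vocabulary logits); the module and function names are invented for this example and do not come from the authors' code.

```python
# Minimal PyTorch sketch of two mechanisms named in the abstract.
# Assumptions: SAGAN-style spatial self-attention; a captioning module
# (not shown) that redescribes a generated image as vocabulary logits.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.nn.utils import spectral_norm


class SelfAttention2d(nn.Module):
    """Self-attention over the spatial positions of a feature map."""

    def __init__(self, channels: int):
        super().__init__()
        # 1x1 convolutions produce query/key/value maps; spectral
        # normalization bounds each layer's spectral norm to stabilize
        # adversarial training.
        self.query = spectral_norm(nn.Conv2d(channels, channels // 8, 1))
        self.key = spectral_norm(nn.Conv2d(channels, channels // 8, 1))
        self.value = spectral_norm(nn.Conv2d(channels, channels, 1))
        self.gamma = nn.Parameter(torch.zeros(1))  # learned residual weight

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        q = self.query(x).flatten(2).transpose(1, 2)  # (b, h*w, c//8)
        k = self.key(x).flatten(2)                    # (b, c//8, h*w)
        v = self.value(x).flatten(2)                  # (b, c, h*w)
        attn = torch.softmax(q @ k, dim=-1)           # (b, h*w, h*w)
        out = (v @ attn.transpose(1, 2)).view(b, c, h, w)
        return x + self.gamma * out                   # residual connection


def text_reconstruction_loss(caption_logits: torch.Tensor,
                             input_ids: torch.Tensor) -> torch.Tensor:
    """Cycle-consistency constraint: cross-entropy between the words
    redescribed from a generated image and the input sentence.

    caption_logits: (batch, seq_len, vocab) scores from a captioning module
    input_ids:      (batch, seq_len) token ids of the input sentence
    """
    return F.cross_entropy(caption_logits.flatten(0, 1), input_ids.flatten())
```

For reference, the Inception Score quoted above is the standard metric of Salimans et al. (2016): IS = exp( E_{x ~ p_G} [ D_KL( p(y|x) || p(y) ) ] ), where p(y|x) is the label distribution a pretrained Inception-v3 classifier assigns to a generated image and p(y) is its marginal over the generated set. R-precision, introduced with AttnGAN, measures text-image consistency as the rate at which the ground-truth sentence is retrieved for a generated image from a pool of mismatched candidates under a pretrained image-text similarity model.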

Metadata
Title
PBGN: Phased Bidirectional Generation Network in Text-to-Image Synthesis
Authors
Jianwei Zhu
Zhixin Li
Jiahui Wei
Huifang Ma
Publication date
14.05.2022
Publisher
Springer US
Published in
Neural Processing Letters / Issue 6/2022
Print ISSN: 1370-4621
Electronic ISSN: 1573-773X
DOI
https://doi.org/10.1007/s11063-022-10866-x
