
2019 | OriginalPaper | Chapter

Context-Aware GANs for Image Generation from Multimodal Queries

Authors : Kenki Nakamura, Qiang Ma

Published in: Database and Expert Systems Applications

Publisher: Springer International Publishing


Abstract

In this paper, we propose a novel model of context-aware generative adversarial networks (GANs) to generate images from a multimodal query: a pair consisting of a condition text and a context image. In our study, context is defined as the objects and concepts that appear in the image but not in the text. We construct two object trees expressing the objects and their hierarchical relationships described in the input condition text and context image, respectively. We compare these two object trees to extract the context. Then, based on the extracted context, we generate parameters for the generator in context-aware GANs. To guarantee that the generated image is related to the multimodal query, i.e., both the condition text and context image, we also construct a context discriminator in addition to the condition discriminator, which is similar to that of conditional GANs. The experimental results reveal that the proposed model generates images at higher resolution and containing more contextual information than previous models.
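The core idea of the context definition above can be illustrated with a small sketch. The tree encoding below (a `(label, children)` tuple) and the function names are illustrative assumptions, not the authors' implementation; the sketch only shows the set-difference view of context: objects present in the context-image tree but absent from the condition-text tree.

```python
# Hypothetical sketch: context extraction as a set difference between the
# object labels of the context-image tree and those of the condition-text tree.

def tree_objects(tree):
    """Collect all object labels from a nested (label, children) tree."""
    label, children = tree
    objs = {label}
    for child in children:
        objs |= tree_objects(child)
    return objs

def extract_context(text_tree, image_tree):
    """Context = objects that appear in the image but not in the text."""
    return tree_objects(image_tree) - tree_objects(text_tree)

# Toy example: the text mentions a bird on a branch; the image also shows
# sky and leaves, which therefore constitute the context.
text_tree = ("bird", [("branch", [])])
image_tree = ("bird", [("branch", []), ("sky", []), ("leaves", [])])
print(sorted(extract_context(text_tree, image_tree)))  # ['leaves', 'sky']
```

In the paper, the extracted context then conditions the generator's parameters, while the additional context discriminator checks the generated image against this context rather than against the text alone.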


Footnotes
1
For details of these relationships, please refer to [2].
 
Literature
1. Antol, S., Agrawal, A., Lu, J., Mitchell, M., Batra, D., Zitnick, C.L., Parikh, D.: VQA: visual question answering. ICCV 2015, 2425–2433 (2015)
2. Chen, D., Manning, C.: A fast and accurate dependency parser using neural networks. EMNLP 2014, 740–750 (2014)
3. Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative adversarial nets. NIPS 2014, 2672–2680 (2014)
4. Hinton, G.E., Salakhutdinov, R.R.: Reducing the dimensionality of data with neural networks. Science 313(5786), 504–507 (2006)
6. Ma, Q.: Utilization and analysis of user generated contents toward personalized and distributed sightseeing. Syst. Control Inf. 63(1), 32–37 (2019)
7. Ma, Q.: Forefront of sightseeing informatics - technologies of collective intelligence for promotion of personalized and distributed sightseeing. Inf. Process. 58(3), 220–226 (2017)
8. Zhuang, C.Y., Ma, Q., Liang, X.F., Yoshikawa, M.: Discovering obscure sightseeing spots by analysis of geo-tagged social images. ASONAM 2015, 590–595 (2015)
9. Nakamura, K., Ma, Q.: Context-aware image generation by using generative adversarial networks. ISM 2017, 516–523 (2017)
10. Pennington, J., Socher, R., Manning, C.D.: GloVe: global vectors for word representation. EMNLP 2014, 1532–1543 (2014)
12. Reed, S.E., Akata, Z., Yan, X., Logeswaran, L., Schiele, B., Lee, H.: Generative adversarial text to image synthesis. ICML 2016, 1060–1069 (2016)
13. Teney, D., Liu, L., van den Hengel, A.: Graph-structured representations for visual question answering. CVPR 2017, 3233–3241 (2017)
14. Vondrick, C., Pirsiavash, H., Torralba, A.: Generating videos with scene dynamics. NIPS 2016, 613–621 (2016)
15. Zhang, H., Xu, T., Li, H.: StackGAN: text to photo-realistic image synthesis with stacked generative adversarial networks. ICCV 2017, 5908–5916 (2017)
Metadata
Title
Context-Aware GANs for Image Generation from Multimodal Queries
Authors
Kenki Nakamura
Qiang Ma
Copyright Year
2019
DOI
https://doi.org/10.1007/978-3-030-27615-7_33
