
2019 | Original Paper | Book Chapter

Context-Aware GANs for Image Generation from Multimodal Queries

Authors: Kenki Nakamura, Qiang Ma

Published in: Database and Expert Systems Applications

Publisher: Springer International Publishing

Abstract

In this paper, we propose a novel model of context-aware generative adversarial networks (GANs) to generate images from a multimodal query: a pair consisting of a condition text and a context image. In our study, context is defined as the objects and concepts that appear in the image but not in the text. We construct two object trees expressing the objects and their hierarchical relationships described in the input condition text and context image, respectively. We compare these two object trees to extract the context. Then, based on the extracted context, we generate parameters for the generator in context-aware GANs. To guarantee that the generated image is related to the multimodal query, i.e., both the condition text and the context image, we also construct a context discriminator in addition to the condition discriminator, similar to that of conditional GANs. The experimental results reveal that the proposed model generates images at higher resolution and containing more contextual information than previous models.
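The context-extraction step described above (comparing the two object trees and keeping the objects present in the image but absent from the text) can be sketched as a set difference over tree labels. This is an illustrative sketch, not the authors' code: the `ObjectNode` class and its tree shape are assumptions made for the example.

```python
# Hypothetical sketch of context extraction as defined in the abstract:
# context = objects/concepts appearing in the context image but not in
# the condition text. Class and function names are illustrative.

class ObjectNode:
    """A node in an object tree: a label plus child objects."""

    def __init__(self, label, children=None):
        self.label = label
        self.children = children or []

    def labels(self):
        # Collect this node's label and all descendants' labels.
        found = {self.label}
        for child in self.children:
            found |= child.labels()
        return found


def extract_context(text_tree, image_tree):
    # Context = labels present in the image tree but not in the text tree.
    return image_tree.labels() - text_tree.labels()


# Example: the text mentions a dog; the image additionally contains
# grass and a tree, so those two objects form the context.
text_tree = ObjectNode("scene", [ObjectNode("dog")])
image_tree = ObjectNode(
    "scene",
    [ObjectNode("dog"), ObjectNode("grass"), ObjectNode("tree")],
)
print(sorted(extract_context(text_tree, image_tree)))  # ['grass', 'tree']
```

In the paper's full model, this extracted context then conditions the generator's parameters, while a separate context discriminator checks that generated images remain consistent with it.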


Footnotes
1. For details of these relationships, please refer to [2].
References
1. Antol, S., Agrawal, A., Lu, J., Mitchell, M., Batra, D., Zitnick, C.L., Parikh, D.: VQA: visual question answering. ICCV 2015, 2425–2433 (2015)
2. Chen, D., Manning, C.: A fast and accurate dependency parser using neural networks. EMNLP 2014, 740–750 (2014)
3. Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative adversarial nets. NIPS 2014, 2672–2680 (2014)
4. Hinton, G.E., Salakhutdinov, R.R.: Reducing the dimensionality of data with neural networks. Science 313(5786), 504–507 (2006)
6. Ma, Q.: Utilization and analysis of user generated contents toward personalized and distributed sightseeing. Syst. Control Inf. 63(1), 32–37 (2019)
7. Ma, Q.: Forefront of sightseeing informatics - technologies of collective intelligence for promotion of personalized and distributed sightseeing. Inf. Process. 58(3), 220–226 (2017)
8. Zhuang, C.Y., Ma, Q., Liang, X.F., Yoshikawa, M.: Discovering obscure sightseeing spots by analysis of geo-tagged social images. ASONAM 2015, 590–595 (2015)
9. Nakamura, K., Ma, Q.: Context-aware image generation by using generative adversarial networks. ISM 2017, 516–523 (2017)
10. Pennington, J., Socher, R., Manning, C.D.: GloVe: global vectors for word representation. EMNLP 2014, 1532–1543 (2014)
12. Reed, S.E., Akata, Z., Yan, X., Logeswaran, L., Schiele, B., Lee, H.: Generative adversarial text to image synthesis. ICML 2016, 1060–1069 (2016)
13. Teney, D., Liu, L., van den Hengel, A.: Graph-structured representations for visual question answering. CVPR 2017, 3233–3241 (2017)
14. Vondrick, C., Pirsiavash, H., Torralba, A.: Generating videos with scene dynamics. NIPS 2016, 613–621 (2016)
15. Zhang, H., Xu, T., Li, H.: StackGAN: text to photo-realistic image synthesis with stacked generative adversarial networks. ICCV 2017, 5908–5916 (2017)
Metadata
Title
Context-Aware GANs for Image Generation from Multimodal Queries
Authors
Kenki Nakamura
Qiang Ma
Copyright Year
2019
DOI
https://doi.org/10.1007/978-3-030-27615-7_33
