
2021 | Original Paper | Book Chapter

Zero-Shot Visual Question Answering Using Knowledge Graph

Authors: Zhuo Chen, Jiaoyan Chen, Yuxia Geng, Jeff Z. Pan, Zonggang Yuan, Huajun Chen

Published in: The Semantic Web – ISWC 2021

Publisher: Springer International Publishing


Abstract

Incorporating external knowledge into Visual Question Answering (VQA) has become a vital practical need. Existing methods mostly adopt pipeline approaches with separate components for knowledge matching and extraction, feature learning, etc. However, such pipelines suffer when any single component underperforms, leading to cascading errors and poor overall performance. Furthermore, the majority of existing approaches ignore the answer bias issue: in real-world applications, many answers may never have appeared during training (i.e., unseen answers). To bridge these gaps, in this paper we propose a Zero-shot VQA algorithm that uses a knowledge graph and a mask-based learning mechanism to better incorporate external knowledge, and we present new answer-based Zero-shot VQA splits for the F-VQA dataset. Experiments show that our method achieves state-of-the-art performance in Zero-shot VQA with unseen answers, while also substantially improving existing end-to-end models on the standard F-VQA task.


Metadata
Title
Zero-Shot Visual Question Answering Using Knowledge Graph
Authors
Zhuo Chen
Jiaoyan Chen
Yuxia Geng
Jeff Z. Pan
Zonggang Yuan
Huajun Chen
Copyright Year
2021
DOI
https://doi.org/10.1007/978-3-030-88361-4_9
