
2020 | Original Paper | Book Chapter

AQuA: ASP-Based Visual Question Answering

Authors: Kinjal Basu, Farhad Shakerin, Gopal Gupta

Published in: Practical Aspects of Declarative Languages

Publisher: Springer International Publishing


Abstract

AQuA (ASP-based Question Answering) is an Answer Set Programming (ASP) based visual question answering framework that truly "understands" an input picture and answers natural language questions about it. The knowledge contained in the picture is extracted using YOLO, a neural-network-based object-detection technique, and represented as an answer set program. Natural language processing transforms the question into an ASP query; semantic relations are extracted along the way to enable deeper understanding and to answer more complex questions. The resulting knowledge base, augmented with imported commonsense knowledge, can then be used to perform reasoning with an ASP system, allowing it to answer questions about the picture much as a human would. The framework achieves 93.7% accuracy on the CLEVR dataset, which exceeds human baseline performance. Significantly, AQuA translates a question into an ASP query without requiring any training. Our framework for visual question answering is quite general and closely simulates the way humans operate. In contrast to existing purely machine-learning-based methods, it provides an explanation for each answer it computes while maintaining high accuracy.
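To make the representation the abstract describes concrete, scene facts produced by object detection and a question compiled into an ASP query might look as follows. This is a hypothetical sketch, not the paper's actual encoding; the predicate names (`object/1`, `property/3`, `answer/1`) and the example scene are invented for illustration:

```
% Hypothetical scene facts, as YOLO detections might be encoded
object(o1). property(o1, shape, cube).   property(o1, color, red).
object(o2). property(o2, shape, sphere). property(o2, color, red).
object(o3). property(o3, shape, cube).   property(o3, color, blue).

% "How many red cubes are there?" compiled into an ASP query
red_cube(X) :- property(X, shape, cube), property(X, color, red).
answer(N)   :- N = #count { X : red_cube(X) }.
```

Running such a program under an ASP solver (e.g., clingo) would derive `answer(1)` for this scene, and the rule bodies that fire provide the kind of justification trail that a purely neural model cannot.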


Literatur
1. Cao, Q., Liang, X., Li, B., Li, G., Lin, L.: Visual question reasoning on general dependency tree. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7249–7257 (2018)
2. Davidson, D.: Inquiries into Truth and Interpretation: Philosophical Essays, vol. 2. Oxford University Press, Oxford (2001)
3. Gao, H., Mao, J., Zhou, J., Huang, Z., Wang, L., Xu, W.: Are you talking to a machine? Dataset and methods for multilingual image question. In: NIPS 2015, pp. 2296–2304 (2015)
4. Gelfond, M., Kahl, Y.: Knowledge Representation, Reasoning, and the Design of Intelligent Agents: The Answer-Set Programming Approach. Cambridge University Press, Cambridge (2014)
5. Gelfond, M., Lifschitz, V.: The stable model semantics for logic programming. In: ICLP/SLP, vol. 88, pp. 1070–1080 (1988)
6. Honnibal, M., Montani, I.: spaCy 2: natural language understanding with Bloom embeddings, convolutional neural networks and incremental parsing (2017, to appear)
7. Hu, R., Andreas, J., Rohrbach, M., Darrell, T., Saenko, K.: Learning to reason: end-to-end module networks for visual question answering. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 804–813 (2017)
8.
9. Johnson, J., Hariharan, B., van der Maaten, L., Fei-Fei, L., Lawrence Zitnick, C., Girshick, R.: CLEVR: a diagnostic dataset for compositional language and elementary visual reasoning. In: IEEE CVPR 2017, pp. 2901–2910 (2017)
10. Johnson, J., et al.: Inferring and executing programs for visual reasoning. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 2989–2998 (2017)
11. Joshi, V., Peters, M., Hopkins, M.: Extending a parser to distant domains using a few dozen partially annotated examples. arXiv preprint arXiv:1805.06556 (2018)
12. Krishna, R., et al.: Visual genome: connecting language and vision using crowdsourced dense image annotations. Int. J. Comput. Vision 123(1), 32–73 (2017)
13. Malinowski, M., Fritz, M.: A multi-world approach to question answering about real-world scenes based on uncertain input. In: NIPS 2014, pp. 1682–1690 (2014)
14. Manning, C.D., Surdeanu, M., Bauer, J., Finkel, J., Bethard, S.J., McClosky, D.: The Stanford CoreNLP natural language processing toolkit. In: ACL System Demonstrations, pp. 55–60 (2014)
15.
16. Mascharka, D., Tran, P., Soklaski, R., Majumdar, A.: Transparency by design: closing the gap between performance and interpretability in visual reasoning. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4942–4950 (2018)
19. Perez, E., et al.: FiLM: visual reasoning with a general conditioning layer. In: AAAI (2018)
20. Redmon, J., Divvala, S.K., Girshick, R.B., Farhadi, A.: You only look once: unified, real-time object detection. In: CVPR, pp. 779–788. IEEE Computer Society (2016)
22. Ren, M., Kiros, R., Zemel, R.: Exploring models and data for image question answering. In: NIPS 2015, pp. 2953–2961 (2015)
23. Santoro, A., et al.: A simple neural network module for relational reasoning. In: NIPS 2017, pp. 4967–4976 (2017)
24. Schuster, S., Manning, C.D.: Enhanced English universal dependencies: an improved representation for natural language understanding tasks. In: LREC 2016, pp. 2371–2378 (2016)
25. Shah, S., Mishra, A., Yadati, N., Talukdar, P.P.: KVQA: knowledge-aware visual question answering. In: AAAI (2019)
26. Shakerin, F., Salazar, E., Gupta, G.: A new algorithm to automate inductive learning of default theories. TPLP 17(5–6), 1010–1026 (2017)
27. Shrestha, R., Kafle, K., Kanan, C.: Answer them all! Toward universal visual question answering models. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 10472–10481 (2019)
28. Speer, R., Chin, J., Havasi, C.: ConceptNet 5.5: an open multilingual graph of general knowledge. In: Proceedings of AAAI, pp. 4444–4451 (2017)
29. Suarez, J., Johnson, J., Li, F.F.: DDRprog: a CLEVR differentiable dynamic reasoning programmer. arXiv preprint arXiv:1803.11361 (2018)
30. Wu, Q., Teney, D., Wang, P., Shen, C., Dick, A., van den Hengel, A.: Visual question answering: a survey of methods and datasets. Comput. Vis. Image Underst. 163, 21–40 (2017)
31. Yang, Z., He, X., Gao, J., Deng, L., Smola, A.J.: Stacked attention networks for image question answering. In: CVPR, pp. 21–29 (2015)
32. Yi, K., et al.: Neural-symbolic VQA: disentangling reasoning from vision and language understanding. In: NIPS 2018, pp. 1031–1042 (2018)
33. Yu, L., Park, E., Berg, A.C., Berg, T.L.: Visual madlibs: fill in the blank image generation and question answering. arXiv preprint arXiv:1506.00278 (2015)
Metadata
Title
AQuA: ASP-Based Visual Question Answering
Authors
Kinjal Basu
Farhad Shakerin
Gopal Gupta
Copyright Year
2020
DOI
https://doi.org/10.1007/978-3-030-39197-3_4
