2021 | OriginalPaper | Chapter

IQ-VQA: Intelligent Visual Question Answering

Authors: Vatsal Goel, Mohit Chandak, Ashish Anand, Prithwijit Guha

Published in: Pattern Recognition. ICPR International Workshops and Challenges

Publisher: Springer International Publishing

Abstract

Despite tremendous progress in the field of Visual Question Answering, models today still tend to be inconsistent and brittle. We therefore propose a model-independent cyclic framework which increases the consistency and robustness of any VQA architecture. We train our models to answer the original question, generate an implication based on the answer, and then learn to answer the generated implication correctly. As part of the cyclic framework, we propose a novel implication generator which generates implied questions from any question-answer pair. As a baseline for future work on consistency, we provide a new human-annotated VQA-Implications dataset. The dataset consists of 30k implications of three types - Logical Equivalence, Necessary Condition and Mutual Exclusion - made from the VQA validation dataset. We show that our framework improves the consistency of VQA models on both the rule-based dataset and the VQA-Implications dataset, and improves their robustness, without degrading their performance.
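
To make the cyclic framework concrete, below is a minimal training-step sketch assuming a generic PyTorch-style VQA model and implication generator. The interfaces (vqa_model, implication_generator) and the loss weighting are illustrative assumptions, not the authors' released implementation.

    # Hypothetical sketch of the cyclic consistency framework described in the abstract.
    # vqa_model and implication_generator are assumed interfaces, not the authors' code.
    import torch.nn.functional as F

    def cyclic_training_step(vqa_model, implication_generator, image, question, answer_label):
        # 1. Answer the original question.
        answer_logits = vqa_model(image, question)
        vqa_loss = F.cross_entropy(answer_logits, answer_label)

        # 2. Generate an implied question (and its expected answer) from the
        #    original question-answer pair.
        predicted_answer = answer_logits.argmax(dim=-1)
        implied_question, implied_answer = implication_generator(question, predicted_answer)

        # 3. Learn to answer the generated implication correctly, encouraging
        #    consistent answers across related questions.
        implied_logits = vqa_model(image, implied_question)
        consistency_loss = F.cross_entropy(implied_logits, implied_answer)

        # Total loss; the 0.5 weighting is an assumption for illustration only.
        return vqa_loss + 0.5 * consistency_loss

In this sketch the implication generator is treated as producing a single implied question with its expected answer; in the paper it covers the three implication types listed above.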

Metadata
Title
IQ-VQA: Intelligent Visual Question Answering
Authors
Vatsal Goel
Mohit Chandak
Ashish Anand
Prithwijit Guha
Copyright Year
2021
DOI
https://doi.org/10.1007/978-3-030-68790-8_28