2019 | OriginalPaper | Chapter

An Effective Deep Transfer Learning and Information Fusion Framework for Medical Visual Question Answering

Authors: Feifan Liu, Yalei Peng, Max P. Rosen

Published in: Experimental IR Meets Multilinguality, Multimodality, and Interaction

Publisher: Springer International Publishing

Abstract

Medical visual question answering (Med-VQA) is important for improving clinical decision support and enhancing patient engagement in patient-centered medical care. Compared with open-domain VQA tasks, VQA in the medical domain is more challenging due to limited training resources as well as the unique characteristics of medical images and domain vocabulary. In this paper, we propose and develop a novel deep transfer learning model, ETM-Trans, which applies embedding topic modeling (ETM) to the textual questions to derive topic labels that are paired with the associated medical images for fine-tuning a pre-trained ImageNet model. We also explore and implement a co-attention mechanism in which a residual network extracts visual features from the image that interact with a long short-term memory (LSTM) based question representation, providing fine-grained contextual information for answer derivation. To efficiently integrate the visual features from the image and the textual features from the question, we employ Multimodal Factorized Bilinear (MFB) pooling as well as Multimodal Factorized High-order (MFH) pooling. The ETM-Trans model won the international Med-VQA 2018 challenge, achieving the best WBSS score of 0.186.
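To make the fusion step concrete, the sketch below shows how MFB pooling can combine ResNet image features with an LSTM question encoding before answer prediction. It is a minimal PyTorch-style illustration, assuming 2048-dimensional image features, a 1024-dimensional question vector, and a hypothetical class name (MFBFusion) with illustrative dimension choices; it is not the authors' released implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class MFBFusion(nn.Module):
    # Minimal sketch of Multimodal Factorized Bilinear (MFB) pooling.
    # img_dim, q_dim, joint_dim, and k are illustrative defaults,
    # not values reported in the paper.
    def __init__(self, img_dim=2048, q_dim=1024, joint_dim=1000, k=5):
        super().__init__()
        self.joint_dim, self.k = joint_dim, k
        self.img_proj = nn.Linear(img_dim, joint_dim * k)  # project image features
        self.q_proj = nn.Linear(q_dim, joint_dim * k)      # project question features
        self.dropout = nn.Dropout(0.1)

    def forward(self, img_feat, q_feat):
        # Element-wise product in the expanded (joint_dim * k) space
        # approximates a full bilinear interaction with low rank k.
        joint = self.dropout(self.img_proj(img_feat) * self.q_proj(q_feat))
        # Sum-pool over the k factors, then apply power and L2 normalization.
        joint = joint.view(-1, self.joint_dim, self.k).sum(dim=2)
        joint = torch.sign(joint) * torch.sqrt(torch.abs(joint) + 1e-12)
        return F.normalize(joint, dim=-1)

# Usage: fuse a batch of ResNet image vectors with LSTM question vectors,
# then feed the 1000-d joint vector to an answer classifier.
fusion = MFBFusion()
joint = fusion(torch.randn(8, 2048), torch.randn(8, 1024))  # -> shape (8, 1000)

MFH pooling can be viewed as cascading several such MFB blocks and concatenating their outputs for higher-order interactions.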

Metadata
Title
An Effective Deep Transfer Learning and Information Fusion Framework for Medical Visual Question Answering
Authors
Feifan Liu
Yalei Peng
Max P. Rosen
Copyright Year
2019
DOI
https://doi.org/10.1007/978-3-030-28577-7_20
