Skip to main content

2021 | OriginalPaper | Buchkapitel

Variational Topic Inference for Chest X-Ray Report Generation

verfasst von : Ivona Najdenkoska, Xiantong Zhen, Marcel Worring, Ling Shao

Erschienen in: Medical Image Computing and Computer Assisted Intervention – MICCAI 2021

Verlag: Springer International Publishing

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

Automating report generation for medical imaging promises to reduce workload and assist diagnosis in clinical practice. Recent work has shown that deep learning models can successfully caption natural images. However, learning from medical data is challenging due to the diversity and uncertainty inherent in the reports written by different radiologists with discrepant expertise and experience. To tackle these challenges, we propose variational topic inference for automatic report generation. Specifically, we introduce a set of topics as latent variables to guide sentence generation by aligning image and language modalities in a latent space. The topics are inferred in a conditional variational inference framework, with each topic governing the generation of a sentence in the report. Further, we adopt a visual attention module that enables the model to attend to different locations in the image and generate more informative descriptions. We conduct extensive experiments on two benchmarks, namely Indiana U. Chest X-rays and MIMIC-CXR. The results demonstrate that our proposed variational topic inference method can generate novel reports rather than mere copies of reports used in training, while still achieving comparable performance to state-of-the-art methods in terms of standard language generation criteria.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Anhänge
Nur mit Berechtigung zugänglich
Literatur
1.
Zurück zum Zitat Anderson, P., et al.: Bottom-up and top-down attention for image captioning and visual question answering. In: IEEE Conference on Computer Vision and Pattern Recognition (2018) Anderson, P., et al.: Bottom-up and top-down attention for image captioning and visual question answering. In: IEEE Conference on Computer Vision and Pattern Recognition (2018)
2.
Zurück zum Zitat Bowman, S.R., Vilnis, L., Vinyals, O., Dai, A., Jozefowicz, R., Bengio, S.: Generating sentences from a continuous space. In: Proceedings of The 20th SIGNLL Conference on Computational Natural Language Learning, pp. 10–21 (2016) Bowman, S.R., Vilnis, L., Vinyals, O., Dai, A., Jozefowicz, R., Bengio, S.: Generating sentences from a continuous space. In: Proceedings of The 20th SIGNLL Conference on Computational Natural Language Learning, pp. 10–21 (2016)
3.
Zurück zum Zitat Chen, Z., Song, Y., Chang, T.H., Wan, X.: Generating radiology reports via memory-driven transformer. In: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (2020) Chen, Z., Song, Y., Chang, T.H., Wan, X.: Generating radiology reports via memory-driven transformer. In: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (2020)
4.
Zurück zum Zitat Demner-Fushman, D., et al.: Preparing a collection of radiology examinations for distribution and retrieval. J. Am. Med. Inform. Assoc.: JAMIA 23, 304–10 (2016)CrossRef Demner-Fushman, D., et al.: Preparing a collection of radiology examinations for distribution and retrieval. J. Am. Med. Inform. Assoc.: JAMIA 23, 304–10 (2016)CrossRef
5.
Zurück zum Zitat Fu, H., Li, C., Liu, X., Gao, J., Celikyilmaz, A., Carin, L.: Cyclical annealing schedule: a simple approach to mitigating KL vanishing. In: North American Chapter of the Association for Computational Linguistics, pp. 240–250 (2019) Fu, H., Li, C., Liu, X., Gao, J., Celikyilmaz, A., Carin, L.: Cyclical annealing schedule: a simple approach to mitigating KL vanishing. In: North American Chapter of the Association for Computational Linguistics, pp. 240–250 (2019)
6.
Zurück zum Zitat He, K., Zhang, X., Ren, S., Sun, J.: Delving deep into rectifiers: surpassing human-level performance on ImageNet classification. In: IEEE International Conference on Computer Vision, pp. 1026–1034 (2015) He, K., Zhang, X., Ren, S., Sun, J.: Delving deep into rectifiers: surpassing human-level performance on ImageNet classification. In: IEEE International Conference on Computer Vision, pp. 1026–1034 (2015)
7.
Zurück zum Zitat Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9(8), 1735–1780 (1997)CrossRef Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9(8), 1735–1780 (1997)CrossRef
8.
Zurück zum Zitat Huang, G., Liu, Z., Van Der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 4700–4708 (2017) Huang, G., Liu, Z., Van Der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 4700–4708 (2017)
9.
Zurück zum Zitat Irvin, J., et al.: CheXpert: a large chest radiograph dataset with uncertainty labels and expert comparison. In: 33rd AAAI Conference on Artificial Intelligence (2019) Irvin, J., et al.: CheXpert: a large chest radiograph dataset with uncertainty labels and expert comparison. In: 33rd AAAI Conference on Artificial Intelligence (2019)
10.
Zurück zum Zitat Jing, B., Wang, Z., Xing, E.: Show, describe and conclude: on exploiting the structure information of chest X-ray reports. In: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pp. 6570–6580. Association for Computational Linguistics, July 2019 Jing, B., Wang, Z., Xing, E.: Show, describe and conclude: on exploiting the structure information of chest X-ray reports. In: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pp. 6570–6580. Association for Computational Linguistics, July 2019
11.
Zurück zum Zitat Jing, B., Xie, P., Xing, E.: On the automatic generation of medical imaging reports. In: Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (2018) Jing, B., Xie, P., Xing, E.: On the automatic generation of medical imaging reports. In: Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (2018)
12.
Zurück zum Zitat Johnson, A.E., et al.: MIMIC-CXR-JPG, a large publicly available database of labeled chest radiographs. arXiv preprint arXiv:1901.07042 (2019) Johnson, A.E., et al.: MIMIC-CXR-JPG, a large publicly available database of labeled chest radiographs. arXiv preprint arXiv:​1901.​07042 (2019)
13.
Zurück zum Zitat Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization. In: International Conference on Learning Representations (2015) Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization. In: International Conference on Learning Representations (2015)
15.
17.
Zurück zum Zitat Li, Y., Liang, X., Hu, Z., Xing, E.P.: Hybrid retrieval-generation reinforced agent for medical image report generation. In: Advances in Neural Information Processing Systems, vol. 31 (2018) Li, Y., Liang, X., Hu, Z., Xing, E.P.: Hybrid retrieval-generation reinforced agent for medical image report generation. In: Advances in Neural Information Processing Systems, vol. 31 (2018)
18.
Zurück zum Zitat Lin, C.Y.: ROUGE: a package for automatic evaluation of summaries. Association for Computational Linguistics (ACL) (2004) Lin, C.Y.: ROUGE: a package for automatic evaluation of summaries. Association for Computational Linguistics (ACL) (2004)
19.
Zurück zum Zitat Liu, G., et al.: Clinically accurate chest x-ray report generation. In: Machine Learning for Healthcare Conference, pp. 249–269 (2019) Liu, G., et al.: Clinically accurate chest x-ray report generation. In: Machine Learning for Healthcare Conference, pp. 249–269 (2019)
20.
Zurück zum Zitat Lovelace, J., Mortazavi, B.: Learning to generate clinically coherent chest X-ray reports. In: Findings of the Association for Computational Linguistics: EMNLP, pp. 1235–1243 (2020) Lovelace, J., Mortazavi, B.: Learning to generate clinically coherent chest X-ray reports. In: Findings of the Association for Computational Linguistics: EMNLP, pp. 1235–1243 (2020)
21.
Zurück zum Zitat Lu, J., Xiong, C., Parikh, D., Socher, R.: Knowing when to look: adaptive attention via a visual sentinel for image captioning. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 375–383 (2017) Lu, J., Xiong, C., Parikh, D., Socher, R.: Knowing when to look: adaptive attention via a visual sentinel for image captioning. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 375–383 (2017)
22.
Zurück zum Zitat Luo, R., Shakhnarovich, G.: Analysis of diversity-accuracy tradeoff in image captioning (2020) Luo, R., Shakhnarovich, G.: Analysis of diversity-accuracy tradeoff in image captioning (2020)
23.
Zurück zum Zitat Mahajan, S., Roth, S.: Diverse image captioning with context-object split latent spaces. In: Advances in Neural Information Processing Systems (NeurIPS) (2020) Mahajan, S., Roth, S.: Diverse image captioning with context-object split latent spaces. In: Advances in Neural Information Processing Systems (NeurIPS) (2020)
24.
Zurück zum Zitat Papineni, K., Roukos, S., Ward, T., Zhu, W.J.: BLEU: a method for automatic evaluation of machine translation. In: Association for Computational Linguistics, pp. 311–318 (2002) Papineni, K., Roukos, S., Ward, T., Zhu, W.J.: BLEU: a method for automatic evaluation of machine translation. In: Association for Computational Linguistics, pp. 311–318 (2002)
25.
Zurück zum Zitat Selvaraju, R.R., Cogswell, M., Das, A., Vedantam, R., Parikh, D., Batra, D.: Grad-CAM: visual explanations from deep networks via gradient-based localization. In: IEEE International Conference on Computer Vision, pp. 618–626 (2017) Selvaraju, R.R., Cogswell, M., Das, A., Vedantam, R., Parikh, D., Batra, D.: Grad-CAM: visual explanations from deep networks via gradient-based localization. In: IEEE International Conference on Computer Vision, pp. 618–626 (2017)
26.
Zurück zum Zitat Sohn, K., Lee, H., Yan, X.: Learning structured output representation using deep conditional generative models. In: Cortes, C., Lawrence, N., Lee, D., Sugiyama, M., Garnett, R. (eds.) Advances in Neural Information Processing Systems (2015) Sohn, K., Lee, H., Yan, X.: Learning structured output representation using deep conditional generative models. In: Cortes, C., Lawrence, N., Lee, D., Sugiyama, M., Garnett, R. (eds.) Advances in Neural Information Processing Systems (2015)
27.
Zurück zum Zitat Vaswani, A., et al.: Attention is all you need. In: Advances in Neural Information Processing Systems, vol. 30 (2017) Vaswani, A., et al.: Attention is all you need. In: Advances in Neural Information Processing Systems, vol. 30 (2017)
28.
Zurück zum Zitat Vinyals, O., Toshev, A., Bengio, S., Erhan, D.: Show and tell: a neural image caption generator. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 3156–3164 (2015) Vinyals, O., Toshev, A., Bengio, S., Erhan, D.: Show and tell: a neural image caption generator. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 3156–3164 (2015)
29.
Zurück zum Zitat Wang, W., et al.: Topic-guided variational auto-encoder for text generation. North American Chapter of the Association for Computational Linguistics (2019) Wang, W., et al.: Topic-guided variational auto-encoder for text generation. North American Chapter of the Association for Computational Linguistics (2019)
30.
Zurück zum Zitat Xu, K., et al.: Show, attend and tell: neural image caption generation with visual attention. In: International Conference on Machine Learning, pp. 2048–2057 (2015) Xu, K., et al.: Show, attend and tell: neural image caption generation with visual attention. In: International Conference on Machine Learning, pp. 2048–2057 (2015)
34.
Zurück zum Zitat Zhang, Y., Chen, Q., Yang, Z., Lin, H., Lu, Z.: BioWordVec: improving biomedical word embeddings with subword information and MeSH ontology (2018) Zhang, Y., Chen, Q., Yang, Z., Lin, H., Lu, Z.: BioWordVec: improving biomedical word embeddings with subword information and MeSH ontology (2018)
Metadaten
Titel
Variational Topic Inference for Chest X-Ray Report Generation
verfasst von
Ivona Najdenkoska
Xiantong Zhen
Marcel Worring
Ling Shao
Copyright-Jahr
2021
DOI
https://doi.org/10.1007/978-3-030-87199-4_59