Skip to main content

2021 | OriginalPaper | Buchkapitel

AlignTransformer: Hierarchical Alignment of Visual Regions and Disease Tags for Medical Report Generation

verfasst von : Di You, Fenglin Liu, Shen Ge, Xiaoxia Xie, Jing Zhang, Xian Wu

Erschienen in: Medical Image Computing and Computer Assisted Intervention – MICCAI 2021

Verlag: Springer International Publishing

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

Recently, medical report generation, which aims to automatically generate a long and coherent descriptive paragraph of a given medical image, has received growing research interests. Different from the general image captioning tasks, medical report generation is more challenging for data-driven neural models. This is mainly due to 1) the serious data bias: the normal visual regions dominate the dataset over the abnormal visual regions, and 2) the very long sequence. To alleviate above two problems, we propose an AlignTransformer framework, which includes the Align Hierarchical Attention (AHA) and the Multi-Grained Transformer (MGT) modules: 1) AHA module first predicts the disease tags from the input image and then learns the multi-grained visual features by hierarchically aligning the visual regions and disease tags. The acquired disease-grounded visual features can better represent the abnormal regions of the input image, which could alleviate data bias problem; 2) MGT module effectively uses the multi-grained features and Transformer framework to generate the long medical report. The experiments on the public IU-Xray and MIMIC-CXR datasets show that the AlignTransformer can achieve results competitive with state-of-the-art methods on the two datasets. Moreover, the human evaluation conducted by professional radiologists further proves the effectiveness of our approach.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Literatur
1.
Zurück zum Zitat Anderson, P., et al.: Bottom-up and top-down attention for image captioning and VQA. In: CVPR (2018) Anderson, P., et al.: Bottom-up and top-down attention for image captioning and VQA. In: CVPR (2018)
3.
Zurück zum Zitat Banerjee, S., Lavie, A.: METEOR: an automatic metric for MT evaluation with improved correlation with human judgments. In: IEEvaluation@ACL (2005) Banerjee, S., Lavie, A.: METEOR: an automatic metric for MT evaluation with improved correlation with human judgments. In: IEEvaluation@ACL (2005)
4.
Zurück zum Zitat Biswal, S., Xiao, C., Glass, L., Westover, M.B., Sun, J.: CLARA: clinical report auto-completion. In: WWW (2020) Biswal, S., Xiao, C., Glass, L., Westover, M.B., Sun, J.: CLARA: clinical report auto-completion. In: WWW (2020)
5.
Zurück zum Zitat Brady, A., Laoide, R.Ó., Mccarthy, P., Mcdermott, R.: Discrepancy and error in radiology: concepts, causes and consequences. Ulster Med. J. 81, 3–9 (2012) Brady, A., Laoide, R.Ó., Mccarthy, P., Mcdermott, R.: Discrepancy and error in radiology: concepts, causes and consequences. Ulster Med. J. 81, 3–9 (2012)
6.
7.
Zurück zum Zitat Chen, Z., Song, Y., Chang, T., Wan, X.: Generating radiology reports via memory-driven transformer. In: EMNLP (2020) Chen, Z., Song, Y., Chang, T., Wan, X.: Generating radiology reports via memory-driven transformer. In: EMNLP (2020)
8.
Zurück zum Zitat Cornia, M., Stefanini, M., Baraldi, L., Cucchiara, R.: Meshed-memory transformer for image captioning. In: CVPR (2020) Cornia, M., Stefanini, M., Baraldi, L., Cucchiara, R.: Meshed-memory transformer for image captioning. In: CVPR (2020)
9.
Zurück zum Zitat Delrue, L., Gosselin, R., Ilsen, B., Van Landeghem, A., de Mey, J., Duyck, P.: Difficulties in the interpretation of chest radiography. In: Coche, E., Ghaye, B., de Mey, J., Duyck, P. (eds.) Comparative Interpretation of CT and Standard Radiography of the Chest. Medical Radiology, pp. 27–49. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-540-79942-9_2CrossRef Delrue, L., Gosselin, R., Ilsen, B., Van Landeghem, A., de Mey, J., Duyck, P.: Difficulties in the interpretation of chest radiography. In: Coche, E., Ghaye, B., de Mey, J., Duyck, P. (eds.) Comparative Interpretation of CT and Standard Radiography of the Chest. Medical Radiology, pp. 27–49. Springer, Heidelberg (2011). https://​doi.​org/​10.​1007/​978-3-540-79942-9_​2CrossRef
10.
Zurück zum Zitat Demner-Fushman, D., et al.: Preparing a collection of radiology examinations for distribution and retrieval. J. Am. Med. Inform. Assoc. 23(2), 304–310 (2016)CrossRef Demner-Fushman, D., et al.: Preparing a collection of radiology examinations for distribution and retrieval. J. Am. Med. Inform. Assoc. 23(2), 304–310 (2016)CrossRef
11.
Zurück zum Zitat Deng, J., Dong, W., Socher, R., Li, L., Li, K., Li, F.: ImageNet: a large-scale hierarchical image database. In: CVPR (2009) Deng, J., Dong, W., Socher, R., Li, L., Li, K., Li, F.: ImageNet: a large-scale hierarchical image database. In: CVPR (2009)
12.
Zurück zum Zitat Goergen, S.K., et al.: Evidence-based guideline for the written radiology report: methods, recommendations and implementation challenges. J. Med. Imaging Radiat. Oncol. 57(1), 1–7 (2013)CrossRef Goergen, S.K., et al.: Evidence-based guideline for the written radiology report: methods, recommendations and implementation challenges. J. Med. Imaging Radiat. Oncol. 57(1), 1–7 (2013)CrossRef
13.
Zurück zum Zitat He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR (2016) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR (2016)
14.
Zurück zum Zitat Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9(8), 1735–1780 (1997)CrossRef Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9(8), 1735–1780 (1997)CrossRef
15.
Zurück zum Zitat Irvin, J., et al.: CheXpert: a large chest radiograph dataset with uncertainty labels and expert comparison. In: AAAI (2019) Irvin, J., et al.: CheXpert: a large chest radiograph dataset with uncertainty labels and expert comparison. In: AAAI (2019)
16.
Zurück zum Zitat Jing, B., Wang, Z., Xing, E.P.: Show, describe and conclude: on exploiting the structure information of chest X-ray reports. In: ACL (2019) Jing, B., Wang, Z., Xing, E.P.: Show, describe and conclude: on exploiting the structure information of chest X-ray reports. In: ACL (2019)
17.
Zurück zum Zitat Jing, B., Xie, P., Xing, E.P.: On the automatic generation of medical imaging reports. In: ACL (2018) Jing, B., Xie, P., Xing, E.P.: On the automatic generation of medical imaging reports. In: ACL (2018)
18.
Zurück zum Zitat Johnson, A.E.W., et al.: MIMIC-CXR: a large publicly available database of labeled chest radiographs. arXiv preprint arXiv:1901.07042 (2019) Johnson, A.E.W., et al.: MIMIC-CXR: a large publicly available database of labeled chest radiographs. arXiv preprint arXiv:​1901.​07042 (2019)
19.
Zurück zum Zitat Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization. In: ICLR (2014) Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization. In: ICLR (2014)
20.
Zurück zum Zitat Krause, J., Johnson, J., Krishna, R., Fei-Fei, L.: A hierarchical approach for generating descriptive image paragraphs. In: CVPR (2017) Krause, J., Johnson, J., Krishna, R., Fei-Fei, L.: A hierarchical approach for generating descriptive image paragraphs. In: CVPR (2017)
21.
Zurück zum Zitat Li, C.Y., Liang, X., Hu, Z., Xing, E.P.: Knowledge-driven encode, retrieve, paraphrase for medical image report generation. In: AAAI (2019) Li, C.Y., Liang, X., Hu, Z., Xing, E.P.: Knowledge-driven encode, retrieve, paraphrase for medical image report generation. In: AAAI (2019)
22.
Zurück zum Zitat Li, Y., Liang, X., Hu, Z., Xing, E.P.: Hybrid retrieval-generation reinforced agent for medical image report generation. In: NeurIPS (2018) Li, Y., Liang, X., Hu, Z., Xing, E.P.: Hybrid retrieval-generation reinforced agent for medical image report generation. In: NeurIPS (2018)
23.
Zurück zum Zitat Lin, C.Y.: ROUGE: a package for automatic evaluation of summaries. In: ACL (2004) Lin, C.Y.: ROUGE: a package for automatic evaluation of summaries. In: ACL (2004)
24.
Zurück zum Zitat Liu, F., Ge, S., Wu, X.: Competence-based multimodal curriculum learning for medical report generation. In: ACL (2021) Liu, F., Ge, S., Wu, X.: Competence-based multimodal curriculum learning for medical report generation. In: ACL (2021)
25.
Zurück zum Zitat Liu, F., Liu, Y., Ren, X., He, X., Sun, X.: Aligning visual regions and textual concepts for semantic-grounded image representations. In: NeurIPS (2019) Liu, F., Liu, Y., Ren, X., He, X., Sun, X.: Aligning visual regions and textual concepts for semantic-grounded image representations. In: NeurIPS (2019)
26.
Zurück zum Zitat Liu, F., Ren, X., Liu, Y., Lei, K., Sun, X.: Exploring and distilling cross-modal information for image captioning. In: IJCAI (2019) Liu, F., Ren, X., Liu, Y., Lei, K., Sun, X.: Exploring and distilling cross-modal information for image captioning. In: IJCAI (2019)
27.
Zurück zum Zitat Liu, F., Ren, X., Liu, Y., Wang, H., Sun, X.: simNet: stepwise image-topic merging network for generating detailed and comprehensive image captions. In: EMNLP (2018) Liu, F., Ren, X., Liu, Y., Wang, H., Sun, X.: simNet: stepwise image-topic merging network for generating detailed and comprehensive image captions. In: EMNLP (2018)
28.
Zurück zum Zitat Liu, F., Wu, X., Ge, S., Fan, W., Zou, Y.: Exploring and distilling posterior and prior knowledge for radiology report generation. In: CVPR (2021) Liu, F., Wu, X., Ge, S., Fan, W., Zou, Y.: Exploring and distilling posterior and prior knowledge for radiology report generation. In: CVPR (2021)
29.
Zurück zum Zitat Liu, F., Yin, C., Wu, X., Ge, S., Zhang, P., Sun, X.: Contrastive attention for automatic chest X-ray report generation. In: ACL (Findings) (2021) Liu, F., Yin, C., Wu, X., Ge, S., Zhang, P., Sun, X.: Contrastive attention for automatic chest X-ray report generation. In: ACL (Findings) (2021)
30.
Zurück zum Zitat Liu, G., et al.: Clinically accurate chest X-ray report generation. In: MLHC (2019) Liu, G., et al.: Clinically accurate chest X-ray report generation. In: MLHC (2019)
31.
Zurück zum Zitat Lu, J., Xiong, C., Parikh, D., Socher, R.: Knowing when to look: adaptive attention via a visual sentinel for image captioning. In: CVPR (2017) Lu, J., Xiong, C., Parikh, D., Socher, R.: Knowing when to look: adaptive attention via a visual sentinel for image captioning. In: CVPR (2017)
32.
Zurück zum Zitat Pan, Y., Yao, T., Li, Y., Mei, T.: X-linear attention networks for image captioning. In: CVPR (2020) Pan, Y., Yao, T., Li, Y., Mei, T.: X-linear attention networks for image captioning. In: CVPR (2020)
33.
Zurück zum Zitat Papineni, K., Roukos, S., Ward, T., Zhu, W.: BLEU: a method for automatic evaluation of machine translation. In: ACL (2002) Papineni, K., Roukos, S., Ward, T., Zhu, W.: BLEU: a method for automatic evaluation of machine translation. In: ACL (2002)
34.
Zurück zum Zitat Pascanu, R., Mikolov, T., Bengio, Y.: On the difficulty of training recurrent neural networks. In: ICML (2013) Pascanu, R., Mikolov, T., Bengio, Y.: On the difficulty of training recurrent neural networks. In: ICML (2013)
35.
Zurück zum Zitat Rennie, S.J., Marcheret, E., Mroueh, Y., Ross, J., Goel, V.: Self-critical sequence training for image captioning. In: CVPR (2017) Rennie, S.J., Marcheret, E., Mroueh, Y., Ross, J., Goel, V.: Self-critical sequence training for image captioning. In: CVPR (2017)
36.
Zurück zum Zitat Shin, H., Roberts, K., Lu, L., Demner-Fushman, D., Yao, J., Summers, R.M.: Learning to read chest X-rays: recurrent neural cascade model for automated image annotation. In: CVPR (2016) Shin, H., Roberts, K., Lu, L., Demner-Fushman, D., Yao, J., Summers, R.M.: Learning to read chest X-rays: recurrent neural cascade model for automated image annotation. In: CVPR (2016)
37.
Zurück zum Zitat Srivastava, N., Hinton, G.E., Krizhevsky, A., Sutskever, I., Salakhutdinov, R.: Dropout: a simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 5, 1929–1958 (2014)MathSciNetMATH Srivastava, N., Hinton, G.E., Krizhevsky, A., Sutskever, I., Salakhutdinov, R.: Dropout: a simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 5, 1929–1958 (2014)MathSciNetMATH
39.
Zurück zum Zitat Vaswani, A., et al.: Attention is all you need. In: NIPS (2017) Vaswani, A., et al.: Attention is all you need. In: NIPS (2017)
40.
Zurück zum Zitat Vinyals, O., Toshev, A., Bengio, S., Erhan, D.: Show and tell: a neural image caption generator. In: CVPR (2015) Vinyals, O., Toshev, A., Bengio, S., Erhan, D.: Show and tell: a neural image caption generator. In: CVPR (2015)
41.
Zurück zum Zitat Xu, K., et al.: Show, attend and tell: neural image caption generation with visual attention. In: ICML (2015) Xu, K., et al.: Show, attend and tell: neural image caption generation with visual attention. In: ICML (2015)
43.
Zurück zum Zitat You, Q., Jin, H., Wang, Z., Fang, C., Luo, J.: Image captioning with semantic attention. In: CVPR (2016) You, Q., Jin, H., Wang, Z., Fang, C., Luo, J.: Image captioning with semantic attention. In: CVPR (2016)
45.
Zurück zum Zitat Zhang, Y., Wang, X., Xu, Z., Yu, Q., Yuille, A.L., Xu, D.: When radiology report generation meets knowledge graph. In: AAAI (2020) Zhang, Y., Wang, X., Xu, Z., Yu, Q., Yuille, A.L., Xu, D.: When radiology report generation meets knowledge graph. In: AAAI (2020)
Metadaten
Titel
AlignTransformer: Hierarchical Alignment of Visual Regions and Disease Tags for Medical Report Generation
verfasst von
Di You
Fenglin Liu
Shen Ge
Xiaoxia Xie
Jing Zhang
Xian Wu
Copyright-Jahr
2021
DOI
https://doi.org/10.1007/978-3-030-87199-4_7