Multimedia Systems

01-04-2024 | Regular Paper

GHCL: Gaussian heuristic curriculum learning for Brain CT report generation

Authors: Qingya Shen, Yanzhao Shi, Xiaodan Zhang, Junzhong Ji, Ying Liu, Huimin Xu

Published in: Multimedia Systems | Issue 2/2024

Abstract

Brain computed tomography (CT) report generation, which aims to generate accurate and descriptive reports for Brain CT imaging, has gained growing attention from researchers. Existing works mainly train a language-generation model with complex image-text pairs for supervision, and they still struggle with two challenges: (1) a severe long-tail distribution of textual supervision signals, caused by an imbalanced distribution of text lengths, and (2) insufficient medical data, caused by the high cost of expert intervention. In this paper, we propose a novel Gaussian heuristic curriculum learning (GHCL) model to tackle the long-tail data distribution and make the best use of the limited training data. Specifically, our training process mimics the learning process of physicians in a step-wise paradigm. First, we score the training difficulty of each sample with two carefully designed Gaussian heuristic metrics. Then, during training of the language-generation model, we iteratively select the most suitable batch of training samples according to these difficulty scores. In this way, GHCL guides the progressive learning of the report generation model and improves the quality of the generated Brain CT reports. We compare the method with previous state-of-the-art models on the Brain CT report generation dataset BCT-CHR. Experimental results demonstrate that our method surpasses previous state-of-the-art approaches and that GHCL can be flexibly combined with existing approaches to further improve their performance.
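The two-step procedure described in the abstract (score each sample's difficulty, then select batches by those scores) can be illustrated with a minimal sketch. The abstract does not define the two Gaussian heuristic metrics or the selection rule, so everything here is an assumption made for illustration only: a single difficulty score derived from how far a report's length lies from the mean under a Gaussian fit, and a square-root competence schedule in the style of competence-based curriculum learning. The function names and the parameter c0 are hypothetical and are not taken from the paper.

```python
import math
import random
import statistics


def gaussian_difficulty(lengths):
    """Score each sample by how far its report length lies from the mean.

    Assumption (not from the paper): difficulty = 1 - exp(-z^2 / 2), where z
    is the z-score of the length, so typical lengths score near 0 and
    long-tail lengths score near 1.
    """
    mu = statistics.fmean(lengths)
    sigma = statistics.pstdev(lengths) or 1.0  # guard against zero spread
    return [1.0 - math.exp(-0.5 * ((n - mu) / sigma) ** 2) for n in lengths]


def competence(step, total_steps, c0=0.2):
    """Hypothetical square-root schedule rising from c0 to 1.0."""
    frac = min(1.0, step / total_steps)
    return min(1.0, math.sqrt(frac * (1.0 - c0 ** 2) + c0 ** 2))


def select_batch(difficulty, step, total_steps, batch_size, rng):
    """Sample a batch from the easiest fraction of data allowed at this step."""
    # Sample indices ordered from easiest to hardest.
    order = sorted(range(len(difficulty)), key=difficulty.__getitem__)
    n_allowed = max(batch_size,
                    math.ceil(competence(step, total_steps) * len(order)))
    pool = order[:n_allowed]
    return rng.sample(pool, min(batch_size, len(pool)))


if __name__ == "__main__":
    # Toy report lengths with a long tail (illustrative values only).
    report_lengths = [42, 45, 47, 50, 51, 53, 55, 58, 90, 140]
    scores = gaussian_difficulty(report_lengths)
    rng = random.Random(0)
    for step in (0, 50, 100):
        batch = select_batch(scores, step, 100, batch_size=3, rng=rng)
        print(step, sorted(report_lengths[i] for i in batch))
```

In this sketch, early batches are drawn only from reports of typical length, and the rare long-tail reports become eligible as the schedule approaches 1.0. The actual GHCL metrics and selection strategy may differ substantially.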

Title: GHCL: Gaussian heuristic curriculum learning for Brain CT report generation
Authors: Qingya Shen, Yanzhao Shi, Xiaodan Zhang, Junzhong Ji, Ying Liu, Huimin Xu
Publication date: 01-04-2024
Publisher: Springer Berlin Heidelberg
Published in: Multimedia Systems / Issue 2/2024
Print ISSN: 0942-4962
Electronic ISSN: 1432-1882
DOI: https://doi.org/10.1007/s00530-024-01266-3
