Published in: The Journal of Supercomputing 14/2023

26-04-2023

CovSumm: an unsupervised transformer-cum-graph-based hybrid document summarization model for CORD-19

Authors: Akanksha Karotia, Seba Susan

Abstract

The number of research articles published on COVID-19 has increased dramatically since the outbreak of the pandemic in November 2019. This rapid rate of publication leads to information overload, and it has become increasingly urgent for researchers and medical associations to stay up to date on the latest COVID-19 studies. To address information overload in the COVID-19 scientific literature, this study presents CovSumm, a novel unsupervised transformer-cum-graph-based hybrid model for single-document summarization, evaluated on the CORD-19 dataset. We tested the proposed methodology on the scientific papers in the database dated from January 1, 2021 to December 31, 2021, consisting of 840 documents in total. The proposed method is a hybrid of two distinctive extractive approaches: (1) GenCompareSum (transformer-based) and (2) TextRank (graph-based). The sum of the scores generated by the two methods is used to rank the sentences for generating the summary. On CORD-19, the recall-oriented understudy for gisting evaluation (ROUGE) metric is used to compare the performance of the CovSumm model with various state-of-the-art techniques. The proposed method achieved the highest scores of ROUGE-1: 40.14%, ROUGE-2: 13.25%, and ROUGE-L: 36.32%, and shows improved performance on the CORD-19 dataset compared to existing unsupervised text summarization methods.
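The score-summing step described above can be sketched in a few lines. This is an illustrative reconstruction, not the authors' code: the `salience` argument stands in for GenCompareSum's transformer-derived sentence scores (a hypothetical placeholder here), and the TextRank below is a bare-bones bag-of-words variant with cosine similarity and power iteration. Both score lists are min-max normalized before summing so neither method dominates.

```python
import math
from collections import Counter

def sentence_similarity(a, b):
    """Cosine similarity between bag-of-words vectors of two sentences."""
    ca, cb = Counter(a.lower().split()), Counter(b.lower().split())
    num = sum(ca[w] * cb[w] for w in set(ca) & set(cb))
    den = (math.sqrt(sum(v * v for v in ca.values()))
           * math.sqrt(sum(v * v for v in cb.values())))
    return num / den if den else 0.0

def textrank_scores(sentences, damping=0.85, iters=50):
    """Plain TextRank: power iteration over a sentence-similarity graph."""
    n = len(sentences)
    sim = [[sentence_similarity(s, t) if i != j else 0.0
            for j, t in enumerate(sentences)]
           for i, s in enumerate(sentences)]
    scores = [1.0 / n] * n
    for _ in range(iters):
        scores = [(1 - damping) / n
                  + damping * sum(sim[j][i] / (sum(sim[j]) or 1.0) * scores[j]
                                  for j in range(n) if j != i)
                  for i in range(n)]
    return scores

def minmax(xs):
    """Scale a list of scores into [0, 1]."""
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) if hi > lo else 0.0 for x in xs]

def hybrid_summary(sentences, salience, k=3):
    """Rank sentences by the sum of normalized graph and transformer
    scores; return the top-k in original document order."""
    combined = [g + t for g, t in zip(minmax(textrank_scores(sentences)),
                                      minmax(salience))]
    top = sorted(range(len(sentences)),
                 key=lambda i: combined[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]
```

In the full pipeline the two component scores would come from the actual GenCompareSum and TextRank implementations; the summation-and-rank step itself is what the sketch demonstrates.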


Literature
1. Cai X, Liu S, Yang L, Lu Y, Zhao J, Shen D, Liu T (2022) COVIDSum: a linguistically enriched SciBERT-based summarization model for COVID-19 scientific papers. J Biomed Inform 127:103999
2. Xie Q, Bishop JA, Tiwari P, Ananiadou S (2022) Pre-trained language models with domain knowledge for biomedical extractive summarization. Knowl-Based Syst 252:109460
3. Tang T, Yuan T, Tang X, Chen D (2020) Incorporating external knowledge into unsupervised graph model for document summarization. Electronics 9(9):1520
4. Zhao J, Liu M, Gao L, Jin Y, Du L, Zhao H, Haffari G (2020) SummPip: unsupervised multi-document summarization with sentence graph compression. In: Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, pp 1949–1952
5. Wallace BC, Saha S, Soboczenski F, Marshall IJ (2021) Generating (factual?) narrative summaries of RCTs: experiments with neural multi-document summarization. AMIA Summits Transl Sci Proc 2021:605
6. Huang D, Cui L, Yang S, Bao G, Wang K, Xie J, Zhang Y (2020) What have we achieved on text summarization? In: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP). Association for Computational Linguistics, pp 446–469
8. See A, Liu PJ, Manning CD (2017) Get to the point: summarization with pointer-generator networks. In: Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, Vancouver, Canada, pp 1073–1083
9. Raffel C, Shazeer N, Roberts A, Lee K, Narang S, Matena M, Liu PJ (2020) Exploring the limits of transfer learning with a unified text-to-text transformer. J Mach Learn Res 21(1):5485–5551
10. Cachola I, Lo K, Cohan A, Weld C (2020) TLDR: extreme summarization of scientific documents. In: Findings of the Association for Computational Linguistics: EMNLP 2020. Association for Computational Linguistics, pp 4766–4777
11. Liu Y, Lapata M (2019) Text summarization with pretrained encoders. In: Proceedings of the Conference on Empirical Methods in Natural Language Processing, Hong Kong, pp 3728–3738
12. Dou Z-Y, Liu P, Hayashi H, Jiang Z, Neubig G (2021) GSum: a general framework for guided neural abstractive summarization. In: Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Association for Computational Linguistics, pp 4830–4842. https://doi.org/10.18653/v1/2021.naacl-main.384
13. Ramos J (2003) Using TF-IDF to determine word relevance in document queries. Proc First Instr Conf Mach Learn 242(1):29–48
14. Kiros R, Zhu Y, Salakhutdinov R, Zemel RS, Torralba A, Urtasun R, Fidler S (2015) Skip-thought vectors. In: Proceedings of the 28th International Conference on Neural Information Processing Systems, vol 2, pp 3294–3302
15. Kenton JDMWC, Toutanova LK (2019) BERT: pre-training of deep bidirectional transformers for language understanding. In: Proceedings of NAACL-HLT, vol 1, p 2
16. Mutlu B, Sezer EA, Akcayol MA (2020) Candidate sentence selection for extractive text summarization. Inf Process Manag 57(6):102359
18. Edmundson HP, Wyllys RE (1961) Automatic abstracting and indexing—survey and recommendations. Commun ACM 4(5):226–234
19. Mihalcea R, Tarau P (2004) TextRank: bringing order into text. In: Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing, pp 404–411
20. Erkan G, Radev DR (2004) LexRank: graph-based lexical centrality as salience in text summarization. J Artif Intell Res 22:457–479
21. Bishop J, Xie Q, Ananiadou S (2022) GenCompareSum: a hybrid unsupervised summarization method using salience. In: Proceedings of the 21st Workshop on Biomedical Language Processing, pp 220–240
22. Ozsoy MG, Alpaslan FN, Cicekli I (2011) Text summarization using latent semantic analysis. J Inf Sci 37(4):405–417
23. Nenkova A, Vanderwende L (2005) The impact of frequency on summarization. Microsoft Research, Redmond, Washington, Tech Rep MSR-TR-2005-101
24. Radford A, Narasimhan K, Salimans T, Sutskever I (2018) Improving language understanding by generative pre-training. Technical Report, OpenAI
25. Wang LL, Lo K, Chandrasekhar Y, Reas R, Yang J, Burdick D, Eide D, Funk K, Katsis Y, Kinney RM, Li Y, Liu Z, Merrill W, Mooney P, Murdick DA, Rishi D, Sheehan J, Shen Z, Stilson B, et al (2020) CORD-19: the COVID-19 open research dataset. In: Proceedings of the 1st Workshop on NLP for COVID-19 at ACL 2020. Association for Computational Linguistics
26. Moradi M, Dorffner G, Samwald M (2020) Deep contextualized embeddings for quantifying the informative content in biomedical text summarization. Comput Methods Prog Biomed 184:105117
27. Padmakumar V, He H (2021) Unsupervised extractive summarization using pointwise mutual information. In: Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume. Association for Computational Linguistics, pp 2505–2512
28. Ju J, Liu M, Koh HY, Jin Y, Du L, Pan S (2021) Leveraging information bottleneck for scientific document summarization. In: Findings of the Association for Computational Linguistics: EMNLP 2021, Punta Cana, Dominican Republic. Association for Computational Linguistics, pp 4091–4098
29. Su D, Xu Y, Yu T, Siddique FB, Barezi E, Fung P (2020) CAiRE-COVID: a question answering and query-focused multi-document summarization system for COVID-19 scholarly information management. In: Proceedings of the 1st Workshop on NLP for COVID-19 (Part 2) at EMNLP 2020. Association for Computational Linguistics
30. Jang M, Kang P (2021) Learning-free unsupervised extractive summarization model. IEEE Access 9:14358–14368
31. Belwal RC, Rai S, Gupta A (2021) Text summarization using topic-based vector space model and semantic measure. Inf Process Manag 58(3):102536
32. Srivastava R, Singh P, Rana KPS, Kumar V (2022) A topic modeled unsupervised approach to single document extractive text summarization. Knowl-Based Syst 246:108636
33. Belwal RC, Rai S, Gupta A (2021) A new graph-based extractive text summarization using keywords or topic modeling. J Ambient Intell Humaniz Comput 12(10):8975–8990
34. El-Kassas WS, Salama CR, Rafea AA, Mohamed HK (2020) EdgeSumm: graph-based framework for automatic text summarization. Inf Process Manag 57(6):102264
35. Liu J, Hughes DJ, Yang Y (2021) Unsupervised extractive text summarization with distance-augmented sentence graphs. In: Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp 2313–2317
36. Joshi A, Fidalgo E, Alegre E, Alaiz-Rodriguez R (2022) RankSum—an unsupervised extractive text summarization based on rank fusion. Expert Syst Appl 200:116846
38. Xu S, Zhang X, Wu Y, Wei F, Zhou M (2020) Unsupervised extractive summarization by pre-training hierarchical transformers. In: Findings of the Association for Computational Linguistics: EMNLP 2020. Association for Computational Linguistics, pp 1784–1795
39. Lin CY (2004) ROUGE: a package for automatic evaluation of summaries. In: Text Summarization Branches Out, pp 74–81
40. Haghighi A, Vanderwende L (2009) Exploring content models for multi-document summarization. In: Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, pp 362–370
41. Ishikawa K (2001) A hybrid text summarization method based on the TF method and the lead method. In: Proceedings of the Second NTCIR Workshop Meeting on Evaluation of Chinese & Japanese Text Retrieval and Text Summarization, pp 325–330
Metadata
Title
CovSumm: an unsupervised transformer-cum-graph-based hybrid document summarization model for CORD-19
Authors
Akanksha Karotia
Seba Susan
Publication date
26-04-2023
Publisher
Springer US
Published in
The Journal of Supercomputing / Issue 14/2023
Print ISSN: 0920-8542
Electronic ISSN: 1573-0484
DOI
https://doi.org/10.1007/s11227-023-05291-3
