Published in: The Journal of Supercomputing 14/2023

26-04-2023

CovSumm: an unsupervised transformer-cum-graph-based hybrid document summarization model for CORD-19

Authors: Akanksha Karotia, Seba Susan

Abstract

The number of research articles published on COVID-19 has increased dramatically since the outbreak of the pandemic in November 2019. This rapid rate of publication leads to information overload, and it has become increasingly urgent for researchers and medical associations to stay up to date on the latest COVID-19 studies. To address information overload in the COVID-19 scientific literature, this study presents CovSumm, a novel unsupervised transformer-cum-graph-based hybrid model for single-document summarization, evaluated on the CORD-19 dataset. We tested the proposed methodology on the scientific papers in the database dated from January 1, 2021 to December 31, 2021, consisting of 840 documents in total. The proposed method is a hybrid of two distinctive extractive approaches: (1) GenCompareSum (transformer-based) and (2) TextRank (graph-based). The sum of the scores generated by the two methods is used to rank the sentences for generating the summary. On CORD-19, the recall-oriented understudy for gisting evaluation (ROUGE) metric is used to compare the performance of the CovSumm model with various state-of-the-art techniques. The proposed method achieved the highest scores of ROUGE-1: 40.14%, ROUGE-2: 13.25%, and ROUGE-L: 36.32%, and shows improved performance on the CORD-19 dataset compared to existing unsupervised text summarization methods.
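The score-summing step described above can be sketched in a few lines. This is an illustrative reconstruction, not the authors' code: the `salience` argument stands in for GenCompareSum's transformer-derived sentence scores (a hypothetical placeholder here), and the TextRank below is a bare-bones bag-of-words variant with cosine similarity and power iteration. Both score lists are min-max normalized before summing so neither method dominates.

```python
import math
from collections import Counter

def sentence_similarity(a, b):
    """Cosine similarity between bag-of-words vectors of two sentences."""
    ca, cb = Counter(a.lower().split()), Counter(b.lower().split())
    num = sum(ca[w] * cb[w] for w in set(ca) & set(cb))
    den = (math.sqrt(sum(v * v for v in ca.values()))
           * math.sqrt(sum(v * v for v in cb.values())))
    return num / den if den else 0.0

def textrank_scores(sentences, damping=0.85, iters=50):
    """Plain TextRank: power iteration over a sentence-similarity graph."""
    n = len(sentences)
    sim = [[sentence_similarity(s, t) if i != j else 0.0
            for j, t in enumerate(sentences)]
           for i, s in enumerate(sentences)]
    scores = [1.0 / n] * n
    for _ in range(iters):
        scores = [(1 - damping) / n
                  + damping * sum(sim[j][i] / (sum(sim[j]) or 1.0) * scores[j]
                                  for j in range(n) if j != i)
                  for i in range(n)]
    return scores

def minmax(xs):
    """Scale a list of scores into [0, 1]."""
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) if hi > lo else 0.0 for x in xs]

def hybrid_summary(sentences, salience, k=3):
    """Rank sentences by the sum of normalized graph and transformer
    scores; return the top-k in original document order."""
    combined = [g + t for g, t in zip(minmax(textrank_scores(sentences)),
                                      minmax(salience))]
    top = sorted(range(len(sentences)),
                 key=lambda i: combined[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]
```

In the full pipeline the two component scores would come from the actual GenCompareSum and TextRank implementations; the summation-and-rank step itself is what the sketch demonstrates.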


Literature
1. Cai X, Liu S, Yang L, Lu Y, Zhao J, Shen D, Liu T (2022) COVIDSum: a linguistically enriched SciBERT-based summarization model for COVID-19 scientific papers. J Biomed Inform 127:103999
2. Xie Q, Bishop JA, Tiwari P, Ananiadou S (2022) Pre-trained language models with domain knowledge for biomedical extractive summarization. Knowl-Based Syst 252:109460
3. Tang T, Yuan T, Tang X, Chen D (2020) Incorporating external knowledge into unsupervised graph model for document summarization. Electronics 9(9):1520
4. Zhao J, Liu M, Gao L, Jin Y, Du L, Zhao H, Haffari G (2020) SummPip: unsupervised multi-document summarization with sentence graph compression. In: Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, pp 1949–1952
5. Wallace BC, Saha S, Soboczenski F, Marshall IJ (2021) Generating (factual?) narrative summaries of RCTs: experiments with neural multi-document summarization. AMIA Summits Transl Sci Proc 2021:605
6. Huang D, Cui L, Yang S, Bao G, Wang K, Xie J, Zhang Y (2020) What have we achieved on text summarization? In: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP). Association for Computational Linguistics, pp 446–469
8. See A, Liu PJ, Manning CD (2017) Get to the point: summarization with pointer-generator networks. In: Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, Vancouver, Canada, pp 1073–1083
9. Raffel C, Shazeer N, Roberts A, Lee K, Narang S, Matena M, Liu PJ (2020) Exploring the limits of transfer learning with a unified text-to-text transformer. J Mach Learn Res 21(1):5485–5551
10. Cachola I, Lo K, Cohan A, Weld C (2020) TLDR: extreme summarization of scientific documents. In: Findings of the Association for Computational Linguistics: EMNLP 2020. Association for Computational Linguistics, pp 4766–4777
11. Liu Y, Lapata M (2019) Text summarization with pretrained encoders. In: Proceedings of the Conference on Empirical Methods in Natural Language Processing, Hong Kong, pp 3728–3738
12. Dou Z-Y, Liu P, Hayashi H, Jiang Z, Neubig G (2021) GSum: a general framework for guided neural abstractive summarization. In: Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Association for Computational Linguistics, pp 4830–4842. https://doi.org/10.18653/v1/2021.naacl-main.384
13. Ramos J (2003) Using TF-IDF to determine word relevance in document queries. Proc First Instr Conf Mach Learn 242(1):29–48
14. Kiros R, Zhu Y, Salakhutdinov R, Zemel RS, Torralba A, Urtasun R, Fidler S (2015) Skip-thought vectors. In: Proceedings of the 28th International Conference on Neural Information Processing Systems, vol 2, pp 3294–3302
15. Kenton JDMWC, Toutanova LK (2019) BERT: pre-training of deep bidirectional transformers for language understanding. In: Proceedings of NAACL-HLT, vol 1, p 2
16. Mutlu B, Sezer EA, Akcayol MA (2020) Candidate sentence selection for extractive text summarization. Inf Process Manag 57(6):102359
18. Edmundson HP, Wyllys RE (1961) Automatic abstracting and indexing—survey and recommendations. Commun ACM 4(5):226–234
19. Mihalcea R, Tarau P (2004) TextRank: bringing order into text. In: Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing, pp 404–411
20. Erkan G, Radev DR (2004) LexRank: graph-based lexical centrality as salience in text summarization. J Artif Intell Res 22:457–479
21. Bishop J, Xie Q, Ananiadou S (2022) GenCompareSum: a hybrid unsupervised summarization method using salience. In: Proceedings of the 21st Workshop on Biomedical Language Processing, pp 220–240
22. Ozsoy MG, Alpaslan FN, Cicekli I (2011) Text summarization using latent semantic analysis. J Inf Sci 37(4):405–417
23. Nenkova A, Vanderwende L (2005) The impact of frequency on summarization. Microsoft Research, Redmond, Washington, Tech Rep MSR-TR-2005-101
24. Radford A, Narasimhan K, Salimans T, Sutskever I (2018) Improving language understanding by generative pre-training. Technical Report, OpenAI
25. Wang LL, Lo K, Chandrasekhar Y, Reas R, Yang J, Burdick D, Eide D, Funk K, Katsis Y, Kinney RM, Li Y, Liu Z, Merrill W, Mooney P, Murdick DA, Rishi D, Sheehan J, Shen Z, Stilson B, et al (2020) CORD-19: the COVID-19 open research dataset. In: Proceedings of the 1st Workshop on NLP for COVID-19 at ACL 2020. Association for Computational Linguistics
26. Moradi M, Dorffner G, Samwald M (2020) Deep contextualized embeddings for quantifying the informative content in biomedical text summarization. Comput Methods Prog Biomed 184:105117
27. Padmakumar V, He H (2021) Unsupervised extractive summarization using pointwise mutual information. In: Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume. Association for Computational Linguistics, pp 2505–2512
28. Ju J, Liu M, Koh HY, Jin Y, Du L, Pan S (2021) Leveraging information bottleneck for scientific document summarization. In: Findings of the Association for Computational Linguistics: EMNLP 2021, Punta Cana, Dominican Republic. Association for Computational Linguistics, pp 4091–4098
29. Su D, Xu Y, Yu T, Siddique FB, Barezi E, Fung P (2020) CAiRE-COVID: a question answering and query-focused multi-document summarization system for COVID-19 scholarly information management. In: Proceedings of the 1st Workshop on NLP for COVID-19 (Part 2) at EMNLP 2020. Association for Computational Linguistics
30. Jang M, Kang P (2021) Learning-free unsupervised extractive summarization model. IEEE Access 9:14358–14368
31. Belwal RC, Rai S, Gupta A (2021) Text summarization using topic-based vector space model and semantic measure. Inf Process Manag 58(3):102536
32. Srivastava R, Singh P, Rana KPS, Kumar V (2022) A topic modeled unsupervised approach to single document extractive text summarization. Knowl-Based Syst 246:108636
33. Belwal RC, Rai S, Gupta A (2021) A new graph-based extractive text summarization using keywords or topic modeling. J Ambient Intell Humaniz Comput 12(10):8975–8990
34. El-Kassas WS, Salama CR, Rafea AA, Mohamed HK (2020) EdgeSumm: graph-based framework for automatic text summarization. Inf Process Manag 57(6):102264
35. Liu J, Hughes DJ, Yang Y (2021) Unsupervised extractive text summarization with distance-augmented sentence graphs. In: Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp 2313–2317
36. Joshi A, Fidalgo E, Alegre E, Alaiz-Rodriguez R (2022) RankSum—an unsupervised extractive text summarization based on rank fusion. Expert Syst Appl 200:116846
38. Xu S, Zhang X, Wu Y, Wei F, Zhou M (2020) Unsupervised extractive summarization by pre-training hierarchical transformers. In: Findings of the Association for Computational Linguistics: EMNLP 2020. Association for Computational Linguistics, pp 1784–1795
39. Lin CY (2004) ROUGE: a package for automatic evaluation of summaries. In: Text Summarization Branches Out, pp 74–81
40. Haghighi A, Vanderwende L (2009) Exploring content models for multi-document summarization. In: Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, pp 362–370
41. Ishikawa K (2001) A hybrid text summarization method based on the TF method and the lead method. In: Proceedings of the Second NTCIR Workshop Meeting on Evaluation of Chinese & Japanese Text Retrieval and Text Summarization, pp 325–330
Metadata
Title
CovSumm: an unsupervised transformer-cum-graph-based hybrid document summarization model for CORD-19
Authors
Akanksha Karotia
Seba Susan
Publication date
26-04-2023
Publisher
Springer US
Published in
The Journal of Supercomputing / Issue 14/2023
Print ISSN: 0920-8542
Electronic ISSN: 1573-0484
DOI
https://doi.org/10.1007/s11227-023-05291-3
