Published in: The Journal of Supercomputing 1/2023

4 July 2022

Frequent item-set mining and clustering based ranked biomedical text summarization

Written by: Supriya Gupta, Aakanksha Sharaff, Naresh Kumar Nagwani

Abstract

The difficulty of deriving value from the vast available scientific literature in condensed form led us to look for a proficient theme-based summarization solution that preserves precise biomedical content. The study aims to analyze the impact of combining semantic biomedical concept extraction, frequent item-set mining and clustering techniques on information retention, objective functions and ROUGE values of the final summary. The proposed frequent item-set mining and clustering (FRI-CL) graph-based framework uses the UMLS Metathesaurus and BERT-based semantic embeddings to identify domain-relevant concepts. The selected concepts are mined according to their frequency and their relationship with neighbors via a modified FP-Growth model. The framework uses S-DPMM clustering, a probabilistic mixture model that helps identify and group complex relevant patterns to increase coverage of important sub-themes. The sentences containing the frequent concepts are scored via PageRank to form an efficient and compelling summary. The experiments, run on 100 sample biomedical documents taken from PubMed archives, are evaluated by computing ROUGE scores, coverage, readability, non-redundancy, memory utilization and information retention of the summary output. The FRI-CL summarization system showed a 10% improvement in ROUGE performance and is on par with the other baseline methods. On average, a 30–40% improvement in memory utilization is observed, with up to 50% information retention, when experiments are performed using S-DPMM clustering. The research indicates that the fusion of semantic mapping, clustering and frequent item-set mining of biomedical concepts enhances the overall correlated information, covering all sub-themes.
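
The core idea of mining frequent concept sets and then ranking sentences with PageRank can be illustrated with a minimal sketch. It is not the authors' FRI-CL implementation: the concept identifiers (`C1`, `C2`, ...) are hypothetical placeholders for UMLS concepts, a brute-force enumeration stands in for the modified FP-Growth model, the edge weight (number of shared frequent item-sets) is an assumption, and the BERT embeddings and S-DPMM clustering steps are omitted entirely.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support, max_size=2):
    """Brute-force frequent item-set mining (a simple stand-in for FP-Growth).

    Returns a dict mapping each frequent item-set to its relative support.
    """
    n = len(transactions)
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, max_size + 1):
        for candidate in combinations(items, size):
            cand = frozenset(candidate)
            support = sum(1 for t in transactions if cand <= t) / n
            if support >= min_support:
                result[cand] = support
    return result


def pagerank(weights, damping=0.85, iterations=50):
    """Power-iteration PageRank on a weighted adjacency matrix (list of lists)."""
    n = len(weights)
    scores = [1.0 / n] * n
    out_sum = [sum(row) for row in weights]
    for _ in range(iterations):
        new_scores = []
        for i in range(n):
            # Score flowing into node i from every node j that has outgoing weight.
            incoming = sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n)
                if out_sum[j] > 0
            )
            new_scores.append((1 - damping) / n + damping * incoming)
        scores = new_scores
    return scores


def summarize(sentence_concepts, min_support=0.5, top_k=2):
    """Return indices of the top_k sentences, in document order."""
    itemsets = frequent_itemsets(sentence_concepts, min_support)
    # Frequent item-sets fully contained in each sentence.
    covered = [{s for s in itemsets if s <= c} for c in sentence_concepts]
    n = len(sentence_concepts)
    weights = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                # Assumed edge weight: number of shared frequent item-sets.
                weights[i][j] = float(len(covered[i] & covered[j]))
    scores = pagerank(weights)
    ranked = sorted(range(n), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:top_k])


# Each sentence is represented by its set of (hypothetical) concept IDs.
sentences = [
    {"C1", "C2", "C3"},
    {"C1", "C2"},
    {"C2", "C3"},
    {"C4"},
]
selected = summarize(sentences, min_support=0.5, top_k=2)
```

In this toy input the first sentence shares the most frequent item-sets with the others, so it is selected, while the sentence containing only the infrequent concept `C4` is left out of the summary.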

Metadata
Title
Frequent item-set mining and clustering based ranked biomedical text summarization
Written by
Supriya Gupta
Aakanksha Sharaff
Naresh Kumar Nagwani
Publication date
4 July 2022
Publisher
Springer US
Published in
The Journal of Supercomputing / Issue 1/2023
Print ISSN: 0920-8542
Electronic ISSN: 1573-0484
DOI
https://doi.org/10.1007/s11227-022-04578-1
