Published in: Pattern Analysis and Applications 4/2023

29-09-2023 | Theoretical Advances

Applying unsupervised keyphrase methods on concepts extracted from discharge sheets

Authors: Hoda Memarzadeh, Nasser Ghadiri, Matthias Samwald, Maryam Lotfi Shahreza

Abstract

Clinical notes contain valuable patient information. They are written by health care providers with differing levels of scientific training and different writing styles. When dealing with extensive electronic medical records, it may help clinicians and researchers to know which information is essential. Recognizing entities and mapping them to standard terminologies is crucial for reducing ambiguity in processing clinical notes. Although named entity recognition and entity linking are critical steps in clinical natural language processing, they can produce repetitive and low-value concepts. Moreover, not all parts of a clinical text are equally important or informative for predicting the patient's condition. It is therefore necessary to identify both the section in which each content item is recorded and the critical concepts in order to extract meaning from clinical texts. In this study, these challenges are addressed using clinical natural language processing techniques. In addition, a set of unsupervised keyphrase extraction methods has been verified and evaluated for identifying key concepts. Because most clinical concepts are multi-word expressions whose accurate identification requires the user to specify an n-gram range, we propose a shortcut method based on TF-IDF (Term Frequency-Inverse Document Frequency) that preserves the structure of each term. For evaluation, we designed two types of downstream tasks (multiple and binary classification) using transformer-based models. The results show the superiority of the proposed method in combination with the SciBERT model. They also offer insight into the efficacy of general-purpose keyphrase extraction methods on clinical notes.
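The abstract does not spell out the TF-IDF shortcut in detail. As a rough illustration of the general idea only, and not the authors' implementation, the sketch below assumes concepts have already been extracted and linked (for example by NER plus entity linking) and treats each multi-word concept as a single atomic term. TF-IDF then scores whole concepts directly, so no n-gram range has to be chosen. The function name `tfidf_keyconcepts`, the unsmoothed weighting tf-idf(c, d) = (f_{c,d} / |d|) · log(N / df_c), and the toy notes are all assumptions made for illustration.

```python
import math
from collections import Counter


def tfidf_keyconcepts(docs, top_k=3):
    """Rank concepts in each document by TF-IDF (illustrative sketch).

    docs: list of documents, each a list of concept strings. Each
    multi-word concept (e.g. "chest pain") is kept as ONE term, so the
    term structure is preserved without an n-gram range.
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each concept.
    df = Counter()
    for concepts in docs:
        df.update(set(concepts))

    ranked = []
    for concepts in docs:
        tf = Counter(concepts)
        total = len(concepts)
        # Relative term frequency times unsmoothed IDF log(N / df).
        scores = {
            c: (count / total) * math.log(n_docs / df[c])
            for c, count in tf.items()
        }
        # Highest score first; ties broken alphabetically for determinism.
        top = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]
        ranked.append(top)
    return ranked


# Hypothetical toy input: concepts already extracted from three notes.
notes = [
    ["chest pain", "myocardial infarction", "chest pain", "aspirin"],
    ["shortness of breath", "congestive heart failure", "aspirin"],
    ["chest pain", "pneumonia", "pneumonia"],
]
for doc_id, top in enumerate(tfidf_keyconcepts(notes, top_k=2)):
    print(doc_id, [c for c, _ in top])
```

With this unsmoothed IDF, a concept that appears in every document scores zero. A real pipeline might instead use a smoothed variant, such as the one in scikit-learn's `TfidfVectorizer`, where a custom analyzer can return pre-extracted concepts as tokens.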

Metadata
Title
Applying unsupervised keyphrase methods on concepts extracted from discharge sheets
Authors
Hoda Memarzadeh
Nasser Ghadiri
Matthias Samwald
Maryam Lotfi Shahreza
Publication date
29-09-2023
Publisher
Springer London
Published in
Pattern Analysis and Applications / Issue 4/2023
Print ISSN: 1433-7541
Electronic ISSN: 1433-755X
DOI
https://doi.org/10.1007/s10044-023-01198-0
