Top

Pattern Analysis and Applications

Published in:

29-09-2023 | Theoretical Advances

Applying unsupervised keyphrase methods on concepts extracted from discharge sheets

Authors: Hoda Memarzadeh, Nasser Ghadiri, Matthias Samwald, Maryam Lotfi Shahreza

Published in: Pattern Analysis and Applications | Issue 4/2023

Activate our intelligent search to find suitable subject content or patents.

search-config

AI-assisted search

Off

Abstract

Clinical notes contain valuable patient information. These notes are written by health care providers with various scientific levels and writing styles. It might be helpful for clinicians and researchers to understand what information is essential when dealing with extensive electronic medical records. Entities recognizing them and mapping them to standard terminologies is crucial to reducing ambiguity in processing clinical notes. Although named entity recognition and entity linking are critical steps in clinical natural language processing, they can produce repetitive and low-value concepts. On the other hand, all parts of a clinical text do not share the same importance or content in predicting the patient's condition. As a result, it is necessary to identify the section in which each content item is recorded and critical concepts to extract meaning from clinical texts. In this study, these challenges have been addressed by using clinical natural language processing techniques. In addition, a set of unsupervised essential phrase extraction methods has been verified and evaluated to identify key concepts. Considering that most clinical concepts are in the form of multi-word expressions and their accurate identification requires the user to specify an n-gram range, we have proposed a shortcut method to preserve the structure of the term based on TF-IDF (Term Frequency Inverse Document Frequency). To evaluate, we have designed two types of downstream tasks (multiple and binary classification) using the capabilities of transformer-based models. The results show the proposed method's superiority in combination with the SciBERT model. Also, they offer an insight into the efficacy of general methods for extracting essential phrases from clinical notes.

previous article A semantic-aware monocular projection model for accurate pose measurement

next article SAFPN: a full semantic feature pyramid network for object detection

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

über 102.000 Bücher
über 537 Zeitschriften

aus folgenden Fachgebieten:

Automobil + Motoren
Bauwesen + Immobilien
Business IT + Informatik
Elektrotechnik + Elektronik
Energie + Nachhaltigkeit
Finance + Banking
Management + Führung
Marketing + Vertrieb
Maschinenbau + Werkstoffe
Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

inform now

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

über 67.000 Bücher
über 390 Zeitschriften

aus folgenden Fachgebieten:

Automobil + Motoren
Bauwesen + Immobilien
Business IT + Informatik
Elektrotechnik + Elektronik
Energie + Nachhaltigkeit
Maschinenbau + Werkstoffe

Jetzt Wissensvorsprung sichern!

inform now

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

über 67.000 Bücher
über 340 Zeitschriften

aus folgenden Fachgebieten:

Bauwesen + Immobilien
Business IT + Informatik
Finance + Banking
Management + Führung
Marketing + Vertrieb
Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

inform now

International statistical classification of diseases and related health problems.

The normalized naming system for generic and branded drugs.

Logical Observation Identifiers Names and Codes.

https://github.com/HodaMemar/A3.

Dalianis H (2018) Clinical text mining: secondary use of electronic patient records. Springer, ChamCrossRef

Holzinger A, Haibe-Kains B, Jurisica I (2019) Why imaging data alone is not enough: AI-based integration of imaging, omics, and clinical data. Eur J Nucl Med Mol Imaging 46(13):2722–2730CrossRef

Yadav P, Steinbach M, Kumar V, Simon G (2018) Mining Electronic Health Records (EHRs) A Survey. ACM Comput Surv 50(6):1–40CrossRef

Ford E, Carroll JA, Smith HE, Scott D, Cassell JA (2016) Extracting information from the text of electronic medical records to improve case detection: a systematic review. J Am Med Informatics Assoc 23(5):1007–1015. https://doi.org/10.1093/jamia/ocv180CrossRef

Sheikhalishahi S, Miotto R, Dudley JT, Lavelli A, Rinaldi F, Osmani V (2019) Natural language processing of clinical notes on chronic diseases: systematic review. JMIR Med informatics 7(2):e12239CrossRef

Liu Z, Lin Y, Sun M (2020) “Document representation bt - representation learning for natural language processing. Springer, Singapore, pp 91–123

Sammut C, Webb GI (2010) TF–IDF BT-encyclopedia of machine learning. Springer, Boston, pp 986–987CrossRef

Darabi S, Kachuee M, Fazeli S, Sarrafzadeh M (2020) TAPER: Time-aware patient EHR representation. IEEE J Biomed Heal Informatics 24(11):3268–3275. https://doi.org/10.1109/JBHI.2020.2984931CrossRef

Sushil M, Šuster S, Luyckx K, Daelemans W (2018) Patient representation learning and interpretable evaluation using clinical notes. J Biomed Inform 84:103–113. https://doi.org/10.1016/j.jbi.2018.06.016CrossRef

10.

Chen T, Guestrin C (2016) Xgboost: a scalable tree boosting system. In Proceedings of the 22nd acm sigkdd international conference on knowledge discovery and data mining, (pp. 785–794) https://doi.org/10.1145/2939672.2939785

11.

Eyre H et al (2022) Launching into clinical space with medspaCy: a new clinical text processing toolkit in Python. AMIA Annu Symp Proc 2021:438–447

12.

Chapman WW, Bridewell W, Hanbury P, Cooper GF, Buchanan BG (2001) A Simple algorithm for identifying negated findings and diseases in discharge summaries. J Biomed Inform 34(5):301–310. https://doi.org/10.1006/jbin.2001.1029CrossRef

13.

Neumann M, King D, Beltagy I, Ammar W (2019) ScispaCy: Fast and robust models for biomedical natural language processing. BioNLP 2019 - SIGBioMed Work. Biomed. Nat. Lang. Process. Proc. 18th BioNLP Work. Shar. Task, (pp. 319–327). https://doi.org/10.18653/v1/w19-5034.

14.

“sklearn.feature_extraction.text.TfidfVectorizer.” https://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.text.TfidfVectorizer.html

15.

Boudin F (2016) PKE: an open source python-based keyphrase extraction toolkit. In: Proceedings of COLING 2016, the 26th international conference on computational linguistics: system demonstrations, (pp. 69–73) [Online]. Available: https://github.com/boudinfl/pke

16.

Mahata D, Kuriakose J, Shah R, Zimmermann R (2018) Key2vec: Automatic ranked keyphrase extraction from scientific articles using phrase embeddings. In Proceedings of the 2018 conference of the North American chapter of the association for computational linguistics: Human Language Technologies, Volume 2 (Short Papers), 2018, pp. 634–639

17.

Maarten Grootendorst, “Keyword Extraction with BERT,” Towar. Data Sci., 2020, [Online]. Available: https://towardsdatascience.com/keyword-extraction-with-bert-724efca412ea

18.

Devlin J, Chang M-W, Lee K, Toutanova K (2018) Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv Prepr. arXiv1810.04805

19.

Gu Y et al (2021) Domain-specific language model pretraining for biomedical natural language processing. ACM Trans Comput Healthc 3(1):1–23CrossRef

20.

Beltagy I, Lo K, Cohan A (2019) SCIBERT: A pretrained language model for scientific text. EMNLP-IJCNLP 2019 - 2019 Conf. Empir. Methods Nat. Lang. Process. 9th Int. Jt. Conf. Nat. Lang. Process. Proc. Conf., pp. 3615–3620. https://doi.org/10.18653/v1/d19-1371

21.

Yogarajan V, Montiel J, Smith T, Pfahringer B (2021) Transformers for multi-label classification of medical text: an empirical comparison. In International Conference on Artificial Intelligence in Medicine, pp. 114–123

22.

Yogarajan V (2022) Domain-specific language models for multi-label classification of medical text. The University of Waikato, New Zealand

23.

Duque A, Fabregat H, Araujo L, Martinez-Romo J (2021) A keyphrase-based approach for interpretable ICD-10 code classification of Spanish medical reports. Artif. Intell. Med. 121:102177. https://doi.org/10.1016/j.artmed.2021.102177CrossRef

24.

Schopf T, Klimek S, Matthes F (2022) PatternRank: leveraging pretrained language models and part of speech for unsupervised keyphrase extraction. arXiv Prepr. arXiv2210.05245, 2022

25.

Peng Y, Yan S, Lu Z (2019) Transfer learning in biomedical natural language processing: an evaluation of BERT and ELMo on ten benchmarking datasets. BioNLP 2019 - SIGBioMed Work. Biomed. Nat. Lang. Process. Proc. 18th BioNLP Work. Shar. Task, pp. 58–65. https://doi.org/10.18653/v1/w19-5006.

26.

Michalopoulos G, Wang Y, Kaka H, Chen H, Wong A (2021) UmlsBERT: clinical domain knowledge augmentation of contextual embeddings using the unified medical language system metathesaurus. arXiv Prepr. arXiv2010.10391

Title: Applying unsupervised keyphrase methods on concepts extracted from discharge sheets
Authors: Hoda Memarzadeh
Nasser Ghadiri
Matthias Samwald
Maryam Lotfi Shahreza
Publication date: 29-09-2023
Publisher: Springer London
Published in: Pattern Analysis and Applications / Issue 4/2023
Print ISSN: 1433-7541
Electronic ISSN: 1433-755X
DOI: https://doi.org/10.1007/s10044-023-01198-0

Springer Professional

Abstract

Please log in to get access to your license.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Springer Professional "Technik"

Springer Professional "Wirtschaft"

Other articles of this Issue 4/2023

Adaptive frequency-based fully hyperbolic graph neural networks

A semantic-aware monocular projection model for accurate pose measurement

AdaHOSVD: an adaptive higher-order singular value decomposition method for point cloud denoising

Discriminative estimation of probabilistic context-free grammars for mathematical expression recognition and retrieval

DWT-CompCNN: deep image classification network for high throughput JPEG 2000 compressed documents

Unsupervised multimodal learning for image-text relation classification in tweets

Premium Partner