Hybrid medical named entity recognition using document structure and surrounding context

Authors: Mohamed Yassine Landolsi, Lotfi Ben Romdhane, Lobna Hlaoua

Published in: The Journal of Supercomputing, Issue 4/2024 (published 22.09.2023)


Abstract

Nowadays, medical specialists produce a huge amount of electronic medical documents in natural language, and these documents contain information needed for many medical tasks. However, reading them to find specific information is a tedious task. Automatic information extraction has therefore become an essential and challenging task, especially Named Entity Recognition (NER). NER is crucial for extracting valuable information used in medical tasks such as clinical decision support and drug safety surveillance. Capturing sufficient context is important for efficient NER, yet some important contextual information is not well exploited in the literature. Usually, a standard sequence segmentation, such as sentence segmentation, is used, which may not cover sufficient context. In this paper, we propose a supervised NER method, called MedSINE (Medical Section Identification to enhance the Named Entity tagging), which treats NER as a sequence tagging task using a Bidirectional Long Short-Term Memory neural network with a Conditional Random Field (BiLSTM-CRF). To this end, we exploit layout information to segment the text into chunk sequences and to extract the parent sections of each word as features that provide sufficient context. In addition, we use clinical Bidirectional Encoder Representations from Transformers (BERT) word embeddings, Part of Speech (PoS) tags, and entity surrounding sequence features. Experiments were conducted on a manually annotated dataset of real Summary of Product Characteristics (SmPC) medical documents in PDF format and on the Colorado Richly Annotated Full Text (CRAFT) corpus. Our model achieved an F1-measure of \(89.49\%\) on the SmPC dataset and \(73.52\%\) on the CRAFT dataset under strict matching evaluation. The results show that using the sequence of parent sections improves the F1-measure by \(4.71\%\) under strict matching evaluation.
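The reported scores use strict matching, which is commonly understood to mean that a predicted entity counts as correct only if both its boundaries and its type exactly match a gold entity. The sketch below illustrates that metric under stated assumptions: it assumes BIO tags (the abstract does not name the tagging scheme), and the entity types "Drug" and "Dose" are hypothetical examples, not the paper's label set. It is not the authors' evaluation code.

```python
from typing import List, Set, Tuple

Span = Tuple[int, int, str]  # (start, end_exclusive, entity_type)


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Convert a BIO tag sequence (assumed scheme) into (start, end, type) spans.

    An I- tag that does not continue an open span of the same type starts a
    new span (a lenient convention; other tools may discard such tags).
    """
    spans: Set[Span] = set()
    start, ent_type = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" closes any open span
        prefix, _, label = tag.partition("-")
        continues = prefix == "I" and ent_type == label
        if start is not None and not continues:
            spans.add((start, i, ent_type))  # close the currently open span
            start, ent_type = None, None
        if prefix == "B" or (prefix == "I" and not continues):
            start, ent_type = i, label  # open a new span
    return spans


def strict_prf(gold: List[List[str]], pred: List[List[str]]) -> Tuple[float, float, float]:
    """Entity-level precision, recall and F1 with strict (exact) matching."""
    assert len(gold) == len(pred), "gold and predicted sequences must align"
    tp = n_gold = n_pred = 0
    for g_tags, p_tags in zip(gold, pred):
        g, p = bio_to_spans(g_tags), bio_to_spans(p_tags)
        tp += len(g & p)  # exact boundaries AND type must match
        n_gold += len(g)
        n_pred += len(p)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    gold = [["B-Drug", "I-Drug", "O", "B-Dose"]]
    pred = [["B-Drug", "O", "O", "B-Dose"]]  # truncated "Drug" span is a miss
    print(strict_prf(gold, pred))  # (0.5, 0.5, 0.5)
```

In the usage example, the predicted "Drug" entity covers only its first token, so strict matching rejects it even though the type is right; a relaxed (overlap-based) metric would accept it, which is why strict scores are usually the lower of the two.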


Metadata
Publisher: Springer US
Print ISSN: 0920-8542
Electronic ISSN: 1573-0484
DOI: https://doi.org/10.1007/s11227-023-05647-9
