Published in: The Journal of Supercomputing 3/2020

16.01.2018

WCP-RNN: a novel RNN-based approach for Bio-NER in Chinese EMRs

Paper ID: FC_17_25

Authors: Jianqiang Li, Shenhe Zhao, Jijiang Yang, Zhisheng Huang, Bo Liu, Shi Chen, Hui Pan, Qing Wang


Abstract

Deep learning has achieved remarkable success in a wide range of domains. However, it has not been comprehensively evaluated as a solution for Chinese biomedical named entity recognition (Bio-NER). The traditional deep-learning approach to Bio-NER is usually based on recurrent neural networks (RNNs) and takes only word embeddings into consideration, ignoring the value of character-level embeddings for encoding morphological and shape information. We propose an RNN-based approach, WCP-RNN, for the Chinese Bio-NER problem. Our method combines word embeddings and character embeddings to capture orthographic and lexicosemantic features. In addition, part-of-speech (POS) tags are incorporated as a priori word information to improve the final performance. The experimental results show that the proposed approach outperforms the baseline methods: the highest F-scores for the subject and lesion detection tasks reach 90.36% and 90.48%, increases of 3.10% and 2.60%, respectively, over the baseline methods.
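The feature combination described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the dimensions, the tag and label sets, the averaging of character embeddings, the one-hot POS encoding, the plain Elman-style recurrent cell, the random untrained weights, and the example sentence are all assumptions chosen only to show how word, character, and POS information might be concatenated per token before being passed to an RNN tagger.

```python
import math
import random

random.seed(0)

# All sizes and tag sets below are hypothetical, chosen for illustration only.
WORD_DIM, CHAR_DIM, HIDDEN_DIM = 4, 3, 5
POS_TAGS = ["n", "v", "a"]   # assumed POS tag inventory
LABELS = ["O", "B", "I"]     # assumed entity label scheme


def rand_vec(n):
    return [random.uniform(-0.1, 0.1) for _ in range(n)]


def rand_mat(rows, cols):
    return [rand_vec(cols) for _ in range(rows)]


class Embedding:
    """Lookup table that assigns a random vector to each unseen key."""

    def __init__(self, dim):
        self.dim = dim
        self.table = {}

    def __call__(self, key):
        if key not in self.table:
            self.table[key] = rand_vec(self.dim)
        return self.table[key]


def token_features(word, pos, word_emb, char_emb):
    """Concatenate word embedding, mean character embedding, and one-hot POS."""
    w = word_emb(word)
    chars = [char_emb(c) for c in word]
    c = [sum(col) / len(chars) for col in zip(*chars)]  # assumed pooling: mean
    p = [1.0 if pos == tag else 0.0 for tag in POS_TAGS]
    return w + c + p


def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def rnn_tag(features, w_in, w_rec, w_out):
    """Run a simple recurrent cell over the sequence and pick one label per token."""
    h = [0.0] * HIDDEN_DIM
    labels = []
    for x in features:
        a, r = matvec(w_in, x), matvec(w_rec, h)
        h = [math.tanh(ai + ri) for ai, ri in zip(a, r)]
        scores = matvec(w_out, h)
        labels.append(LABELS[scores.index(max(scores))])
    return labels


word_emb, char_emb = Embedding(WORD_DIM), Embedding(CHAR_DIM)
input_dim = WORD_DIM + CHAR_DIM + len(POS_TAGS)
w_in = rand_mat(HIDDEN_DIM, input_dim)
w_rec = rand_mat(HIDDEN_DIM, HIDDEN_DIM)
w_out = rand_mat(len(LABELS), HIDDEN_DIM)

# Hypothetical pre-segmented, POS-tagged sentence (not taken from the paper's data).
sentence = [("左肺", "n"), ("可见", "v"), ("结节", "n")]
features = [token_features(w, p, word_emb, char_emb) for w, p in sentence]
labels = rnn_tag(features, w_in, w_rec, w_out)
print(len(features[0]), labels)
```

Because the weights are random and untrained, the predicted labels carry no meaning; the point is only that each token is represented by a single vector of length `WORD_DIM + CHAR_DIM + len(POS_TAGS)` before the recurrent layer sees it.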


Metadata
Title
WCP-RNN: a novel RNN-based approach for Bio-NER in Chinese EMRs
Paper ID: FC_17_25
Authors
Jianqiang Li
Shenhe Zhao
Jijiang Yang
Zhisheng Huang
Bo Liu
Shi Chen
Hui Pan
Qing Wang
Publication date
16.01.2018
Publisher
Springer US
Published in
The Journal of Supercomputing / Issue 3/2020
Print ISSN: 0920-8542
Electronic ISSN: 1573-0484
DOI
https://doi.org/10.1007/s11227-017-2229-x
