
2020 | OriginalPaper | Chapter

Incorporating Lexicon for Named Entity Recognition of Traditional Chinese Medicine Books

Authors: Bingyan Song, Zhenshan Bao, YueZhang Wang, Wenbo Zhang, Chao Sun

Published in: Natural Language Processing and Chinese Computing

Publisher: Springer International Publishing


Abstract

Little research has been done on Named Entity Recognition (NER) for Traditional Chinese Medicine (TCM) books, and most existing work uses statistical models such as Conditional Random Fields (CRFs). However, these methods do not fully exploit lexicon information or large-scale unlabeled corpora. To improve NER performance on TCM books, we propose a method based on the BiLSTM-CRF model that incorporates lexicon information into the representation layer to enrich its semantic information. We compare our approach with several previous character-based and word-based methods. Experiments on the “Shanghan Lun” dataset show that our method outperforms previous models. In addition, because no large corpus was previously available in this field, we collected 376 TCM books to build a large-scale corpus from which to obtain pre-trained vectors. We have released the corpus and pre-trained vectors to the public.
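The abstract does not spell out how lexicon information enters the representation layer. As one illustration, the sketch below builds per-character lexicon features in the style of SoftLexicon (Ma et al., 2020): every lexicon word matched in the sentence is assigned to the B (begin), M (middle), E (end) or S (single) set of each character it covers, and each set is pooled into a vector that is concatenated with the character embedding. This is not necessarily the authors' method. The function names, the toy lexicon, the maximum word length of 6, and the frequency-weighted pooling (normalized by the total frequency Z of matched words and scaled by 4) are all assumptions made for illustration.

```python
from typing import Dict, List, Sequence, Set

TAGS = ("B", "M", "E", "S")  # begin, middle, end, single-character word


def match_lexicon(sentence: str, lexicon: Set[str],
                  max_word_len: int = 6) -> List[Dict[str, Set[str]]]:
    """For every character, collect the lexicon words that cover it,
    grouped by the character's position inside each matched word."""
    n = len(sentence)
    feats = [{tag: set() for tag in TAGS} for _ in range(n)]
    for start in range(n):
        for end in range(start + 1, min(n, start + max_word_len) + 1):
            word = sentence[start:end]
            if word not in lexicon:
                continue
            if end - start == 1:
                feats[start]["S"].add(word)
            else:
                feats[start]["B"].add(word)
                feats[end - 1]["E"].add(word)
                for mid in range(start + 1, end - 1):
                    feats[mid]["M"].add(word)
    return feats


def pool_lexicon_features(char_feats: Dict[str, Set[str]],
                          word_vecs: Dict[str, Sequence[float]],
                          word_freq: Dict[str, int],
                          dim: int) -> List[float]:
    """Frequency-weighted pooling of each B/M/E/S set, concatenated in
    B|M|E|S order. The 4/Z scaling is an assumed SoftLexicon-style weighting."""
    total = sum(word_freq.get(w, 1) for tag in TAGS for w in char_feats[tag])
    pooled: List[float] = []
    for tag in TAGS:
        vec = [0.0] * dim
        for w in char_feats[tag]:
            weight = word_freq.get(w, 1)          # unseen words default to weight 1
            emb = word_vecs.get(w, [0.0] * dim)   # words without vectors add zeros
            for k in range(dim):
                vec[k] += weight * emb[k]
        if total:
            vec = [4.0 * x / total for x in vec]
        pooled.extend(vec)
    return pooled


def char_representation(char_vec: Sequence[float],
                        pooled: Sequence[float]) -> List[float]:
    """Concatenate a character embedding with its pooled lexicon features;
    this would be the per-character input to a BiLSTM-CRF tagger."""
    return list(char_vec) + list(pooled)


if __name__ == "__main__":
    # Toy lexicon and phrase; all values here are illustrative only.
    lexicon = {"太阳", "太阳病", "发热", "病"}
    sentence = "太阳病发热"
    for ch, f in zip(sentence, match_lexicon(sentence, lexicon)):
        print(ch, {tag: sorted(ws) for tag, ws in f.items()})
```

In a full model the toy vectors would be replaced by pre-trained character and word embeddings (for example, ones learned from a large TCM corpus), the frequencies would be counted from that corpus, and the concatenated vectors would feed the BiLSTM encoder with a CRF layer on top for tag decoding.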


Metadata
Copyright Year
2020
DOI
https://doi.org/10.1007/978-3-030-60457-8_39
