2017 | OriginalPaper | Book chapter

A CRFs-Based Approach Empowered with Word Representation Features to Learning Biomedical Named Entities from Medical Text

Authors: Wenxiu Xie, Sihui Fu, Shengyi Jiang, Tianyong Hao

Published in: Emerging Technologies for Education

Publisher: Springer International Publishing

Abstract

Biomedical named entity recognition (BNER), which aims to identify specific types of entities, is a fundamental task in biomedical text processing. This paper presents a CRFs-based approach to learning disease entities by identifying their boundaries in text. Two types of word representation features are proposed and used: word embedding features and cluster-based features. In addition, an external disease dictionary feature is explored in the learning process. Using the publicly available NCBI disease corpus, we evaluate the performance of the CRFs-based model with the combination of these word representation features. The results show that using these features can significantly improve BNER performance, with an increase of 24.7% in F1 measure, demonstrating the effectiveness of the proposed features and the feature-empowered approach.
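To make the feature types named in the abstract concrete, the following is a minimal sketch of how word embedding, cluster-based, and dictionary features might be assembled per token before being handed to a CRF toolkit. It is not the authors' implementation: the lookup tables, the feature names, the one-token context window, and the BIO labels (`B-Disease`, `I-Disease`, `O`) are all hypothetical choices made for illustration.

```python
from typing import Dict, List, Set

# Toy, hypothetical resources (not data from the paper).
# In practice the embeddings and cluster IDs would be learned from a large corpus,
# and the dictionary would come from an external disease vocabulary.
EMBEDDINGS: Dict[str, List[float]] = {
    "cystic": [0.12, -0.40],
    "fibrosis": [0.30, 0.25],
}
CLUSTERS: Dict[str, int] = {"cystic": 7, "fibrosis": 7, "patients": 2}
DISEASE_TERMS: Set[str] = {"cystic", "fibrosis"}


def token_features(tokens: List[str], i: int) -> Dict[str, object]:
    """Build a feature dict for tokens[i], combining the three feature types."""
    word = tokens[i].lower()
    feats: Dict[str, object] = {
        "word": word,
        # Dictionary feature: does the token appear in the disease lexicon?
        "in_dict": word in DISEASE_TERMS,
        # Cluster-based feature: categorical cluster ID, "-1" if unknown.
        "cluster": str(CLUSTERS.get(word, -1)),
    }
    # Word embedding features: one real-valued feature per dimension.
    for d, value in enumerate(EMBEDDINGS.get(word, [])):
        feats[f"emb_{d}"] = value
    # Minimal context window so the model can see neighbouring words.
    if i > 0:
        feats["prev_word"] = tokens[i - 1].lower()
    else:
        feats["BOS"] = True  # beginning of sentence
    if i < len(tokens) - 1:
        feats["next_word"] = tokens[i + 1].lower()
    else:
        feats["EOS"] = True  # end of sentence
    return feats


def sentence_features(tokens: List[str]) -> List[Dict[str, object]]:
    """Feature dicts for every token in a sentence."""
    return [token_features(tokens, i) for i in range(len(tokens))]


sentence = ["Patients", "with", "cystic", "fibrosis"]
features = sentence_features(sentence)
# Assumed BIO encoding of entity boundaries for the same sentence.
labels = ["O", "O", "B-Disease", "I-Disease"]
```

A list of such feature dicts paired with a label sequence is the kind of input that common CRF toolkits accept for training; which toolkit and which exact feature templates the paper uses is not stated in the abstract.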

Metadata
Title
A CRFs-Based Approach Empowered with Word Representation Features to Learning Biomedical Named Entities from Medical Text
Authors
Wenxiu Xie
Sihui Fu
Shengyi Jiang
Tianyong Hao
Copyright year
2017
DOI
https://doi.org/10.1007/978-3-319-71084-6_61
