Published in: Cluster Computing 3/2019

08-09-2017

Named entity recognition based on conditional random fields

Authors: Shengli Song, Nan Zhang, Haitao Huang

Abstract

Named entity recognition (NER) is a fundamental problem in many natural language processing applications. Combining word segmentation and part-of-speech analysis, the paper proposes a new NER method based on conditional random fields (CRFs) that takes the granularity of candidate entities into account. Recognition granularity is divided into two levels: word-based and character-based. Features are extracted from the segmented text according to the feature templates used in the training phase, and \(P(y \mid x)\) is then computed to obtain the best label sequence for the input sequence. The paper evaluates the algorithm experimentally at different granularities on a large-scale corpus, and the authors report that the results indicate the method is feasible and of research value.
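
As a rough illustration of the scoring step mentioned in the abstract, the sketch below computes \(P(y \mid x)\) for a linear-chain CRF and selects the highest-scoring label sequence with Viterbi decoding. It is not the authors' implementation: the B/I/O label set, the toy emission and transition scores, and all function names are assumptions made for illustration. In a real system the per-position scores would come from weighted feature templates learned during training.

```python
import math
from itertools import product

LABELS = ["B", "I", "O"]  # hypothetical tag set: begin / inside / outside an entity


def logsumexp(values):
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def sequence_score(emissions, transitions, labels):
    """Unnormalized log-score of one label sequence."""
    score = emissions[0][labels[0]]
    for t in range(1, len(labels)):
        score += transitions[(labels[t - 1], labels[t])] + emissions[t][labels[t]]
    return score


def log_partition(emissions, transitions):
    """log Z(x), computed with the forward algorithm."""
    alpha = {y: emissions[0][y] for y in LABELS}
    for t in range(1, len(emissions)):
        alpha = {
            y: emissions[t][y]
            + logsumexp([alpha[p] + transitions[(p, y)] for p in LABELS])
            for y in LABELS
        }
    return logsumexp(list(alpha.values()))


def conditional_probability(emissions, transitions, labels):
    """P(y | x) = exp(score(x, y)) / Z(x)."""
    log_p = sequence_score(emissions, transitions, labels) - log_partition(
        emissions, transitions
    )
    return math.exp(log_p)


def viterbi(emissions, transitions):
    """Return the label sequence with the highest score."""
    best = {y: (emissions[0][y], [y]) for y in LABELS}
    for t in range(1, len(emissions)):
        new_best = {}
        for y in LABELS:
            prev = max(LABELS, key=lambda p: best[p][0] + transitions[(p, y)])
            score = best[prev][0] + transitions[(prev, y)] + emissions[t][y]
            new_best[y] = (score, best[prev][1] + [y])
        best = new_best
    return max(best.values(), key=lambda item: item[0])[1]


# Toy scores for a three-token input (assumed values, not taken from the paper).
emissions = [
    {"B": 2.0, "I": 0.0, "O": 0.5},
    {"B": 0.0, "I": 1.5, "O": 0.5},
    {"B": 0.0, "I": 0.0, "O": 2.0},
]
transitions = {(p, y): 0.0 for p in LABELS for y in LABELS}
transitions[("B", "I")] = 1.0   # encourage I right after B
transitions[("O", "I")] = -5.0  # discourage I right after O

best_path = viterbi(emissions, transitions)
best_prob = conditional_probability(emissions, transitions, best_path)

# Sanity check: probabilities of all label sequences should sum to 1.
total = sum(
    conditional_probability(emissions, transitions, list(seq))
    for seq in product(LABELS, repeat=len(emissions))
)
print(best_path, round(best_prob, 3), round(total, 6))
```

The same decoding applies at either granularity described in the abstract: the positions of `emissions` would correspond to words in the word-based setting and to single characters in the character-based setting.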

Metadata
Title
Named entity recognition based on conditional random fields
Authors
Shengli Song
Nan Zhang
Haitao Huang
Publication date
08-09-2017
Publisher
Springer US
Published in
Cluster Computing / Issue Special Issue 3/2019
Print ISSN: 1386-7857
Electronic ISSN: 1573-7543
DOI
https://doi.org/10.1007/s10586-017-1146-3
