2021 | OriginalPaper | Chapter

The Optimization of Portuguese Named-Entity Recognition and Classification by Combining Local Grammars and Conditional Random Fields Trained with a Parsed Corpus

Abstract

This article presents the results of a study on named-entity recognition and classification for Portuguese, focusing on temporal expressions. We used the Conditional Random Fields (CRF) probabilistic method with features drawn from an automatically annotated parsed corpus and from local grammars. We found that Part-of-Speech (PoS) tags are the most relevant information from the parsed corpus to use as a feature for this task, and that associating these tags with other linguistic information from the parsed corpus yields no positive synergy. A NooJ local grammar, created to recognize “Time” category entities (without detailing types and subtypes), provides a feature that surpasses PoS tags for CRF training in terms of precision and recall. Combining PoS and NooJ annotations brings no advantage.
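To make the feature configurations compared in the abstract concrete, the following is a minimal Python sketch of how per-token features for a CRF could be assembled from PoS tags and from a local-grammar "Time" flag. It is an illustration under stated assumptions, not the authors' feature templates: the feature names, the context window, the example sentence, and its tags and flags are all hypothetical.

```python
from typing import Dict, List, Tuple

# One token: (surface form, PoS tag, flag set when a local grammar
# marked the token as part of a "Time" entity). All values below are
# hypothetical and only illustrate the data shape.
Token = Tuple[str, str, bool]


def token_features(sent: List[Token], i: int,
                   use_pos: bool = True,
                   use_nooj: bool = True) -> Dict[str, str]:
    """Build the feature dict for token i of a sentence."""
    word, pos, in_time = sent[i]
    feats = {"word.lower": word.lower()}
    if use_pos:
        # PoS of the token and of its immediate neighbours.
        feats["pos"] = pos
        feats["pos.prev"] = sent[i - 1][1] if i > 0 else "BOS"
        feats["pos.next"] = sent[i + 1][1] if i < len(sent) - 1 else "EOS"
    if use_nooj:
        # Binary flag coming from the local grammar annotation.
        feats["nooj.time"] = "yes" if in_time else "no"
    return feats


def sentence_features(sent: List[Token], **kwargs) -> List[Dict[str, str]]:
    """Feature dicts for every token, in order."""
    return [token_features(sent, i, **kwargs) for i in range(len(sent))]


# Hypothetical annotated sentence: "Chegou em maio de 2010."
sentence: List[Token] = [
    ("Chegou", "VERB", False),
    ("em", "ADP", False),
    ("maio", "NOUN", True),
    ("de", "ADP", True),
    ("2010", "NUM", True),
    (".", "PUNCT", False),
]

pos_only = sentence_features(sentence, use_nooj=False)
nooj_only = sentence_features(sentence, use_pos=False)
combined = sentence_features(sentence)
```

Each of the three lists corresponds to one configuration discussed in the abstract (PoS only, NooJ only, both) and could be passed, together with gold labels, to any CRF toolkit that accepts per-token feature dictionaries.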


Footnotes
1. Translation: “Evaluation of Recognizers of Mentioned Entities”.
2. The corpus file is called “CDSegundoHAREMReRelEM” and is available at the website https://www.linguateca.pt.
Metadata
Authors
Diego Alves
Božo Bekavac
Marko Tadić
Copyright Year
2021
DOI
https://doi.org/10.1007/978-3-030-70629-6_17
