
2014 | OriginalPaper | Book chapter

Pattern Mining for Named Entity Recognition

Authors: Damien Nouvel, Jean-Yves Antoine, Nathalie Friburger

Published in: Human Language Technology Challenges for Computer Science and Linguistics

Publisher: Springer International Publishing


Abstract

Many evaluation campaigns have shown that knowledge-based and data-driven approaches remain equally competitive for Named Entity Recognition (NER). Our research team has developed CasEN, a symbolic system based on finite-state transducers, which achieved promising results in the French-language Ester2 evaluation campaign. Despite these encouraging results, manually extending the coverage of such a hand-crafted system is difficult. In this paper, we present a novel pattern-mining approach to NER that also serves to supplement our system's knowledge base. The resulting system, mXS, exhaustively searches for hierarchical sequential patterns that aim to detect Named Entity boundaries. We assess the effectiveness of these patterns both in standalone mode and in combination with our existing system.
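To illustrate what mining patterns around entity boundaries can look like, here is a deliberately simplified sketch. It is not the mXS algorithm, whose details are not given here. It assumes four things:

- Each token is described at several levels: its word form and a POS tag.
- Boundary markers such as `<pers>` and `</pers>` are inserted as tokens.
- Only contiguous patterns are mined.
- Support is counted per occurrence.

The names `mine_patterns` and `is_marker`, the marker syntax and the toy corpus are all hypothetical.

```python
from collections import Counter
from itertools import product

def is_marker(item):
    # Hypothetical convention: boundary markers look like "<pers>" or "</pers>".
    return item.startswith("<") and item.endswith(">")

def mine_patterns(sentences, max_len=3, min_support=2):
    """Exhaustively count contiguous patterns of length <= max_len that contain
    at least one boundary marker. Each token is a tuple of descriptions
    (e.g. word form, POS tag); a pattern picks one description per position.
    Only patterns seen at least min_support times are kept."""
    counts = Counter()
    for sent in sentences:
        for start in range(len(sent)):
            for length in range(1, max_len + 1):
                window = sent[start:start + length]
                if len(window) < length:
                    break  # window runs past the end of the sentence
                # Every way of describing the window at the available levels.
                for pattern in product(*window):
                    if any(is_marker(item) for item in pattern):
                        counts[pattern] += 1
    return {p: c for p, c in counts.items() if c >= min_support}

# Invented toy corpus: markers are single-level tokens, words carry a POS tag.
corpus = [
    [("M.", "ABR"), ("<pers>",), ("Dupont", "NAM"), ("</pers>",), ("a", "VER")],
    [("M.", "ABR"), ("<pers>",), ("Martin", "NAM"), ("</pers>",), ("est", "VER")],
]
patterns = mine_patterns(corpus)
```

Generalizing the surnames to their POS tag is what lets a pattern such as `<pers> NAM </pers>` reach the support threshold. The word-level variants, such as `<pers> Dupont`, occur only once and are discarded.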


Footnotes
1
Each set of markers is mapped to a predetermined number of corresponding marker sequences, e.g. \(P(\{m_1, m_2\}) = P(\langle m_1, m_2 \rangle) = P(\langle m_2, m_1 \rangle) = P(\langle m_1, m_2, m_1 \rangle)\).
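The footnote does not specify how the "predetermined number" of sequences is chosen. The sketch below assumes one possible rule: enumerate every sequence up to a fixed length `max_len` that uses each marker of the set at least once and no other marker. Every such sequence then receives the set's probability. The function names and the `max_len` bound are hypothetical.

```python
from itertools import product

def sequences_for_set(markers, max_len):
    """Enumerate marker sequences of length <= max_len that use every marker
    of the set at least once and no other marker (hypothetical rule)."""
    markers = sorted(markers)
    seqs = []
    for length in range(len(markers), max_len + 1):
        for seq in product(markers, repeat=length):
            if set(seq) == set(markers):
                seqs.append(seq)
    return seqs

def expand_probability(marker_set, prob, max_len=3):
    """Assign the probability of the marker set to each corresponding sequence."""
    return {seq: prob for seq in sequences_for_set(marker_set, max_len)}

mapping = expand_probability({"m1", "m2"}, prob=0.25, max_len=3)
# 8 sequences: 2 of length 2 and 6 of length 3, including ('m1', 'm2', 'm1')
```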
 
2
It limits the search space by keeping only the N most probable solutions at each position.
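The pruning described in this footnote is a beam search. The minimal sketch below assumes the input is a list of per-position label scores in log space, plus a `transition` function that scores a label given the previous one. Both inputs, and the marker labels used in the example, are hypothetical.

```python
import heapq
import math

def beam_search(position_scores, transition, beam_size):
    """position_scores: list of dicts {label: log_prob}, one per position.
    transition(prev, label): additional log-score (prev is None at the start).
    At every position only the beam_size best partial solutions are kept."""
    beam = [(0.0, [])]  # (cumulative log-score, label sequence)
    for scores in position_scores:
        candidates = []
        for total, seq in beam:
            prev = seq[-1] if seq else None
            for label, lp in scores.items():
                candidates.append((total + lp + transition(prev, label), seq + [label]))
        beam = heapq.nlargest(beam_size, candidates, key=lambda c: c[0])
    return beam

def transition(prev, label):
    # Hypothetical constraint: a closing marker must directly follow an opening one.
    return -math.inf if label == "</pers>" and prev != "<pers>" else 0.0

position_scores = [
    {"O": math.log(0.4), "<pers>": math.log(0.6)},
    {"O": math.log(0.3), "</pers>": math.log(0.7)},
]
beam = beam_search(position_scores, transition, beam_size=2)
```

With `beam_size=2`, the best surviving solution is `["<pers>", "</pers>"]` with probability 0.6 × 0.7 = 0.42.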
 
3
With regularization parameter C = 4.
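The footnote does not name the classifier that C belongs to. In many linear classifiers, such as scikit-learn's L2-penalized `LogisticRegression`, C is the inverse of the regularization strength. Under that assumption, with labels \(y_i \in \{-1, +1\}\), C = 4 enters the training objective as follows. A larger C weights the data term more heavily and so regularizes less.

```latex
\[
\min_{w,\,b}\; \frac{1}{2}\lVert w \rVert_2^2
\;+\; C \sum_{i=1}^{n} \log\!\bigl(1 + e^{-y_i (x_i^\top w + b)}\bigr),
\qquad C = 4
\]
```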
 
Metadata
Title
Pattern Mining for Named Entity Recognition
Authors
Damien Nouvel
Jean-Yves Antoine
Nathalie Friburger
Copyright year
2014
DOI
https://doi.org/10.1007/978-3-319-08958-4_19
