2017 | Original Paper | Book Chapter

The Complementary Nature of Different NLP Toolkits for Named Entity Recognition in Social Media

Authors: Filipe Batista, Álvaro Figueira

Published in: Progress in Artificial Intelligence

Publisher: Springer International Publishing

Abstract

In this paper we study the combined use of four NLP toolkits (Stanford CoreNLP, GATE, OpenNLP and Twitter NLP tools) on social media posts. Previous studies have compared the performance of these tools on both news and social media corpora. We go further by examining how differently these toolkits predict named entities, in terms of precision and recall for three entity types, and how they can complement each other to achieve a combined performance superior to that of any individual toolkit. Experiments on two publicly available datasets, from the WNUT-2015 and #MSM2013 workshops, show that an ensemble of toolkits can improve the recognition of specific entity types: by up to 10.62% for Person, 1.97% for Location and 1.31% for Organization, depending on the dataset and the voting criteria. Our results also show improvements of 3.76% and 1.69%, on each dataset respectively, in the average performance over the three entity types.
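
The abstract does not spell out the voting criteria, so the following is only a minimal sketch of how a token-level vote across toolkits might look. The toolkit names, the label set (`PER`, `LOC`, `ORG`, `O`), the `min_votes` threshold and the sample predictions are all illustrative assumptions, not the authors' implementation or data.

```python
from collections import Counter
from typing import Dict, List


def vote(predictions: Dict[str, List[str]], min_votes: int = 2) -> List[str]:
    """Combine per-token entity labels from several toolkits by voting.

    `predictions` maps a (hypothetical) toolkit name to its label sequence.
    A token receives the most frequent non-"O" label if at least `min_votes`
    toolkits agree on it; otherwise it is labelled "O" (no entity).
    """
    lengths = {len(labels) for labels in predictions.values()}
    if len(lengths) != 1:
        raise ValueError("all toolkits must label the same token sequence")

    combined = []
    for token_labels in zip(*predictions.values()):
        # Count only entity labels; "O" means the toolkit found no entity.
        counts = Counter(label for label in token_labels if label != "O")
        if not counts:
            combined.append("O")
            continue
        # On a tie, Counter.most_common keeps the label that was seen first.
        label, votes = counts.most_common(1)[0]
        combined.append(label if votes >= min_votes else "O")
    return combined


# Made-up predictions for the tokens: Obama visits Porto with Google
predictions = {
    "toolkit_a": ["PER", "O", "LOC", "O", "ORG"],
    "toolkit_b": ["PER", "O", "O", "O", "ORG"],
    "toolkit_c": ["O", "O", "LOC", "O", "O"],
    "toolkit_d": ["PER", "O", "ORG", "O", "O"],
}

print(vote(predictions, min_votes=2))  # ['PER', 'O', 'LOC', 'O', 'ORG']
print(vote(predictions, min_votes=3))  # ['PER', 'O', 'O', 'O', 'O']
```

Raising `min_votes` trades recall for precision: fewer tokens are labelled, but each label is backed by more toolkits. This is one plausible reason why the reported gains depend on the voting criteria.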

Metadata
Title
The Complementary Nature of Different NLP Toolkits for Named Entity Recognition in Social Media
Authors
Filipe Batista
Álvaro Figueira
Copyright Year
2017
DOI
https://doi.org/10.1007/978-3-319-65340-2_65