2016 | OriginalPaper | Chapter

DoSeR - A Knowledge-Base-Agnostic Framework for Entity Disambiguation Using Semantic Embeddings

Authors: Stefan Zwicklbauer, Christin Seifert, Michael Granitzer

Published in: The Semantic Web. Latest Advances and New Domains

Publisher: Springer International Publishing

Abstract

Entity disambiguation is the task of mapping ambiguous terms in natural-language text to their entities in a knowledge base. It is applied in the extraction of structured data in RDF (Resource Description Framework) from textual documents, and equally in supporting artificial intelligence applications such as semantic search, reasoning, and question answering. In this work, we propose DoSeR (Disambiguation of Semantic Resources), a (named) entity disambiguation framework that is knowledge-base-agnostic with respect to RDF knowledge bases (e.g. DBpedia) and entity-annotated document knowledge bases (e.g. Wikipedia). First, our framework automatically generates semantic entity embeddings from one or multiple knowledge bases. Then, DoSeR accepts documents with a given set of surface forms as input and collectively links each surface form to an entity in a knowledge base using a graph-based approach. We evaluate DoSeR on seven different data sets against publicly available, state-of-the-art (named) entity disambiguation frameworks. Our approach outperforms the state-of-the-art approaches that make use of RDF knowledge bases and/or entity-annotated document knowledge bases by up to 10 % F1 measure.
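
The abstract names only the two stages (entity embeddings, then graph-based collective linking) and gives no algorithmic detail. The following Python sketch illustrates one way such a pipeline could look; it is not DoSeR's implementation. It rests on several assumptions: the embeddings are hand-made toy vectors rather than learned ones, edges between candidates are weighted by cosine similarity, and candidates are ranked with a plain weighted PageRank. The names `cosine`, `disambiguate`, `candidates`, and `embeddings` are hypothetical.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def disambiguate(candidates, embeddings, damping=0.85, iterations=50):
    """Pick one entity per surface form by ranking a candidate graph.

    candidates: dict mapping surface form -> list of candidate entity ids
    embeddings: dict mapping entity id -> vector (assumed to be precomputed)
    """
    # One node per (surface form, candidate entity) pair.
    nodes = [(sf, e) for sf, ents in candidates.items() for e in ents]
    weights = {n: {} for n in nodes}

    # Connect candidates of *different* surface forms; candidates of the
    # same surface form compete with each other and stay unconnected.
    for a, b in combinations(nodes, 2):
        if a[0] == b[0]:
            continue
        w = max(cosine(embeddings[a[1]], embeddings[b[1]]), 0.0)
        if w > 0:
            weights[a][b] = w
            weights[b][a] = w

    # Weighted PageRank by power iteration.
    n_nodes = len(nodes)
    rank = {n: 1.0 / n_nodes for n in nodes}
    for _ in range(iterations):
        new_rank = {n: (1 - damping) / n_nodes for n in nodes}
        for n in nodes:
            total = sum(weights[n].values())
            if total == 0:
                # Isolated node: spread its rank uniformly.
                for m in nodes:
                    new_rank[m] += damping * rank[n] / n_nodes
                continue
            for m, w in weights[n].items():
                new_rank[m] += damping * rank[n] * w / total
        rank = new_rank

    # For each surface form, keep the highest-ranked candidate.
    return {
        sf: max(ents, key=lambda e: rank[(sf, e)])
        for sf, ents in candidates.items()
    }


# Toy data (invented for illustration, not taken from any knowledge base).
embeddings = {
    "Paris_(city)": [0.9, 0.1, 0.0],
    "Paris_Hilton": [0.1, 0.9, 0.1],
    "France": [0.8, 0.2, 0.1],
}
candidates = {
    "Paris": ["Paris_(city)", "Paris_Hilton"],
    "France": ["France"],
}

result = disambiguate(candidates, embeddings)
print(result)
```

In this toy setting the surface form "Paris" resolves to the city because its vector lies closer to that of "France" than the vector of the competing candidate does, so it receives more rank from the shared context. This is the intuition behind collective linking: the candidates of one surface form are scored by how well they fit the candidates of the others.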

Metadata
Title
DoSeR - A Knowledge-Base-Agnostic Framework for Entity Disambiguation Using Semantic Embeddings
Authors
Stefan Zwicklbauer
Christin Seifert
Michael Granitzer
Copyright Year
2016
DOI
https://doi.org/10.1007/978-3-319-34129-3_12