2017 | OriginalPaper | Book chapter

Blocking for Entity Resolution in the Web of Data: Challenges and Algorithms

Written by: Kostas Stefanidis

Published in: Strategic Innovative Marketing

Publisher: Springer International Publishing

Abstract

In the Web of data, entities are described by interlinked data rather than by documents. In this talk, we focus on entity resolution in the Web of data, i.e., the problem of identifying descriptions that refer to the same real-world entity, either within a single knowledge base (KB) or across KBs. To reduce the number of pairwise comparisons among descriptions, entity resolution methods typically perform a preprocessing step, called blocking, which places similar entity descriptions into blocks and compares only descriptions within the same block. The objective of this talk is to present challenges and algorithms for blocking for entity resolution. These challenges stem from the openness of the Web, where an unbounded number of KBs describe a multitude of entity types across domains, and from the high semantic and structural heterogeneity of descriptions, even for entities of the same type.
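
To make the idea of blocking concrete, here is a minimal sketch in Python. It assumes one simple scheme, often called token blocking, in which two descriptions share a block whenever their attribute values have a token in common. This is not necessarily the algorithm presented in the chapter. The function names (`token_blocking`, `candidate_pairs`) and the toy descriptions are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from itertools import combinations


def token_blocking(descriptions):
    """Build one block per token found in any attribute value.

    `descriptions` maps an entity id to a dict of attribute -> value.
    Attribute names are ignored, so the scheme needs no schema alignment.
    """
    blocks = defaultdict(set)
    for entity_id, attributes in descriptions.items():
        for value in attributes.values():
            for token in str(value).lower().split():
                blocks[token].add(entity_id)
    # A block with a single description produces no comparisons; drop it.
    return {token: ids for token, ids in blocks.items() if len(ids) > 1}


def candidate_pairs(blocks):
    """Collect the distinct pairs that co-occur in at least one block."""
    pairs = set()
    for ids in blocks.values():
        pairs.update(combinations(sorted(ids), 2))
    return pairs


# Hypothetical descriptions from two KBs that use different attribute names.
descriptions = {
    "kb1:e1": {"name": "Stanley Kubrick", "born": "New York"},
    "kb2:e7": {"label": "Kubrick Stanley", "job": "director"},
    "kb1:e2": {"name": "Alfred Hitchcock", "born": "London"},
    "kb2:e9": {"label": "Hitchcock", "place": "London"},
}

blocks = token_blocking(descriptions)
pairs = candidate_pairs(blocks)
all_pairs = len(descriptions) * (len(descriptions) - 1) // 2
print(f"{len(pairs)} candidate pairs instead of {all_pairs}")
```

In this toy input, only descriptions that share a token are compared, so the pairs that mix the two directors are never generated. Because the sketch looks only at values, it tolerates the structural heterogeneity mentioned in the abstract, but very common tokens would produce large blocks, which is one reason more refined blocking methods exist.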

Footnotes
1
For instance, the sameas.org service provides manually collected co-references between descriptions of the same entity in different KBs.

Metadata
Title
Blocking for Entity Resolution in the Web of Data: Challenges and Algorithms
Written by
Kostas Stefanidis
Copyright year
2017
DOI
https://doi.org/10.1007/978-3-319-56288-9_63