2018 | OriginalPaper | Chapter

Detecting Erroneous Identity Links on the Web Using Network Metrics

Authors: Joe Raad, Wouter Beek, Frank van Harmelen, Nathalie Pernelle, Fatiha Saïs

Published in: The Semantic Web – ISWC 2018

Publisher: Springer International Publishing

Abstract

In the absence of a central naming authority on the Semantic Web, it is common for different datasets to refer to the same thing by different IRIs. Whenever multiple names denote the same thing, owl:sameAs statements are needed to link the data and foster reuse. Studies dating back as far as 2009 have observed that the owl:sameAs property is sometimes used incorrectly. In this paper, we show how network metrics such as the community structure of the owl:sameAs graph can be used to detect such possibly erroneous statements. One benefit of the approach presented here is that it can be applied to the network of owl:sameAs links itself and does not rely on any additional knowledge. To illustrate its ability to scale, the approach is evaluated on the largest collection of identity links to date, containing over 558M owl:sameAs links scraped from the LOD Cloud.
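
As a rough illustration of how the structure of the owl:sameAs network alone can point to questionable links, the following sketch flags links whose two endpoints share no common neighbour even though each endpoint has other links. This is a deliberately simple stand-in for a community-structure metric, not the method evaluated in the chapter; the function names and the `ex:` IRIs are hypothetical and chosen only for the example.

```python
from collections import defaultdict


def build_graph(links):
    """Build undirected adjacency sets from (subject, object) owl:sameAs pairs."""
    adj = defaultdict(set)
    for s, o in links:
        if s == o:
            continue  # reflexive statements carry no structural information
        adj[s].add(o)
        adj[o].add(s)
    return adj


def suspicious_links(links):
    """Return links whose endpoints share no neighbour (assumed heuristic).

    A link is flagged only when both endpoints have at least one other
    neighbour, so isolated pairs are not reported.
    """
    adj = build_graph(links)
    flagged = set()
    for s, o in links:
        if s == o:
            continue
        shared = adj[s] & adj[o]
        if not shared and len(adj[s]) > 1 and len(adj[o]) > 1:
            flagged.add(frozenset((s, o)))
    return flagged


# Two tightly linked groups of hypothetical IRIs joined by a single link.
links = [
    ("ex:a1", "ex:a2"), ("ex:a2", "ex:a3"), ("ex:a1", "ex:a3"),
    ("ex:b1", "ex:b2"), ("ex:b2", "ex:b3"), ("ex:b1", "ex:b3"),
    ("ex:a3", "ex:b1"),
]
flagged = suspicious_links(links)
print([tuple(sorted(edge)) for edge in flagged])  # [('ex:a3', 'ex:b1')]
```

Only the link that bridges the two groups is reported, because every other link sits inside a triangle. Like the approach described in the abstract, the heuristic reads nothing but the identity links themselves, although a real community detection algorithm would be needed for graphs that are less cleanly separated.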

Footnotes
1
In RDF, nodes are terms that appear in the subject and/or object position of at least one triple.
 
4
On an 8 GB RAM Windows 10 machine, using 2 CPU cores.
 
5
Reflexive statements were discarded in I, and symmetric ones have the same err.
 
6
The judges were asked to not consider the owl:sameAs statements related to the term.
 
7
We also made sure to include 5 terms that belong to the same equality set.
 
Metadata
Title
Detecting Erroneous Identity Links on the Web Using Network Metrics
Authors
Joe Raad
Wouter Beek
Frank van Harmelen
Nathalie Pernelle
Fatiha Saïs
Copyright Year
2018
DOI
https://doi.org/10.1007/978-3-030-00671-6_23
