Published in: Journal on Data Semantics 4/2012

1 December 2012 | Original Article

On Link Discovery using a Hybrid Approach

Author: Axel-Cyrille Ngonga Ngomo

Abstract

With the growth of the Linked Data Web, time-efficient link discovery frameworks have become indispensable for implementing the fourth Linked Data principle, i.e., the provision of links between data sources. Due to the sheer size of the Data Web, detecting links can be time-consuming even when using trivial link specifications based on a single property. Moreover, non-trivial link discovery tasks require complex link specifications and are consequently even more challenging to optimize with respect to runtime. In this paper, we present a hybrid approach to link discovery that allows combining time-efficient algorithms specialized for specific data types. In particular, we present the HYPPO algorithm, which can process numeric data efficiently. These algorithms are combined by using original insights on the translation of complex link specifications into combinations of atomic specifications via a series of operations on sets and filters. In nine experiments, we show that our approach outperforms SILK 2.5.1 with respect to runtime by up to four orders of magnitude.
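The abstract names HYPPO as the component for numeric data but does not describe how it works. The Python sketch below illustrates one general way to speed up numeric link discovery without losing pairs: tile the space into hypercubes of edge length θ/α and compare each source point only with target points in nearby cubes. It assumes Euclidean distance, a distance threshold θ, an integer granularity α, and the hypothetical function names `grid_link` and `brute_force_link`; it is an illustrative sketch in the spirit of HYPPO under these assumptions, not the author's implementation.

```python
import itertools
import math
from collections import defaultdict
from typing import Dict, Sequence, Set, Tuple

Points = Dict[str, Sequence[float]]  # entity id -> numeric property values
Links = Set[Tuple[str, str]]          # (source id, target id)


def cube_index(point: Sequence[float], delta: float) -> Tuple[int, ...]:
    """Integer coordinates of the hypercube (edge length delta) holding point."""
    return tuple(math.floor(x / delta) for x in point)


def grid_link(source: Points, target: Points, theta: float,
              alpha: int = 1) -> Links:
    """Return all pairs whose Euclidean distance is at most theta.

    Space is tiled into hypercubes of edge length delta = theta / alpha.
    Two points within theta differ by at most alpha * delta in every
    coordinate, so their cube indices differ by at most alpha per
    dimension; only those neighbouring cubes are searched.
    """
    if theta <= 0 or alpha < 1:
        raise ValueError("theta must be positive and alpha a positive integer")
    delta = theta / alpha
    cubes = defaultdict(list)  # cube index -> target points inside it
    for t_id, t in target.items():
        cubes[cube_index(t, delta)].append((t_id, t))

    links: Links = set()
    for s_id, s in source.items():
        base = cube_index(s, delta)
        for offset in itertools.product(range(-alpha, alpha + 1),
                                        repeat=len(base)):
            cube = tuple(b + o for b, o in zip(base, offset))
            for t_id, t in cubes.get(cube, ()):
                if math.dist(s, t) <= theta:  # exact check on candidates only
                    links.add((s_id, t_id))
    return links


def brute_force_link(source: Points, target: Points, theta: float) -> Links:
    """Reference result: compare every source point with every target point."""
    return {(s_id, t_id)
            for s_id, s in source.items()
            for t_id, t in target.items()
            if math.dist(s, t) <= theta}


source = {"s1": (0.0, 0.0), "s2": (5.0, 5.0), "s3": (10.0, 1.0)}
target = {"t1": (0.5, 0.5), "t2": (5.0, 6.5), "t3": (9.0, 1.0)}
for alpha in (1, 2, 4):
    found = grid_link(source, target, theta=1.0, alpha=alpha)
    assert found == brute_force_link(source, target, theta=1.0)
    print(alpha, sorted(found))
```

A larger α yields smaller cubes and fewer wasted distance computations, at the cost of visiting (2α+1)^d cubes per source point in d dimensions. The brute-force function only serves as a reference for checking that the tiling returns every pair within the threshold.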

Footnotes
2
While it is clear that most knowledge bases should be linked to several other knowledge bases, determining the desirable proportion of links on the Linked Data Cloud remains work in progress.
 
3
An online demo of the framework can be found at http://limes.sf.net.
 
4
"Not losing recall" is used in the same sense as in [11]: given a link specification, our approach is guaranteed to find all pairs of source and target instances that satisfy the specification.
 
5
The corresponding link specifications are available for download at http://aksw.org/Projects/LIMES.
 
6
Note that we consider numerical data to be data whose datatype admits a bijective mapping between the set of all its elements and the real numbers.
 
7
See http://limes.sf.net. The user manual available on the same page describes the architecture presented here in more detail.
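Footnote 4 states that the approach finds every pair satisfying a link specification, and the abstract describes complex specifications being translated into operations on sets and filters over the results of atomic specifications. The Python sketch below illustrates that idea under assumed semantics: AND as intersection keeping the minimum similarity, OR as union keeping the maximum, MINUS as set difference, each followed by a threshold filter. These semantics may differ in detail from the paper's definitions; the operator names, the trigram and year measures, and the data are hypothetical, and the atomic step compares all pairs where a real system would call a specialized algorithm. The assertions check that set-based evaluation returns exactly the pairs obtained by evaluating the whole specification on every pair.

```python
from dataclasses import dataclass
from itertools import product
from typing import Any, Callable, Dict, Optional, Tuple, Union

Record = Dict[str, Any]
Mapping = Dict[Tuple[str, str], float]  # (source id, target id) -> similarity


@dataclass(frozen=True)
class Atomic:
    measure: Callable[[Record, Record], float]  # similarity in [0, 1]
    theta: float


@dataclass(frozen=True)
class Complex:
    op: str            # hypothetical operator names: "AND", "OR", "MINUS"
    left: "Spec"
    right: "Spec"
    theta: float       # filter threshold on the combined similarity


Spec = Union[Atomic, Complex]


def run_atomic(spec: Atomic, source, target) -> Mapping:
    # Stand-in for a specialized algorithm (e.g. a HYPPO-like index for
    # numeric properties); this sketch simply compares every pair.
    result = {}
    for (s_id, s), (t_id, t) in product(source.items(), target.items()):
        sim = spec.measure(s, t)
        if sim >= spec.theta:
            result[(s_id, t_id)] = sim
    return result


def run(spec: Spec, source, target) -> Mapping:
    """Evaluate a specification bottom-up with set operations plus a filter."""
    if isinstance(spec, Atomic):
        return run_atomic(spec, source, target)
    left = run(spec.left, source, target)
    right = run(spec.right, source, target)
    if spec.op == "AND":      # intersection, keep the smaller similarity
        merged = {p: min(left[p], right[p]) for p in left.keys() & right.keys()}
    elif spec.op == "OR":     # union, keep the larger similarity
        merged = {p: max(left.get(p, 0.0), right.get(p, 0.0))
                  for p in left.keys() | right.keys()}
    elif spec.op == "MINUS":  # set difference
        merged = {p: v for p, v in left.items() if p not in right}
    else:
        raise ValueError(f"unknown operator {spec.op!r}")
    return {p: v for p, v in merged.items() if v >= spec.theta}  # filter


def score(spec: Spec, s: Record, t: Record) -> Optional[float]:
    """Reference semantics: evaluate the whole specification on one pair."""
    if isinstance(spec, Atomic):
        sim = spec.measure(s, t)
        return sim if sim >= spec.theta else None
    lv, rv = score(spec.left, s, t), score(spec.right, s, t)
    if spec.op == "AND":
        v = min(lv, rv) if lv is not None and rv is not None else None
    elif spec.op == "OR":
        present = [x for x in (lv, rv) if x is not None]
        v = max(present) if present else None
    elif spec.op == "MINUS":
        v = lv if rv is None else None
    else:
        raise ValueError(f"unknown operator {spec.op!r}")
    return v if v is not None and v >= spec.theta else None


def brute_force(spec: Spec, source, target) -> Mapping:
    pairs = product(source.items(), target.items())
    scored = {(s_id, t_id): score(spec, s, t) for (s_id, s), (t_id, t) in pairs}
    return {p: v for p, v in scored.items() if v is not None}


def trigram_sim(a: str, b: str) -> float:
    """Jaccard similarity of character trigrams (illustrative measure)."""
    ta = {a[i:i + 3] for i in range(len(a) - 2)}
    tb = {b[i:i + 3] for i in range(len(b) - 2)}
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


label = Atomic(lambda s, t: trigram_sim(s["label"], t["label"]), 0.8)
year = Atomic(lambda s, t: 1.0 / (1.0 + abs(s["year"] - t["year"])), 0.5)
source = {"s1": {"label": "berlin", "year": 1237},
          "s2": {"label": "leipzig", "year": 1165}}
target = {"t1": {"label": "berlin", "year": 1237},
          "t2": {"label": "leipzig", "year": 1170},
          "t3": {"label": "bern", "year": 1191}}

for op in ("AND", "OR", "MINUS"):
    spec = Complex(op, label, year, 0.5)
    assert run(spec, source, target) == brute_force(spec, source, target)
    print(op, sorted(run(spec, source, target)))
```

Because each atomic result is computed independently, any atomic step can be swapped for a faster datatype-specific algorithm; as long as that algorithm is itself complete, the set operations and filters cannot drop a qualifying pair.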
 
References
1. Auer S, Lehmann J, Ngonga Ngomo A-C (2011) Introduction to linked data and its lifecycle on the web. In: Reasoning web, pp 1–75
2. Bayardo RJ, Ma Y, Srikant R (2007) Scaling up all pairs similarity search. In: WWW, pp 131–140
3. Ben-David D, Domany T, Tarem A (2010) Enterprise data classification using semantic web technologies. In: ISWC
4. Bleiholder J, Naumann F (2008) Data fusion. ACM Comput Surv 41(1):1–41
5. Christen P (2012) A survey of indexing techniques for scalable record linkage and deduplication. IEEE Trans Knowl Data Eng 24(9):1537–1555
6. Cudré-Mauroux P, Haghani P, Jost M, Aberer K, de Meer H (2009) idMesh: graph-based disambiguation of linked data. In: WWW, pp 591–600
7. Elmagarmid AK, Ipeirotis PG, Verykios VS (2007) Duplicate record detection: a survey. IEEE Trans Knowl Data Eng 19:1–16
9. Glaser H, Millard IC, Sung W-K, Lee S, Kim P, You B-J (2009) Research on linked data and co-reference resolution. Technical Report, University of Southampton
10. Hogan A, Polleres A, Umbrich J, Zimmermann A (2010) Some entities are more equal than others: statistical methods to consolidate linked data. In: Workshop on new forms of reasoning for the semantic web: scalable and dynamic (NeFoRS2010)
11. Isele R, Jentzsch A, Bizer C (2011) Efficient multidimensional blocking for link discovery without losing recall. In: WebDB
12. Köpcke H, Thor A, Rahm E (2009) Comparative evaluation of entity resolution approaches with FEVER. Proc VLDB Endow 2(2):1574–1577
13. Lehmann J, Furche T, Grasso G, Ngonga Ngomo A-C, Schallhart C, Sellers A, Unger C, Bühmann L, Gerber D, Höffner K, Liu D, Auer S (2012) DEQA: deep web extraction for question answering. In: Proceedings of ISWC (to appear)
14. Lopez V, Uren V, Sabou MR, Motta E (2009) Cross ontology query answering on the semantic web: an initial evaluation. In: K-CAP ’09: proceedings of the fifth international conference on knowledge capture, New York, NY, USA. ACM, pp 17–24
15. Manlove D, Irving R, Iwama K, Miyazaki S, Morita Y (2002) Hard variants of stable marriage. Theor Comput Sci 276(1–2):261–279
16. Ngonga Ngomo A-C (2011) A time-efficient hybrid approach to link discovery. In: Sixth international workshop on ontology matching at ISWC
17. Ngonga Ngomo A-C, Auer S (2011) LIMES: a time-efficient approach for large-scale link discovery on the web of data. In: Proceedings of the international joint conference on artificial intelligence
18. Ngonga Ngomo A-C, Lehmann J, Auer S, Höffner K (2011) RAVEN: active learning of link specifications. In: Proceedings of the sixth international ontology matching workshop
19. Ngonga Ngomo A-C, Lyko K (2012) EAGLE: efficient active learning of link specifications using genetic programming. In: Proceedings of ESWC
20. Nikolov A, D’Aquin M, Motta E (2012) Unsupervised learning of data linking configuration. In: Proceedings of ESWC
21. Nikolov A, Uren VS, Motta E, De Roeck AN (2009) Overcoming schema heterogeneity between linked semantic repositories to improve coreference resolution. In: ASWC, pp 332–346
22. Papadakis G, Ioannou E, Niederée C, Palpanas T, Nejdl W (2011) Eliminating the redundancy in blocking-based entity resolution methods. In: JCDL
23. Raimond Y, Sutton C, Sandler M (2008) Automatic interlinking of music datasets on the semantic web. In: Proceedings of the 1st workshop about linked data on the web
24. Scharffe F, Liu Y, Zhou C (2009) RDF-AI: an architecture for RDF datasets matching, fusion and interlink. In: Proceedings of IJCAI 2009 workshop on identity, reference, and knowledge representation (IR-KR), Pasadena (CA US)
25. Sleeman J, Finin T (2010) Computing FOAF co-reference relations with rules and machine learning. In: Proceedings of the third international workshop on social data on the web
26. Urbani J, Kotoulas S, Maassen J, van Harmelen F, Bal H (2010) OWL reasoning with WebPIE: calculating the closure of 100 billion triples. In: Proceedings of ESWC 2010
27. Volz J, Bizer C, Gaedke M, Kobilarov G (2009) Discovering and maintaining links on the web of data. In: ISWC, pp 650–665
28. Wang J, Li G, Feng J (2010) Trie-Join: efficient trie-based string similarity joins with edit-distance constraints. PVLDB 3(1):1219–1230
29. Winkler W (2006) Overview of record linkage and current research directions. Technical Report, Bureau of the Census, Research Report Series
30. Xiao C, Wang W, Lin X (2008) Ed-Join: an efficient algorithm for similarity joins with edit distance constraints. Proc VLDB Endow 1(1):933–944
31. Xiao C, Wang W, Lin X, Yu JX (2008) Efficient similarity joins for near duplicate detection. In: WWW, pp 131–140
Metadata
Title: On Link Discovery using a Hybrid Approach
Author: Axel-Cyrille Ngonga Ngomo
Publication date: 1 December 2012
Publisher: Springer-Verlag
Published in: Journal on Data Semantics, Issue 4/2012
Print ISSN: 1861-2032
Electronic ISSN: 1861-2040
DOI: https://doi.org/10.1007/s13740-012-0012-y