Published in: Journal on Data Semantics, Issue 1/2019

20.02.2019 | Original Article

Dynamic Discovery of Type Classes and Relations in Semantic Web Data

Authors: Serkan Ayvaz, Mehmet Aydar


Abstract

With the rapid growth of Resource Description Framework (RDF) data on the Semantic Web, processing large semantic graph data has become more challenging. Constructing a summary graph structure from the raw RDF data can help reveal semantic type relations and reduce the computational complexity of graph processing. In this paper, we addressed the problem of graph summarization in RDF graphs and proposed an approach for automatically building summary graph structures from RDF graph data based on instance similarities. To scale our approach, we used a locality-sensitive hashing technique to identify instance pairs that are candidates for the same type class. Moreover, we introduced a measure that helps discover optimum class dissimilarity thresholds, along with an effective method for discovering the type classes automatically. In future work, we plan to investigate further options for improving the scalability of the proposed method.
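The candidate-pair step mentioned in the abstract can be illustrated with a generic MinHash plus banding scheme. This is only a minimal sketch and not the authors' implementation: it assumes each RDF instance is described by a non-empty set of feature strings (here, hypothetical predicate names), and the function names, the 32 hash functions, and the 16 bands are illustrative choices that do not come from the paper.

```python
import hashlib
from collections import defaultdict
from itertools import combinations


def _hash(seed: int, token: str) -> int:
    """Deterministic 64-bit hash of a token under a given seed."""
    data = f"{seed}:{token}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(features, num_hashes=32):
    """MinHash signature: the minimum hash of the feature set per seed.

    Assumes `features` is a non-empty collection of strings.
    """
    return tuple(min(_hash(seed, f) for f in features) for seed in range(num_hashes))


def candidate_pairs(instances, num_hashes=32, bands=16):
    """Return instance pairs whose signatures agree on at least one band.

    `instances` maps an instance name to its feature set (an assumption
    made for this sketch, e.g. the predicates attached to the instance).
    """
    rows = num_hashes // bands
    buckets = defaultdict(set)
    for name, features in instances.items():
        sig = minhash_signature(features, num_hashes)
        for b in range(bands):
            # Instances sharing the same band slice land in the same bucket.
            buckets[(b, sig[b * rows:(b + 1) * rows])].add(name)
    pairs = set()
    for members in buckets.values():
        for a, b in combinations(sorted(members), 2):
            pairs.add((a, b))
    return pairs


if __name__ == "__main__":
    # Hypothetical toy data: two person-like instances and one paper-like one.
    instances = {
        "alice": {"name", "email", "worksFor"},
        "bob": {"name", "email", "worksFor"},
        "paper1": {"title", "year", "publishedIn"},
    }
    print(candidate_pairs(instances))
```

Instances with identical feature sets always share every bucket, so they are always reported as candidates. Pairs with partial overlap are reported with a probability that grows with their Jaccard similarity, which is what lets an approach of this kind avoid comparing every pair of instances.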


Metadata
Title
Dynamic Discovery of Type Classes and Relations in Semantic Web Data
Authors
Serkan Ayvaz
Mehmet Aydar
Publication date
20.02.2019
Publisher
Springer Berlin Heidelberg
Published in
Journal on Data Semantics / Issue 1/2019
Print ISSN: 1861-2032
Electronic ISSN: 1861-2040
DOI
https://doi.org/10.1007/s13740-019-00102-6