Skip to main content
Erschienen in:
Buchtitelbild

2019 | OriginalPaper | Buchkapitel

1. Method of Big-Graph Partitioning Using a Skeleton Graph

verfasst von : Iztok Savnik, Kiyoshi Nitta

Erschienen in: Exploring the DataFlow Supercomputing Paradigm

Verlag: Springer International Publishing

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

We propose a new method of graph partitioning for big graphs that include a conceptual schema. The conceptual schema of a graph database, called a schema graph, is defined implicitly as part of the graph database itself. A graph database is stored in a distributed triple-store, i.e., a distributed database system for managing graph edges represented by triples. We define the statistics of a graph database on the basis of the schema graph. The statistics are gathered for all schema triples, i.e., the types of graph edges. The space of the schema triples is partially ordered by the is-more-general relationship that is defined through the class and predicate taxonomies. The graph partitioning method has two phases. A skeleton graph of the triple-store is computed in the first phase. The skeleton graph is composed of the set of schema triples that have the extensions of an appropriate size to serve as the fragments of the distribution. The edges of the skeleton graph are selected in a top-down manner, i.e., from the most general schema triple to more specific schema triples. The edges of the skeleton graph are clustered into n partitions in the second phase of the algorithm. The function distance that is used in the clustering algorithm is based on the statistics of the schema triples. The graph partitioning function maps each schema triple from the skeleton graph to its partition, stored on a separate data server. The partitioning function is well defined in that it maps the types of the triple-patterns to k fragments such that k corresponds to the size of the portions of the triple-store addressed by the triple-patterns. In other words, it maps the types of triple-patterns that address a large number of triples to multiple distributed fragments, and the types of triple-patterns that address few triples to a single fragment.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Fußnoten
1
The most general type depends on the design of the metadata included in a given triple-store (knowledge graph).
 
Literatur
1.
Zurück zum Zitat Armstrong J (2013) Programming erlang: software for a concurrent world. Pragmatic Bookshelf Armstrong J (2013) Programming erlang: software for a concurrent world. Pragmatic Bookshelf
2.
Zurück zum Zitat Baader F, Calvanese D, McGuinness D, Nardi D, Patel-Schneider P (2002) Description logic handbook. Cambridge University Press, CambridgeMATH Baader F, Calvanese D, McGuinness D, Nardi D, Patel-Schneider P (2002) Description logic handbook. Cambridge University Press, CambridgeMATH
3.
Zurück zum Zitat Babu S, Herodotou H (2012) Massively parallel databases and mapreduce systems. Found Trendsin Databases 5(1):1–104CrossRef Babu S, Herodotou H (2012) Massively parallel databases and mapreduce systems. Found Trendsin Databases 5(1):1–104CrossRef
4.
Zurück zum Zitat Blagojevic V et al (2016) A systematic approach to generation of new ideas for phd research in computing. Adv Comput 104:1–19 Blagojevic V et al (2016) A systematic approach to generation of new ideas for phd research in computing. Adv Comput 104:1–19
5.
Zurück zum Zitat Chang F, Dean J, Ghemawat S, Hsieh WC, Wallach DA, Burrows M, Chandra T, Fikes A, Gruber RE (2008) Bigtable: a distributed storage system for structured data. ACM Trans Comput Syst 26(4)CrossRef Chang F, Dean J, Ghemawat S, Hsieh WC, Wallach DA, Burrows M, Chandra T, Fikes A, Gruber RE (2008) Bigtable: a distributed storage system for structured data. ACM Trans Comput Syst 26(4)CrossRef
6.
Zurück zum Zitat Codd EF (1970) A relational model of data for large shared data banks. Commun ACM 13(6):377–387CrossRef Codd EF (1970) A relational model of data for large shared data banks. Commun ACM 13(6):377–387CrossRef
7.
Zurück zum Zitat DeWitt D, Gray J (1992) Parallel database systems: the future of high performance database processing. Commun ACM 36(6) DeWitt D, Gray J (1992) Parallel database systems: the future of high performance database processing. Commun ACM 36(6)
8.
Zurück zum Zitat DeCandia G, Hastorun D, Jampani M, Kakulapati G, Lakshman A, Pilchin A, Sivasubramanian S, Vosshall P, Vogels W (2007) Dynamo: amazon’s highly available key-value store. In: Proceedings of the twenty-first ACM SIGOPS symposium on operating systems principles, SOSP ’07. ACM, New York, pp 205–220 DeCandia G, Hastorun D, Jampani M, Kakulapati G, Lakshman A, Pilchin A, Sivasubramanian S, Vosshall P, Vogels W (2007) Dynamo: amazon’s highly available key-value store. In: Proceedings of the twenty-first ACM SIGOPS symposium on operating systems principles, SOSP ’07. ACM, New York, pp 205–220
9.
Zurück zum Zitat Dong XL, Gabrilovich E, Heitz G, Horn W, Lao N, Murphy K, Strohmann T, Sun S, Zhang W (2014) Knowledge vault: a web-scale approach to probabilistic knowledge fusion. In: KDD-2014, KDD. ACM, New York Dong XL, Gabrilovich E, Heitz G, Horn W, Lao N, Murphy K, Strohmann T, Sun S, Zhang W (2014) Knowledge vault: a web-scale approach to probabilistic knowledge fusion. In: KDD-2014, KDD. ACM, New York
11.
Zurück zum Zitat Graefe G (1993) Query evaluation techniques for large databases. ACM Comput Surv 25(2):73–169CrossRef Graefe G (1993) Query evaluation techniques for large databases. ACM Comput Surv 25(2):73–169CrossRef
12.
Zurück zum Zitat Graefe G (2000) Dynamic query evaluation plans: some course corrections? IEEE Data Eng Bull 23(2):3–6 Graefe G (2000) Dynamic query evaluation plans: some course corrections? IEEE Data Eng Bull 23(2):3–6
13.
Zurück zum Zitat Harris S, Lamb N, Shadbolt N (2009) 4store: the design and implementation of a clustered rdf store. In: Proceedings of the 5th international workshop on scalable semantic web knowledge base systems Harris S, Lamb N, Shadbolt N (2009) 4store: the design and implementation of a clustered rdf store. In: Proceedings of the 5th international workshop on scalable semantic web knowledge base systems
14.
Zurück zum Zitat Harth A, Umbrich J, Hogan A, Decker S (2007) Yars2: a federated repository for querying graph structured data from the web. In: Aberer K, Choi K-S, Noy N, Allemang D, Lee K-I, Nixon L, Golbeck J, Mika P, Maynard D, Mizoguchi R, Schreiber G, Cudré-Mauroux P (eds) The semantic web. Springer, Berlin, pp 211–224CrossRef Harth A, Umbrich J, Hogan A, Decker S (2007) Yars2: a federated repository for querying graph structured data from the web. In: Aberer K, Choi K-S, Noy N, Allemang D, Lee K-I, Nixon L, Golbeck J, Mika P, Maynard D, Mizoguchi R, Schreiber G, Cudré-Mauroux P (eds) The semantic web. Springer, Berlin, pp 211–224CrossRef
15.
Zurück zum Zitat Hendrickson B, Leland R (1992) An improved spectral graph partitioning algorithm for mapping parallel computations. Technical report SAND92-1460, Sandia National Laboratories, Albuquerque Hendrickson B, Leland R (1992) An improved spectral graph partitioning algorithm for mapping parallel computations. Technical report SAND92-1460, Sandia National Laboratories, Albuquerque
16.
Zurück zum Zitat Hoffart J, Suchanek FM, Berberich K, Weikum G (2013) Yago2: a spatially and temporally enhanced knowledge base from wikipedia. Artif Intell 194:28–61. Artificial intelligence, wikipedia and semi-structured resourcesMathSciNetCrossRef Hoffart J, Suchanek FM, Berberich K, Weikum G (2013) Yago2: a spatially and temporally enhanced knowledge base from wikipedia. Artif Intell 194:28–61. Artificial intelligence, wikipedia and semi-structured resourcesMathSciNetCrossRef
17.
Zurück zum Zitat Karypis G, Kumar V (1998) A fast and high quality multilevel scheme for partitioning irregular graphs. SIAM J Sci Comput 20(1):359–392MathSciNetCrossRef Karypis G, Kumar V (1998) A fast and high quality multilevel scheme for partitioning irregular graphs. SIAM J Sci Comput 20(1):359–392MathSciNetCrossRef
18.
Zurück zum Zitat Karger D, Lehman E, Leighton T, Panigrahy R, Levine M, Lewin D (1997) Consistent hashing and random trees: distributed caching protocols for relieving hot spots on the world wide web. In: Proceedings of the twenty-ninth annual ACM symposium on Theory of computing, STOC ’97. ACM, New York, pp 654–663 Karger D, Lehman E, Leighton T, Panigrahy R, Levine M, Lewin D (1997) Consistent hashing and random trees: distributed caching protocols for relieving hot spots on the world wide web. In: Proceedings of the twenty-ninth annual ACM symposium on Theory of computing, STOC ’97. ACM, New York, pp 654–663
19.
Zurück zum Zitat Lee K, Liu L (2013) Scaling queries over big rdf graphs with semantic hash partitioning. Proc VLDB Endow 6(14):1894–1905CrossRef Lee K, Liu L (2013) Scaling queries over big rdf graphs with semantic hash partitioning. Proc VLDB Endow 6(14):1894–1905CrossRef
20.
Zurück zum Zitat Neumann T, Weikum G (2008) Rdf-3x: a risc-style engine for rdf. Proc VLDB Endow 1(1):647–659CrossRef Neumann T, Weikum G (2008) Rdf-3x: a risc-style engine for rdf. Proc VLDB Endow 1(1):647–659CrossRef
21.
Zurück zum Zitat Nitta K, Savnik I (2014) A distributed query execution method for rdf storage managers. In: Liebig T, Fokoue A (eds) International workshop on scalable semantic web knowledge base systems (SSWS2014), CEUR workshop proceedings, vol 1261, Aachen, pp 45–60 Nitta K, Savnik I (2014) A distributed query execution method for rdf storage managers. In: Liebig T, Fokoue A (eds) International workshop on scalable semantic web knowledge base systems (SSWS2014), CEUR workshop proceedings, vol 1261, Aachen, pp 45–60
22.
Zurück zum Zitat OpenLink software documentation team (2009). OpenLink virtuoso universal server: documentation OpenLink software documentation team (2009). OpenLink virtuoso universal server: documentation
23.
Zurück zum Zitat Owens A, Seaborne A, Gibbins N, mc schraefel (2009) Clustered tdb: a clustered triple store for jena. In: WWW2009 Owens A, Seaborne A, Gibbins N, mc schraefel (2009) Clustered tdb: a clustered triple store for jena. In: WWW2009
24.
Zurück zum Zitat Owl 2 web ontology language (2012). http://www.w3.org/TR/owl2-overview/ Owl 2 web ontology language (2012). http://​www.​w3.​org/​TR/​owl2-overview/​
25.
Zurück zum Zitat Özsu MT, Valduriez P (1999) Principles of distributed database systems, 2nd edn. Prentice-Hall Inc., New York Özsu MT, Valduriez P (1999) Principles of distributed database systems, 2nd edn. Prentice-Hall Inc., New York
26.
Zurück zum Zitat Pothen A, Simon HD, Wang L, Bernard S (1992) Towards a fast implementation of spectral nested dissection. In: Supercomputing ’92 proceedings. IEEE Computer Society Press, pp 42–51 Pothen A, Simon HD, Wang L, Bernard S (1992) Towards a fast implementation of spectral nested dissection. In: Supercomputing ’92 proceedings. IEEE Computer Society Press, pp 42–51
29.
Zurück zum Zitat Savnik I, Nitta K (2016) Algebra of rdf graphs for querying large-scale distributed triple-store. In: Proceedings of the CD-ARES. Lecture notes in computer science, vol 9817. Springer, Berlin Savnik I, Nitta K (2016) Algebra of rdf graphs for querying large-scale distributed triple-store. In: Proceedings of the CD-ARES. Lecture notes in computer science, vol 9817. Springer, Berlin
30.
Zurück zum Zitat Savnik I, Nitta K (2017) Type inference of the graph patterns. University of Primorska, Technical report (In preparation), FAMNIT Savnik I, Nitta K (2017) Type inference of the graph patterns. University of Primorska, Technical report (In preparation), FAMNIT
31.
Zurück zum Zitat Savnik I, Nitta K, Krnc M, Skrekovski R (2017) Triple-store statistics based on conceptual schemata. Technical report submitted, FAMNIT, University of Primorska Savnik I, Nitta K, Krnc M, Skrekovski R (2017) Triple-store statistics based on conceptual schemata. Technical report submitted, FAMNIT, University of Primorska
32.
Zurück zum Zitat Stonebraker M (1986) The case for shared nothing. Database Eng Bull 9(1):4–9 Stonebraker M (1986) The case for shared nothing. Database Eng Bull 9(1):4–9
33.
Zurück zum Zitat White T (2011) Hadoop: the definitive guide, 2nd edn. O’Reilly Media Inc., CA, USA White T (2011) Hadoop: the definitive guide, 2nd edn. O’Reilly Media Inc., CA, USA
34.
Zurück zum Zitat Xu R, Wunsch D (2005) II. Survey of clustering algorithms. Trans Neur Netw 16(3):645–678CrossRef Xu R, Wunsch D (2005) II. Survey of clustering algorithms. Trans Neur Netw 16(3):645–678CrossRef
35.
36.
Zurück zum Zitat Xu Q, Wang X, Wang J, Yang Y, Feng Z (2017) Semantic-aware partitioning on rdf graphs. In: Chen L, Jensen CS, Shahabi C, Yang X, Lian X (eds) Web and big data. Springer International Publishing, Cham, pp 149–157CrossRef Xu Q, Wang X, Wang J, Yang Y, Feng Z (2017) Semantic-aware partitioning on rdf graphs. In: Chen L, Jensen CS, Shahabi C, Yang X, Lian X (eds) Web and big data. Springer International Publishing, Cham, pp 149–157CrossRef
37.
Zurück zum Zitat Zeng K, Yang J, Wang H, Shao B, Wang Z (2013) A distributed graph engine for web scale rdf data. Proc VLDB Endow 6(4):265–276CrossRef Zeng K, Yang J, Wang H, Shao B, Wang Z (2013) A distributed graph engine for web scale rdf data. Proc VLDB Endow 6(4):265–276CrossRef
Metadaten
Titel
Method of Big-Graph Partitioning Using a Skeleton Graph
verfasst von
Iztok Savnik
Kiyoshi Nitta
Copyright-Jahr
2019
DOI
https://doi.org/10.1007/978-3-030-13803-5_1

Premium Partner