nach oben

Erschienen in:

2019 | OriginalPaper | Buchkapitel

1. Method of Big-Graph Partitioning Using a Skeleton Graph

verfasst von : Iztok Savnik, Kiyoshi Nitta

Erschienen in: Exploring the DataFlow Supercomputing Paradigm

Verlag: Springer International Publishing

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config

KI-gestützte Suche

Aus

Abstract

We propose a new method of graph partitioning for big graphs that include a conceptual schema. The conceptual schema of a graph database, called a schema graph, is defined implicitly as part of the graph database itself. A graph database is stored in a distributed triple-store, i.e., a distributed database system for managing graph edges represented by triples. We define the statistics of a graph database on the basis of the schema graph. The statistics are gathered for all schema triples, i.e., the types of graph edges. The space of the schema triples is partially ordered by the is-more-general relationship that is defined through the class and predicate taxonomies. The graph partitioning method has two phases. A skeleton graph of the triple-store is computed in the first phase. The skeleton graph is composed of the set of schema triples that have the extensions of an appropriate size to serve as the fragments of the distribution. The edges of the skeleton graph are selected in a top-down manner, i.e., from the most general schema triple to more specific schema triples. The edges of the skeleton graph are clustered into n partitions in the second phase of the algorithm. The function distance that is used in the clustering algorithm is based on the statistics of the schema triples. The graph partitioning function maps each schema triple from the skeleton graph to its partition, stored on a separate data server. The partitioning function is well defined in that it maps the types of the triple-patterns to k fragments such that k corresponds to the size of the portions of the triple-store addressed by the triple-patterns. In other words, it maps the types of triple-patterns that address a large number of triples to multiple distributed fragments, and the types of triple-patterns that address few triples to a single fragment.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

über 102.000 Bücher
über 537 Zeitschriften

aus folgenden Fachgebieten:

Automobil + Motoren
Bauwesen + Immobilien
Business IT + Informatik
Elektrotechnik + Elektronik
Energie + Nachhaltigkeit
Finance + Banking
Management + Führung
Marketing + Vertrieb
Maschinenbau + Werkstoffe
Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Jetzt informieren

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

über 67.000 Bücher
über 390 Zeitschriften

aus folgenden Fachgebieten:

Automobil + Motoren
Bauwesen + Immobilien
Business IT + Informatik
Elektrotechnik + Elektronik
Energie + Nachhaltigkeit
Maschinenbau + Werkstoffe

Jetzt Wissensvorsprung sichern!

Jetzt informieren

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

über 67.000 Bücher
über 340 Zeitschriften

aus folgenden Fachgebieten:

Bauwesen + Immobilien
Business IT + Informatik
Finance + Banking
Management + Führung
Marketing + Vertrieb
Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Jetzt informieren

Nächstes Kapitel On Cloud-Supported Web-Based Integrated Development Environment for Programming DataFlow Architectures

The most general type depends on the design of the metadata included in a given triple-store (knowledge graph).

Armstrong J (2013) Programming erlang: software for a concurrent world. Pragmatic Bookshelf

Baader F, Calvanese D, McGuinness D, Nardi D, Patel-Schneider P (2002) Description logic handbook. Cambridge University Press, CambridgeMATH

Babu S, Herodotou H (2012) Massively parallel databases and mapreduce systems. Found Trendsin Databases 5(1):1–104CrossRef

Blagojevic V et al (2016) A systematic approach to generation of new ideas for phd research in computing. Adv Comput 104:1–19

Chang F, Dean J, Ghemawat S, Hsieh WC, Wallach DA, Burrows M, Chandra T, Fikes A, Gruber RE (2008) Bigtable: a distributed storage system for structured data. ACM Trans Comput Syst 26(4)CrossRef

Codd EF (1970) A relational model of data for large shared data banks. Commun ACM 13(6):377–387CrossRef

DeWitt D, Gray J (1992) Parallel database systems: the future of high performance database processing. Commun ACM 36(6)

DeCandia G, Hastorun D, Jampani M, Kakulapati G, Lakshman A, Pilchin A, Sivasubramanian S, Vosshall P, Vogels W (2007) Dynamo: amazon’s highly available key-value store. In: Proceedings of the twenty-first ACM SIGOPS symposium on operating systems principles, SOSP ’07. ACM, New York, pp 205–220

Dong XL, Gabrilovich E, Heitz G, Horn W, Lao N, Murphy K, Strohmann T, Sun S, Zhang W (2014) Knowledge vault: a web-scale approach to probabilistic knowledge fusion. In: KDD-2014, KDD. ACM, New York

10.

Garlik SH, Seaborne A, Prud’hommeaux E (2013) SPARQL 1.1 Query language. http://www.w3.org/TR/sparql11-query/

11.

Graefe G (1993) Query evaluation techniques for large databases. ACM Comput Surv 25(2):73–169CrossRef

12.

Graefe G (2000) Dynamic query evaluation plans: some course corrections? IEEE Data Eng Bull 23(2):3–6

13.

Harris S, Lamb N, Shadbolt N (2009) 4store: the design and implementation of a clustered rdf store. In: Proceedings of the 5th international workshop on scalable semantic web knowledge base systems

14.

Harth A, Umbrich J, Hogan A, Decker S (2007) Yars2: a federated repository for querying graph structured data from the web. In: Aberer K, Choi K-S, Noy N, Allemang D, Lee K-I, Nixon L, Golbeck J, Mika P, Maynard D, Mizoguchi R, Schreiber G, Cudré-Mauroux P (eds) The semantic web. Springer, Berlin, pp 211–224CrossRef

15.

Hendrickson B, Leland R (1992) An improved spectral graph partitioning algorithm for mapping parallel computations. Technical report SAND92-1460, Sandia National Laboratories, Albuquerque

16.

Hoffart J, Suchanek FM, Berberich K, Weikum G (2013) Yago2: a spatially and temporally enhanced knowledge base from wikipedia. Artif Intell 194:28–61. Artificial intelligence, wikipedia and semi-structured resourcesMathSciNetCrossRef

17.

Karypis G, Kumar V (1998) A fast and high quality multilevel scheme for partitioning irregular graphs. SIAM J Sci Comput 20(1):359–392MathSciNetCrossRef

18.

Karger D, Lehman E, Leighton T, Panigrahy R, Levine M, Lewin D (1997) Consistent hashing and random trees: distributed caching protocols for relieving hot spots on the world wide web. In: Proceedings of the twenty-ninth annual ACM symposium on Theory of computing, STOC ’97. ACM, New York, pp 654–663

19.

Lee K, Liu L (2013) Scaling queries over big rdf graphs with semantic hash partitioning. Proc VLDB Endow 6(14):1894–1905CrossRef

20.

Neumann T, Weikum G (2008) Rdf-3x: a risc-style engine for rdf. Proc VLDB Endow 1(1):647–659CrossRef

21.

Nitta K, Savnik I (2014) A distributed query execution method for rdf storage managers. In: Liebig T, Fokoue A (eds) International workshop on scalable semantic web knowledge base systems (SSWS2014), CEUR workshop proceedings, vol 1261, Aachen, pp 45–60

22.

OpenLink software documentation team (2009). OpenLink virtuoso universal server: documentation

23.

Owens A, Seaborne A, Gibbins N, mc schraefel (2009) Clustered tdb: a clustered triple store for jena. In: WWW2009

24.

Owl 2 web ontology language (2012). http://www.w3.org/TR/owl2-overview/

25.

Özsu MT, Valduriez P (1999) Principles of distributed database systems, 2nd edn. Prentice-Hall Inc., New York

26.

Pothen A, Simon HD, Wang L, Bernard S (1992) Towards a fast implementation of spectral nested dissection. In: Supercomputing ’92 proceedings. IEEE Computer Society Press, pp 42–51

27.

Resource description framework (rdf) (2004). http://www.w3.org/RDF/

28.

Rdf schema (2004). http://www.w3.org/TR/rdf-schema/

29.

Savnik I, Nitta K (2016) Algebra of rdf graphs for querying large-scale distributed triple-store. In: Proceedings of the CD-ARES. Lecture notes in computer science, vol 9817. Springer, Berlin

30.

Savnik I, Nitta K (2017) Type inference of the graph patterns. University of Primorska, Technical report (In preparation), FAMNIT

31.

Savnik I, Nitta K, Krnc M, Skrekovski R (2017) Triple-store statistics based on conceptual schemata. Technical report submitted, FAMNIT, University of Primorska

32.

Stonebraker M (1986) The case for shared nothing. Database Eng Bull 9(1):4–9

33.

White T (2011) Hadoop: the definitive guide, 2nd edn. O’Reilly Media Inc., CA, USA

34.

Xu R, Wunsch D (2005) II. Survey of clustering algorithms. Trans Neur Netw 16(3):645–678CrossRef

35.

Xu D, Yingjie T (2015) A comprehensive survey of clustering algorithms. Ann Data Sci 2(2):165–193MathSciNetCrossRef

36.

Xu Q, Wang X, Wang J, Yang Y, Feng Z (2017) Semantic-aware partitioning on rdf graphs. In: Chen L, Jensen CS, Shahabi C, Yang X, Lian X (eds) Web and big data. Springer International Publishing, Cham, pp 149–157CrossRef

37.

Zeng K, Yang J, Wang H, Shao B, Wang Z (2013) A distributed graph engine for web scale rdf data. Proc VLDB Endow 6(4):265–276CrossRef

Titel: Method of Big-Graph Partitioning Using a Skeleton Graph
verfasst von: Iztok Savnik
Kiyoshi Nitta
Verlag: Springer International Publishing
Buch: Exploring the DataFlow Supercomputing Paradigm
Print ISBN: 978-3-030-13802-8

Electronic ISBN: 978-3-030-13803-5

Copyright-Jahr: 2019
DOI: https://doi.org/10.1007/978-3-030-13803-5_1

Springer Professional

Abstract

Bitte loggen Sie sich ein, um Zugang zu Ihrer Lizenz zu erhalten.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Springer Professional "Technik"

Springer Professional "Wirtschaft"

Premium Partner