ABSTRACT
With the prevalence of graph data in a variety of domains, there is an increasing need for a language to query and manipulate graphs with heterogeneous attributes and structures. We propose a query language for graph databases that supports arbitrary attributes on nodes, edges, and graphs. In this language, graphs are the basic unit of information and each query manipulates one or more collections of graphs. To allow for flexible compositions of graph structures, we extend the notion of formal languages from strings to the graph domain. We present a graph algebra extended from the relational algebra in which the selection operator is generalized to graph pattern matching and a composition operator is introduced for rewriting matched graphs. Then, we investigate access methods of the selection operator. Pattern matching over large graphs is challenging due to the NP-completeness of subgraph isomorphism. We address this by a combination of techniques: use of neighborhood subgraphs and profiles, joint reduction of the search space, and optimization of the search order. Experimental results on real and synthetic large graphs demonstrate that our graph specific optimizations outperform an SQL-based implementation by orders of magnitude.
- S. Asthana et al. Predicting protein complex membership using probabilistic network reliability. Genome Research, May 2004.Google ScholarCross Ref
- S. Berretti, A. D. Bimbo, and E. Vicario. Efficient matching and indexing of graph models in content-based retrieval. In IEEE Trans. on Pattern Analysis and Machine Intelligence, volume 23, 2001. Google ScholarDigital Library
- S. Boag, D. Chamberlin, M. F. Fernández, D. Florescu, J. Robie, and J. Siméon. XQuery 1.0: An XML query language. W3C, http://www.w3.org/TR/xquery/, 2007.Google Scholar
- C. Branden and J. Tooze. Introduction to protein structure. Garland, 2 edition, 1998.Google Scholar
- S. Chaudhuri. An overview of query optimization in relational systems. In PODS, pages 34--43, 1998. Google ScholarDigital Library
- L. Chen, A. Gupta, and M. E. Kurul. Stack-based algorithms for pattern matching on dags. In Proc. of VLDB '05, pages 493--504, 2005. Google ScholarDigital Library
- J. Cheng, Y. Ke, W. Ng, and A. Lu. FG-Index: towards verification-free query processing on graph databases. In Proc. of SIGMOD '07, 2007. Google ScholarDigital Library
- J. Cheng, J. X. Yu, X. Lin, H. Wang, and P. S. Yu. Fast computation of reachability labeling for large graphs. In EDBT, pages 961--979, 2006. Google ScholarDigital Library
- E. Cohen, E. Halperin, H. Kaplan, and U. Zwick. Reachability and distance queries via 2-hop labels. SIAM J. Comput., 32(5):1338--1355, 2003. Google ScholarDigital Library
- M. P. Consens and A. O. Mendelzon. GraphLog: a visual formalism for real life recursion. In PODS, 1990. Google ScholarDigital Library
- P. Erdõs and A. Rényi. On random graphs I. Publ. Math. Debrecen, (6):290--297.Google Scholar
- Gene Ontology. http://www.geneontology.org/.Google Scholar
- R. H. Guting. GraphDB: Modeling and querying graphs in databases. In Proc. of VLDB'94, pages 297--308, 1994. Google ScholarDigital Library
- M. Gyssens, J. Paredaens, and D. van Gucht. A graph-oriented object database model. In Proc. of PODS '90, pages 417--424, 1990. Google ScholarDigital Library
- H. He and A. K. Singh. Closure-Tree: An Index Structure for Graph Queries. In Proc. of ICDE'06, Atlanta, 2006. Google ScholarDigital Library
- J. Hopcroft and R. Karp. An n 5/2 algorithm for maximum matchings in bipartite graphs. SIAM J. Computing, 1973.Google ScholarCross Ref
- J. E. Hopcroft and J. D. Ullman. Introduction to Automata Theory, Languages, and Computation. Addison Wesley, 1979. Google ScholarDigital Library
- H. V. Jagadish, L. V. S. Lakshmanan, D. Srivastava, and K. Thompson. TAX: A tree algebra for XML. In Proc. of DBPL'01, 2001. Google ScholarDigital Library
- H. Jiang, H. Wang, P. S. Yu, and S. Zhou. GString: A novel approach for efficient search in graph databases. In ICDE, 2007.Google ScholarCross Ref
- J. Lee, J. Oh, and S. Hwang. STRG-Index: Spatio-temporal region graph indexing for large video databases. In Proc. of SIGMOD, 2005. Google ScholarDigital Library
- U. Leser. A query language for biological networks. Bioinformatics, 21:ii33--ii39, 2005. Google ScholarDigital Library
- F. Manola and E. Miller. RDF Primer. W3C, http://www.w3.org/TR/rdf-primer/, 2004.Google Scholar
- E. Prud'hommeaux and A. Seaborne. SPARQL query language for RDF. W3C, http://www.w3.org/TR/rdf-sparql-query/, 2007.Google Scholar
- R. Ramakrishnan and J. Gehrke. Database Management Systems, chapter 24 Deductive Databases. McGraw-Hill, third edition, 2003. Google ScholarDigital Library
- J. Rekers and A. Schurr. A graph grammar approach to graphical parsing. In 11th International IEEE Symposium on Visual Languages, 1995. Google ScholarDigital Library
- G. Rozenberg (Ed.). Handbook on Graph Grammars and Computing by Graph Transformation: Foundations, volume 1. World Scientific, 1997. Google ScholarDigital Library
- R. Schenkel, A. Theobald, and G. Weikum. Efficient creation and incremental maintenance of the HOPI index for complex XML document collections. In Proc. of ICDE '05, pages 360--371, 2005. Google ScholarDigital Library
- N. Shadbolt, T. Berners-Lee, and W. Hall. The semantic web revisited. IEEE Intelligent Systems, 21(3):96--101, 2006. Google ScholarDigital Library
- D. Shasha, J. T. L. Wang, and R. Giugno. Algorithmics and applications of tree and graph searching. In Proc. of PODS, 2002. Google ScholarDigital Library
- L. Sheng, Z. M. Ozsoyoglu, and G. Ozsoyoglu. A graph query language and its query processing. In ICDE, 1999.Google ScholarDigital Library
- Y. Tian, R. C. McEachin, C. Santos, D. J. States, and J. M. Patel. SAGA: a subgraph matching tool for biological graphs. Bioinformatics, 23(2), 2007. Google ScholarDigital Library
- S. Tri$ßl and U. Leser. Fast and practical indexing and querying of very large graphs. In Proc. of SIGMOD '07, pages 845--856, 2007. Google ScholarDigital Library
- H. Wang, H. He, J. Yang, P. S. Yu, and J. X. Yu. Dual labeling: Answering graph reachability queries in constant time. In Proc. of ICDE '06, page 75, 2006. Google ScholarDigital Library
- D. W. Williams, J. Huan, and W. Wang. Graph database indexing using structured graph decomposition. In ICDE, 2007.Google ScholarCross Ref
- X. Yan, P. S. Yu, and J. Han. Graph Indexing: A frequent structure-based approach. In Proc. of SIGMOD, 2004. Google ScholarDigital Library
- S. Zhang, M. Hu, and J. Yang. TreePi: A novel graph indexing method. In ICDE, 2007.Google ScholarCross Ref
- P. Zhao, J. X. Yu, and P. S. Yu. Graph indexing: Tree delta >= graph. In Proc. of VLDB, pages 938--949, 2007. Google ScholarDigital Library
Index Terms
- Graphs-at-a-time: query language and access methods for graph databases
Recommendations
Generalized quasirandom graphs
We prove that if a sequence of graphs has (asymptotically) the same distribution of small subgraphs as a generalized random graph modeled on a fixed weighted graph H, then these graphs have a structure that is asymptotically the same as the structure of ...
Collapsible graphs and Hamiltonian connectedness of line graphs
Thomassen conjectured that every 4-connected line graph is Hamiltonian. Chen and Lai [Z.-H. Chen, H.-J. Lai, Reduction techniques for super-Eulerian graphs and related topics-an update, in: Ku Tung-Hsin (Ed.), Combinatorics and Graph Theory, vol. 95, ...
Equimatchable Graphs on Surfaces
A graph G is equimatchable if each matching in G is a subset of a maximum-size matching and it is factor critical if G-v has a perfect matching for each vertex v of G. It is known that any 2-connected equimatchable graph is either bipartite or factor ...
Comments