ABSTRACT
One of the most common operations on graph databases is graph pattern matching (e.g., graph isomorphism and more general types of "subgraph pattern matching"). In fact, in some graph query languages every single query is expressed as a graph matching operation. Consequently, there has been a significant amount of research effort in optimizing graph matching operations in graph database systems. As graph databases have scaled in recent years, so too has recent work on scaling graph matching operations. However, the performance of recent proposals for scaling graph pattern matching is limited by the presence of high-degree nodes. These high-degree nodes result in an explosion of intermediate result sizes during query execution, and therefore significant performance bottlenecks. In this paper we present a dedensification technique that losslessly compresses the neighborhood around high-degree nodes. Furthermore, we introduce a query processing technique that enables direct operation of graph query processing operations over the compressed data, without ever having to decompress the data. For pattern matching operations, we show how this technique can be implemented as a layer above existing graph database systems, so that the end-user can benefit from this technique without requiring modifications to the core graph database engine code. Our technique reduces the size of the intermediate result sets during query processing, and thereby improves query performance.
- A.-L. Barabási and R. Albert, "Emergence of scaling in random networks," Science, vol. 286, no. 5439, 1999.Google Scholar
- J. Leskovec, J. M. Kleinberg, and C. Faloutsos, "Graph evolution: Densification and shrinking diameters," TKDD, vol. 1, no. 1, 2007. Google ScholarDigital Library
- R. Meusel, S. Vigna, O. Lehmberg, and C. Bizer, "Graph structure in the web - revisited: a trick of the heavy tail," in WWW, 2014. Google ScholarDigital Library
- C. Weiss, P. Karras, and A. Bernstein, "Hexastore: sextuple indexing for semantic web data management," PVLDB, vol. 1, no. 1, 2008. Google ScholarDigital Library
- T. Neumann and G. Weikum, "The RDF-3X engine for scalable management of RDF data," VLDB J., vol. 19, no. 1, pp. 91--113, 2010. Google ScholarDigital Library
- W. Fan, J. Li, X. Wang, and Y. Wu, "Query preserving graph compression," in SIGMOD, 2012, pp. 157--168. Google ScholarDigital Library
- W. Fan, X. Wang, and Y. Wu, "Querying big graphs within bounded resources," in SIGMOD, 2014, pp. 301--312. Google ScholarDigital Library
- V. Satuluri, S. Parthasarathy, and Y. Ruan, "Local graph sparsification for scalable clustering," in SIGMOD, 2011. Google ScholarDigital Library
- D. A. Spielman and N. Srivastava, "Graph sparsification by effective resistances," SIAM J. Comput., vol. 40, no. 6, 2011. Google ScholarDigital Library
- W. Fan, F. Geerts, Y. Cao, T. Deng, and P. Lu, "Querying big data by accessing small data," in PODS, 2015. Google ScholarDigital Library
- G. Buehrer and K. Chellapilla, "A scalable pattern mining approach to web graph compression with communities," in WSDM, 2008. Google ScholarDigital Library
- D. J. Abadi, A. Marcus, S. Madden, and K. Hollenbach, "SW-Store: a vertically partitioned DBMS for semantic web data management," VLDB J., vol. 18, no. 2, 2009. Google ScholarDigital Library
- A. Gubichev and T. Neumann, "Exploiting the query structure for efficient join ordering in SPARQL queries," in EDBT, 2014.Google Scholar
- L. Zou, M. T. Özsu, L. Chen, X. Shen, R. Huang, and D. Zhao, "gStore: a graph-based SPARQL query engine," VLDB J., vol. 23, no. 4, 2014. Google ScholarDigital Library
- J. Leskovec and A. Krevl, "SNAP Datasets: Stanford large network dataset collection," http://snap.stanford.edu/data.Google Scholar
- P. Boldi and S. Vigna, "The webgraph framework I: compression techniques," in WWW, 2004, pp. 595--602. Google ScholarDigital Library
- F. Chierichetti, R. Kumar, S. Lattanzi, M. Mitzenmacher, A. Panconesi, and P. Raghavan, "On compressing social networks," in KDD, 2009. Google ScholarDigital Library
- Y. Lim, U. Kang, and C. Faloutsos, "SlashBurn: Graph compression and mining beyond caveman communities," TKDE, vol. 26, no. 12, pp. 3077--3089, 2014.Google ScholarCross Ref
- X. Yan, P. S. Yu, and J. Han, "Graph indexing: A frequent structure-based approach," in SIGMOD, 2004, pp. 335--346. Google ScholarDigital Library
- B. Shao, H. Wang, and Y. Li, "Trinity: a distributed graph engine on a memory cloud," in SIGMOD, 2013. Google ScholarDigital Library
- J. Cheng, J. X. Yu, B. Ding, P. S. Yu, and H. Wang, "Fast graph pattern matching," in ICDE, 2008, pp. 913--922. Google ScholarDigital Library
Index Terms
- Scalable Pattern Matching over Compressed Graphs via Dedensification
Recommendations
Incremental graph pattern matching
Graph pattern matching is commonly used in a variety of emerging applications such as social network analysis. These applications highlight the need for studying the following two issues. First, graph pattern matching is traditionally defined in terms ...
Minimal 2-matching-covered graphs
A perfect 2-matching M of a graph G is a spanning subgraph of G such that each component of M is either an edge or a cycle. A graph G is said to be 2-matching-covered if every edge of G lies in some perfect 2-matching of G. A 2-matching-covered graph is ...
Cypher-based Graph Pattern Matching in Gradoop
GRADES'17: Proceedings of the Fifth International Workshop on Graph Data-management Experiences & SystemsGraph pattern matching is an important and challenging operation on graph data. Typical use cases are related to graph analytics. Since analysts are often non-programmers, a graph system will only gain acceptance, if there is a comprehensible way to ...
Comments