Scalable Pattern Matching over Compressed Graphs via Dedensification

Authors:
Antonio Maccioni, Roma Tre University, Rome, Italy
Daniel J. Abadi, Yale University, New Haven, CT, USA

KDD '16: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, August 2016, Pages 1755–1764
https://doi.org/10.1145/2939672.2939856

Published: 13 August 2016

ABSTRACT

One of the most common operations on graph databases is graph pattern matching (e.g., graph isomorphism and more general types of "subgraph pattern matching"). In fact, in some graph query languages every single query is expressed as a graph matching operation. Consequently, there has been a significant amount of research effort in optimizing graph matching operations in graph database systems. As graph databases have scaled in recent years, so too has recent work on scaling graph matching operations. However, the performance of recent proposals for scaling graph pattern matching is limited by the presence of high-degree nodes. These high-degree nodes result in an explosion of intermediate result sizes during query execution, and therefore significant performance bottlenecks. In this paper we present a dedensification technique that losslessly compresses the neighborhood around high-degree nodes. Furthermore, we introduce a query processing technique that enables direct operation of graph query processing operations over the compressed data, without ever having to decompress the data. For pattern matching operations, we show how this technique can be implemented as a layer above existing graph database systems, so that the end-user can benefit from this technique without requiring modifications to the core graph database engine code. Our technique reduces the size of the intermediate result sets during query processing, and thereby improves query performance.
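
The abstract does not spell out how the neighborhood of a high-degree node is compressed, so the following Python sketch is only one plausible reading, not the authors' algorithm. It assumes a directed edge list, treats a node as high-degree when its in-degree reaches a caller-chosen `threshold`, and routes every group of sources that share exactly the same set of high-degree targets through one synthetic "compressor" node. The names `dedensify`, `expand`, `threshold`, and the `("compressor", i)` node labels are all hypothetical.

```python
from collections import defaultdict


def dedensify(edges, threshold):
    """Return (compressed_edges, compressors) for a directed edge list.

    Assumption: node ids are strings, so ("compressor", i) tuples cannot
    collide with real nodes.
    """
    edges = set(edges)
    in_degree = defaultdict(int)
    for _, dst in edges:
        in_degree[dst] += 1
    high = {n for n, d in in_degree.items() if d >= threshold}

    # For each source, collect the high-degree nodes it points to.
    targets = defaultdict(set)
    for src, dst in edges:
        if dst in high:
            targets[src].add(dst)

    # Group sources that point to exactly the same high-degree set.
    groups = defaultdict(set)
    for src, hs in targets.items():
        groups[frozenset(hs)].add(src)

    # Edges to ordinary nodes are kept unchanged.
    compressed = {(s, d) for s, d in edges if d not in high}
    compressors = {}
    ordered = sorted(groups.items(), key=lambda kv: sorted(kv[0]))
    for i, (hs, srcs) in enumerate(ordered):
        # |S| * |H| direct edges become |S| + |H| edges via a compressor.
        if len(srcs) * len(hs) > len(srcs) + len(hs):
            c = ("compressor", i)
            compressors[c] = hs
            compressed |= {(s, c) for s in srcs}
            compressed |= {(c, h) for h in hs}
        else:
            compressed |= {(s, h) for s in srcs for h in hs}
    return compressed, compressors


def expand(compressed, compressors):
    """Rebuild the original edge set, showing the step is lossless."""
    original = set()
    for s, d in compressed:
        if s in compressors:
            continue  # compressor -> high-degree edges are implied below
        if d in compressors:
            original |= {(s, h) for h in compressors[d]}
        else:
            original.add((s, d))
    return original
```

In this sketch, four sources that each point to the same three high-degree nodes need 7 edges instead of 12, and `expand` recovers the original edge set exactly. A query layer of the kind the abstract describes would match patterns against the compressed edges directly rather than calling `expand`; that part is not modeled here.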

Index Terms

1. Information systems
  1. Data management systems

Published in
KDD '16: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
August 2016
2176 pages
ISBN: 9781450342322
DOI: 10.1145/2939672
General Chairs: Balaji Krishnapuram (IBM), Mohak Shah (Bosch)
Program Chairs: Alex Smola (Amazon), Charu Aggarwal (IBM), Dou Shen (Baidu), Rajeev Rastogi (Amazon)
Copyright © 2016 ACM
Publisher: Association for Computing Machinery, New York, NY, United States
Author Tags
big graphs
graph databases
graph pattern matching
power-law
rdf
scale-free graphs
sparql
Qualifiers
- research-article
Acceptance Rates
KDD '16 Paper Acceptance Rate: 66 of 1,115 submissions, 6%
Overall Acceptance Rate: 1,133 of 8,635 submissions, 13%