Abstract
Graph pattern matching is typically defined in terms of subgraph isomorphism, which makes it an np-complete problem. Moreover, it requires bijective functions, which are often too restrictive to characterize patterns in emerging applications. We propose a class of graph patterns, in which an edge denotes the connectivity in a data graph within a predefined number of hops. In addition, we define matching based on a notion of bounded simulation, an extension of graph simulation. We show that with this revision, graph pattern matching can be performed in cubic-time, by providing such an algorithm. We also develop algorithms for incrementally finding matches when data graphs are updated, with performance guarantees for dag patterns. We experimentally verify that these algorithms scale well, and that the revised notion of graph pattern matching allows us to identify communities commonly found in real-world networks.
- S. Amer-Yahia, M. Benedikt, and P. Bohannon. Challenges in searching online communities. IEEE Data Eng. Bull., 30(2):23--31, 2007.Google Scholar
- J. Bang-Jensen and G. Z. Gutin. Digraphs: Theory, Algorithms and Applications. Springer, 2008. Google ScholarDigital Library
- T. Y. Berger-Wolf and J. Saia. A framework for analysis of dynamic social networks. In KDD, 2006. Google ScholarDigital Library
- N. Bruno, N. Koudas, and D. Srivastava. Holistic twig joins: optimal XML pattern matching. In SIGMOD, 2002. Google ScholarDigital Library
- E. P. Chan and H. Lim. Optimization and evaluation of shortest path queries. VLDB J., 16(3):343--369, 2007. Google ScholarDigital Library
- L. Chen, A. Gupta, and M. E. Kurul. Stack-based algorithms for pattern matching on dags. In VLDB, 2005. Google ScholarDigital Library
- J. Cheng, J. X. Yu, B. Ding, P. S. Yu, and H. Wang. Fast graph pattern matching. In ICDE, 2008. Google ScholarDigital Library
- J. Cheng, J. X. Yu, X. Lin, H. Wang, and P. S. Yu. Fast computing reachability labelings for large graphs with high compression rate. In EDBT, 2008. Google ScholarDigital Library
- J. Cho, N. Shivakumar, and H. Garcia-Molina. Finding replicated Web collections. SIGMOD Rec., 29(2), 2000. Google ScholarDigital Library
- E. Cohen, E. Halperin, H. Kaplan, and U. Zwick. Reachability and distance queries via 2-hop labels. SICOMP, 32(5), 2003. Google ScholarDigital Library
- W. Fan and P. Bohannon. Information preserving XML schema embedding. TODS, 33(1), 2008. Google ScholarDigital Library
- W. Fan, J. Li, S. Ma, H. Wang, and Y. Wu. Graph homomorphism revisited for graph matching. PVLDB, 3, 2010. Google ScholarDigital Library
- B. Gallagher. Matching structure and semantics: A survey on graph-based pattern matching. AAAI FS., 2006.Google Scholar
- H. He and A. K. Singh. Graphs-at-a-time: query language and access methods for graph databases. In SIGMOD, 2009. Google ScholarDigital Library
- M. R. Henzinger, T. Henzinger, and P. Kopke. Computing simulations on finite and infinite graphs. In FOCS, 1995. Google ScholarDigital Library
- R. Jin, Y. Xiang, N. Ruan, and D. Fuhry. 3-hop: a high-compression indexing scheme for reachability query. In SIGMOD, 2009. Google ScholarDigital Library
- T. Milo and D. Suciu. Index structures for path expressions. In ICDT, 1999. Google ScholarDigital Library
- L. D. Nardo, F. Ranzato, and F. Tapparo. The subgraph similarity problem. TKDE, 21(5):748--749, 2009. Google ScholarDigital Library
- M. Natarajan. Understanding the structure of a drug trafficking organization: a conversational analysis. Crime Prevention Studies, 11:273--298, 2000.Google Scholar
- G. Ramalingam and T. Reps. An incremental algorithm for a generalization of the shortest-path problem. J. Algorithms, 21(2):267--305, 1996. Google ScholarDigital Library
- G. Ramalingam and T. Reps. On the computational complexity of dynamic graph problems. TCS, 158(1--2), 1996. Google ScholarDigital Library
- G. Ramalingam and T. W. Reps. A categorized bibliography on incremental computation. In POPL, 1993. Google ScholarDigital Library
- R. Ronen and O. Shmueli. SoQL: A language for querying and creating data in social networks. In ICDE, 2009. Google ScholarDigital Library
- D. Saha. An incremental bisimulation algorithm. In FSTTCS, 2007. Google ScholarDigital Library
- D. Shasha, J. T. L. Wang, and R. Giugno. Algorithmics and applications of tree and graph searching. In PODS, 2002. Google ScholarDigital Library
- L. Terveen and D. W. McDonald. Social matching: A framework and research agenda. ACM Trans. Comput.-Hum. Interact., 12(3), 2005. Google ScholarDigital Library
- H. Tong, C. Faloutsos, B. Gallagher, and T. Eliassi-Rad. Fast best-effort pattern matching in large attributed graphs. In KDD, 2007. Google ScholarDigital Library
- J. R. Ullmann. An algorithm for subgraph isomorphism. J. ACM, 23(1), 1976. Google ScholarDigital Library
- H. Wang, H. He, J. Yang, P. S. Yu, and J. X. Yu. Dual labeling: Answering graph reachability queries in constant time. In ICDE, 2006. Google ScholarDigital Library
- K. Yi, H. He, I. Stanoi, and J. Yang. Incremental maintenance of XML structural indexes. In SIGMOD, 2004. Google ScholarDigital Library
- L. Zou, L. Chen, and M. T. Özsu. Distance-join: Pattern match query in a large graph database. In VLDB, 2009. Google ScholarDigital Library
Index Terms
- Graph pattern matching: from intractable to polynomial time
Recommendations
Incremental graph pattern matching
Graph pattern matching is commonly used in a variety of emerging applications such as social network analysis. These applications highlight the need for studying the following two issues. First, graph pattern matching is traditionally defined in terms ...
Induced Matching Extendable Graph Powers
A graph G is called induced matching extendable (shortly, IM-extendable) if every induced matching of G is included in a perfect matching of G. A graph G is called strongly IM-extendable if every spanning supergraph of G is IM-extendable. The k-th power ...
Graph Pattern Detection: Hardness for all Induced Patterns and Faster Noninduced Cycles
We consider the pattern detection problem in graphs: given a constant size pattern graph $H$ and a host graph $G$, determine whether $G$ contains a subgraph isomorphic to $H$. We present the following new improved upper and lower bounds: We prove that if a ...
Comments