Abstract
Graph data have become ubiquitous and manipulating them based on similarity is essential for many applications. Graph edit distance is one of the most widely accepted measures to determine similarities between graphs and has extensive applications in the fields of pattern recognition, computer vision etc. Unfortunately, the problem of graph edit distance computation is NP-Hard in general. Accordingly, in this paper we introduce three novel methods to compute the upper and lower bounds for the edit distance between two graphs in polynomial time. Applying these methods, two algorithms AppFull and AppSub are introduced to perform different kinds of graph search on graph databases. Comprehensive experimental studies are conducted on both real and synthetic datasets to examine various aspects of the methods for bounding graph edit distance. Result shows that these methods achieve good scalability in terms of both the number of graphs and the size of graphs. The effectiveness of these algorithms also confirms the usefulness of using our bounds in filtering and searching of graphs.
- H. A. Almohamad and S. O. Duffuaa. A linear programming approach for the weighted graph matching problem. IEEE Trans. PAMI, 15(5):522--525, 1993. Google ScholarDigital Library
- N. Augsten, M. Böhlen, and J. Gamper. Approximate matching of hierarchical data using pq-grams. In Proceedings of the 31st international conference on Very large data bases, pages 301--312, 2005. Google ScholarDigital Library
- N. Augsten, M. Böhlen, and J. Gamper. An incrementally maintainable index for approximate lookups in hierarchical data. In Proceedings of the 32nd international conference on Very large data bases, pages 247--258, 2006. Google ScholarDigital Library
- S. Berretti, A. D. Bimbo, and E. Vicario. Efficient matching and indexing of graph models in content-based retrieval. IEEE Trans. PAMI, 23(10):1089--1105, 2001. Google ScholarDigital Library
- C. Borgelt and M. R. Berthold. Mining molecular fragments: Finding relevant substructures of molecules. In ICDM '02. Google ScholarDigital Library
- J. M. Bower and H. Bolouri. Computational Modeling of Genetic and Biochemical Networks (Computational Molecular Biology). 2004. Google ScholarDigital Library
- A. Broder, R. Kumar, F. Maghoul, P. Raghavan, S. Rajagopalan, R. Stata, A. Tomkins, and J. Wiener. Graph structure in the web: experiments and models. In WWW '00. Google ScholarDigital Library
- H. Bunke and K. Shearer. A graph distance metric based on the maximal common subgraph. Pattern Recognition Letters, 19(3--4):255--259, 1998. Google ScholarDigital Library
- C. Chen, X. Yan, P. S. Yu, J. Han, D.-Q. Zhang, and X. Gu. Towards graph containment search and indexing. In VLDB '07. Google ScholarDigital Library
- M.-L. Fernández and G. Valiente. A graph distance metric combining maximum common subgraph and minimum common supergraph. Pattern Recognition Letters, 22(6--7):753--758, 2001. Google ScholarDigital Library
- M. R. Garey and D. S. Johnson. Computers and Intractability; A Guide to the Theory of NP-Completeness. 1990. Google ScholarDigital Library
- P. Hart, N. Nilsson, and B. Raphael. A formal basis for the heuristic determination of minimum cost paths. IEEE Trans. SSC, 4(2):100--107, 1968.Google Scholar
- H. He and A. Singh. Closure-tree: An index structure for graph queries. ICDE'06. Google ScholarDigital Library
- H. He, H. Wang, J. Yang, and P. S. Yu. Blinks: ranked keyword searches on graphs. In SIGMOD '07. Google ScholarDigital Library
- H. Hu, X. Y., Y. Hang, J. Han, and X. J. Zhou. Mining coherent dense subgraphs across massive biological network for functional discovery. Bioinformatics, 1(1):1--9, 2005. Google ScholarDigital Library
- D. Justice and A. Hero. A binary linear programming formulation of the graph edit distance. IEEE Trans. PAMI, 28(8):1200--1214, 2006. Google ScholarDigital Library
- V. Kacholia, S. Pandit, S. Chakrabarti, S. Sudarshan, R. Desai, and H. Karambelkar. Bidirectional expansion for keyword search on graph databases. In VLDB '05. Google ScholarDigital Library
- H. W. Kuhn. The hungarian method for the assignment problem. Naval Research Logistics, 2:83--97, 1955.Google ScholarCross Ref
- M. Kuramochi and G. Karypis. Frequent subgraph discovery. In ICDM '01. Google ScholarDigital Library
- B. T. Messmer and H. Bunke. A new algorithm for error-tolerant subgraph isomorphism detection. IEEE Trans. PAMI, 20(5):493--504, 1998. Google ScholarDigital Library
- R. Myers, R. C. Wilson, and E. R. Hancock. Bayesian Graph Edit Distance. IEEE Trans. PAMI, 22(6):628--635, 2000. Google ScholarDigital Library
- M. Neuhaus, K. Riesen, and H. Bunke. Fast suboptimal algorithms for the computation of graph edit distance. In SSSPR '06. Google ScholarDigital Library
- E. G. M. Petrakis and C. Faloutsos. Similarity searching in medical image databases. IEEE Transactions on Knowledge and Data Engineering, 9(3):435--447, 1997. Google ScholarDigital Library
- J. Raymond, E. Gardiner, and P. Willett. RASCAL: Calculation of Graph Similarity using Maximum Common Edge Subgraphs. The Computer Journal, 45(6):631--644, 2002.Google ScholarCross Ref
- K. Riesen, S. Fankhauser, and H. Bunke. Speeding up graph edit distance computation with a bipartite heuristic. In MLG '07.Google Scholar
- D. Shasha, J. T. L. Wang, and R. Giugno. Algorithmics and applications of tree and graph searching. In PODS '02. Google ScholarDigital Library
- S. Srinivasa and S. Kumar. A platform based on the multi-dimensional data modal for analysis of bio-molecular structures. In VLDB '03. Google ScholarDigital Library
- Y. Tian, R. C. McEachin, C. Santos, D. J. States, and J. M. Patel. Saga: a subgraph matching tool for biological graphs. Bioinformatics, 23(2):232--239, 2007. Google ScholarDigital Library
- N. Trinajstic, J. V. Knop, W. R. Muller, K. Syzmanski, and S. Nikolic. Computational Chemical Graph Theory: Characterization, Enumeration and Generation of Chemical Structures by Computer Methods. 1991. Google ScholarDigital Library
- S. Trissl and U. Leser. Fast and practical indexing and querying of very large graphs. In SIGMOD '07. Google ScholarDigital Library
- P. Willett, J. Barnard, and G. Downs. Chemical similarity searching. J. Chem. Inf. Comput. Sci, 38(6):983--996, 1998.Google ScholarCross Ref
- R. C. Wilson and E. R. Hancock. Structural matching by discrete relaxation. IEEE Trans. PAMI, 19(6):634--648, 1997. Google ScholarDigital Library
- X. Yan and J. Han. gspan: Graph-based substructure pattern mining. In ICDM'02. Google ScholarDigital Library
- X. Yan, P. S. Yu, and J. Han. Graph indexing: a frequent structure-based approach. In SIGMOD '04. Google ScholarDigital Library
- X. Yan, F. Zhu, P. S. Yu, and J. Han. Feature-based similarity search in graph structures. ACM TODS, 31(4), 2006. Google ScholarDigital Library
- R. Yang, P. Kalnis, and A. K. H. Tung. Similarity evaluation on tree-structured data. In SIGMOD '05. Google ScholarDigital Library
- P. Zhao, J. X. Yu, and P. S. Yu. Graph indexing: Tree + delta >= graph. In VLDB '07. Google ScholarDigital Library
Index Terms
- Comparing stars: on approximating graph edit distance
Recommendations
Stars and triangles
A graph is Sm-full if each vertex belongs to an induced star of order m. The stellarity of a vertex ν in a graph is the maximum size of an induced star centered at ν. Answering a question of Füredi, Mubayi, and West, we show that an Sm-full finite graph ...
Decomposition of complete graphs into paths and stars
Let P"k"+"1 denote a path of length k and let S"k"+"1 denote a star with k edges. As usual K"n denotes the complete graph on n vertices. In this paper we investigate the decomposition of K"n into paths and stars, and prove the following results. Theorem ...
Stars versus stripes Ramsey numbers
For given simple graphs G1,G2,,Gt, the Ramsey number R(G1,G2,,Gt) is the smallest positive integer n such that if the edges of the complete graph Kn are partitioned into t disjoint color classes giving t graphs H1,H2,,Ht, then at least one Hi has a ...
Comments