skip to main content
research-article

Comparing stars: on approximating graph edit distance

Published:01 August 2009Publication History
Skip Abstract Section

Abstract

Graph data have become ubiquitous and manipulating them based on similarity is essential for many applications. Graph edit distance is one of the most widely accepted measures to determine similarities between graphs and has extensive applications in the fields of pattern recognition, computer vision etc. Unfortunately, the problem of graph edit distance computation is NP-Hard in general. Accordingly, in this paper we introduce three novel methods to compute the upper and lower bounds for the edit distance between two graphs in polynomial time. Applying these methods, two algorithms AppFull and AppSub are introduced to perform different kinds of graph search on graph databases. Comprehensive experimental studies are conducted on both real and synthetic datasets to examine various aspects of the methods for bounding graph edit distance. Result shows that these methods achieve good scalability in terms of both the number of graphs and the size of graphs. The effectiveness of these algorithms also confirms the usefulness of using our bounds in filtering and searching of graphs.

References

  1. H. A. Almohamad and S. O. Duffuaa. A linear programming approach for the weighted graph matching problem. IEEE Trans. PAMI, 15(5):522--525, 1993. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. N. Augsten, M. Böhlen, and J. Gamper. Approximate matching of hierarchical data using pq-grams. In Proceedings of the 31st international conference on Very large data bases, pages 301--312, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. N. Augsten, M. Böhlen, and J. Gamper. An incrementally maintainable index for approximate lookups in hierarchical data. In Proceedings of the 32nd international conference on Very large data bases, pages 247--258, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. S. Berretti, A. D. Bimbo, and E. Vicario. Efficient matching and indexing of graph models in content-based retrieval. IEEE Trans. PAMI, 23(10):1089--1105, 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. C. Borgelt and M. R. Berthold. Mining molecular fragments: Finding relevant substructures of molecules. In ICDM '02. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. J. M. Bower and H. Bolouri. Computational Modeling of Genetic and Biochemical Networks (Computational Molecular Biology). 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. A. Broder, R. Kumar, F. Maghoul, P. Raghavan, S. Rajagopalan, R. Stata, A. Tomkins, and J. Wiener. Graph structure in the web: experiments and models. In WWW '00. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. H. Bunke and K. Shearer. A graph distance metric based on the maximal common subgraph. Pattern Recognition Letters, 19(3--4):255--259, 1998. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. C. Chen, X. Yan, P. S. Yu, J. Han, D.-Q. Zhang, and X. Gu. Towards graph containment search and indexing. In VLDB '07. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. M.-L. Fernández and G. Valiente. A graph distance metric combining maximum common subgraph and minimum common supergraph. Pattern Recognition Letters, 22(6--7):753--758, 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. M. R. Garey and D. S. Johnson. Computers and Intractability; A Guide to the Theory of NP-Completeness. 1990. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. P. Hart, N. Nilsson, and B. Raphael. A formal basis for the heuristic determination of minimum cost paths. IEEE Trans. SSC, 4(2):100--107, 1968.Google ScholarGoogle Scholar
  13. H. He and A. Singh. Closure-tree: An index structure for graph queries. ICDE'06. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. H. He, H. Wang, J. Yang, and P. S. Yu. Blinks: ranked keyword searches on graphs. In SIGMOD '07. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. H. Hu, X. Y., Y. Hang, J. Han, and X. J. Zhou. Mining coherent dense subgraphs across massive biological network for functional discovery. Bioinformatics, 1(1):1--9, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. D. Justice and A. Hero. A binary linear programming formulation of the graph edit distance. IEEE Trans. PAMI, 28(8):1200--1214, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. V. Kacholia, S. Pandit, S. Chakrabarti, S. Sudarshan, R. Desai, and H. Karambelkar. Bidirectional expansion for keyword search on graph databases. In VLDB '05. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. H. W. Kuhn. The hungarian method for the assignment problem. Naval Research Logistics, 2:83--97, 1955.Google ScholarGoogle ScholarCross RefCross Ref
  19. M. Kuramochi and G. Karypis. Frequent subgraph discovery. In ICDM '01. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. B. T. Messmer and H. Bunke. A new algorithm for error-tolerant subgraph isomorphism detection. IEEE Trans. PAMI, 20(5):493--504, 1998. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. R. Myers, R. C. Wilson, and E. R. Hancock. Bayesian Graph Edit Distance. IEEE Trans. PAMI, 22(6):628--635, 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. M. Neuhaus, K. Riesen, and H. Bunke. Fast suboptimal algorithms for the computation of graph edit distance. In SSSPR '06. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. E. G. M. Petrakis and C. Faloutsos. Similarity searching in medical image databases. IEEE Transactions on Knowledge and Data Engineering, 9(3):435--447, 1997. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. J. Raymond, E. Gardiner, and P. Willett. RASCAL: Calculation of Graph Similarity using Maximum Common Edge Subgraphs. The Computer Journal, 45(6):631--644, 2002.Google ScholarGoogle ScholarCross RefCross Ref
  25. K. Riesen, S. Fankhauser, and H. Bunke. Speeding up graph edit distance computation with a bipartite heuristic. In MLG '07.Google ScholarGoogle Scholar
  26. D. Shasha, J. T. L. Wang, and R. Giugno. Algorithmics and applications of tree and graph searching. In PODS '02. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. S. Srinivasa and S. Kumar. A platform based on the multi-dimensional data modal for analysis of bio-molecular structures. In VLDB '03. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. Y. Tian, R. C. McEachin, C. Santos, D. J. States, and J. M. Patel. Saga: a subgraph matching tool for biological graphs. Bioinformatics, 23(2):232--239, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. N. Trinajstic, J. V. Knop, W. R. Muller, K. Syzmanski, and S. Nikolic. Computational Chemical Graph Theory: Characterization, Enumeration and Generation of Chemical Structures by Computer Methods. 1991. Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. S. Trissl and U. Leser. Fast and practical indexing and querying of very large graphs. In SIGMOD '07. Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. P. Willett, J. Barnard, and G. Downs. Chemical similarity searching. J. Chem. Inf. Comput. Sci, 38(6):983--996, 1998.Google ScholarGoogle ScholarCross RefCross Ref
  32. R. C. Wilson and E. R. Hancock. Structural matching by discrete relaxation. IEEE Trans. PAMI, 19(6):634--648, 1997. Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. X. Yan and J. Han. gspan: Graph-based substructure pattern mining. In ICDM'02. Google ScholarGoogle ScholarDigital LibraryDigital Library
  34. X. Yan, P. S. Yu, and J. Han. Graph indexing: a frequent structure-based approach. In SIGMOD '04. Google ScholarGoogle ScholarDigital LibraryDigital Library
  35. X. Yan, F. Zhu, P. S. Yu, and J. Han. Feature-based similarity search in graph structures. ACM TODS, 31(4), 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  36. R. Yang, P. Kalnis, and A. K. H. Tung. Similarity evaluation on tree-structured data. In SIGMOD '05. Google ScholarGoogle ScholarDigital LibraryDigital Library
  37. P. Zhao, J. X. Yu, and P. S. Yu. Graph indexing: Tree + delta >= graph. In VLDB '07. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Comparing stars: on approximating graph edit distance

        Recommendations

        Comments

        Login options

        Check if you have access through your login credentials or your institution to get full access on this article.

        Sign in

        Full Access

        PDF Format

        View or Download as a PDF file.

        PDF

        eReader

        View online with eReader.

        eReader