skip to main content
10.1145/1376616.1376662acmconferencesArticle/Chapter ViewAbstractPublication PagesmodConference Proceedingsconference-collections
research-article

Mining significant graph patterns by leap search

Authors Info & Claims
Published:09 June 2008Publication History

ABSTRACT

With ever-increasing amounts of graph data from disparate sources, there has been a strong need for exploiting significant graph patterns with user-specified objective functions. Most objective functions are not antimonotonic, which could fail all of frequency-centric graph mining algorithms. In this paper, we give the first comprehensive study on general mining method aiming to find most significant patterns directly. Our new mining framework, called LEAP (Descending Leap Mine), is developed to exploit the correlation between structural similarity and significance similarity in a way that the most significant pattern could be identified quickly by searching dissimilar graph patterns. Two novel concepts, structural leap search and frequency descending mining, are proposed to support leap search in graph pattern space. Our new mining method revealed that the widely adopted branch-and-bound search in data mining literature is indeed not the best, thus sketching a new picture on scalable graph pattern discovery. Empirical results show that LEAP achieves orders of magnitude speedup in comparison with the state-of-the-art method. Furthermore, graph classifiers built on mined patterns outperform the up-to-date graph kernel method in terms of efficiency and accuracy, demonstrating the high promise of such patterns.

References

  1. R. Agrawal, T. Imielinski, and A. Swami. Mining association rules between sets of items in large databases. In Proc. of SIGMOD, pages 207--216, 1993.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. S. Bay and M. Pazzani. Detecting group differences: Mining contrast sets. Data Mining and Knowledge Discovery, 5:213--246, 2001.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. B. Bringmann and A. Zimmermann. Tree2 - decision trees for tree structured data. In Proc. of 2005 European Symp. Principle of Data Mining and Knowledge Discovery, pages 46--58, 2005.]]Google ScholarGoogle Scholar
  4. C. Chang and C. Lin. LIBSVM: a library for support vector machines, 2001. Software available at http://www.csie.ntu.edu.tw/~ cjlin/libsvm.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. H. Cheng, X. Yan, J. Han, and C. Hsu. Discriminative frequent pattern analysis for e®ective classification. In Proc. of ICDE, pages 716--725, 2007.]]Google ScholarGoogle Scholar
  6. J. Cheng, Y. Ke, W. Ng, and A. Lu. Fg-index: towards veri¯cation-free query processing on graph databases. In Proc. of SIGMOD, pages 857--868, 2007.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. M. Deshpande, M. Kuramochi, N. Wale, and G. Karypis. Frequent substructure-based approaches for classifying chemical compounds. IEEE Trans. on Knowledge and Data Engineering, 17:1036--1050, 2005.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. G. Dong and J. Li. Efficient mining of emerging patterns: Discovering trends and differences. In Proc. of SIGKDD, pages 15--18, 1999.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. H. Fröhlich, J. Wegner, F. Sieker, and A. Zell. Optimal assignment kernels for attributed molecular graphs. In Proc. of ICDM, pages 225--232, 2005.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. M. Hasan, V. Chaoji, S. Salem, J. Besson, and M. Zaki. ORIGAMI: Mining representative orthogonal graph patterns. In Proc. of ICDM, pages 153--162, 2007.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. H. He and A. Singh. Graphrank: Statistical modeling and mining of significant subgraphs in the feature space. In Proc. of ICDM, pages 885--890, 2006.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. A. Inokuchi, T. Washio, and H. Motoda. An apriori-based algorithm for mining frequent substructures from graph data. In Proc. of 2000 European Symp. Principle of Data Mining and Knowledge Discovery, pages 13--23, 2000.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. M. Kamber and R. Shinghal. Evaluating the interestingness of characteristic rules. In Proc. of SIGKDD, pages 263--266, 1996.]]Google ScholarGoogle Scholar
  14. B. Kelley, R. Sharan, R. Karp, E. Sittler, D. Root, B. Stockwell, and T. Ideker. Conserved pathways within bacteria and yeast as revealed by global protein network alignment. Proc Natl Acad Sci U S A, 100:11394--9, 2003.]]Google ScholarGoogle ScholarCross RefCross Ref
  15. S. Kramer, L. Raedt, and C. Helma. Molecular feature mining in HIV data. In Proc. of SIGKDD, pages 136--143, 2001.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. M. Kuramochi and G. Karypis. Frequent subgraph discovery. In Proc. of ICDM, pages 313--320, 2001.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. S. Morishita and J. Sese. Traversing itemset lattices with statistical metric pruning. In Proc. of SIGMOD, pages 226 -- 236, 2000.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. F. Pennerath and A. Napoli. Mining frequent most informative subgraphs. In the 5th Int. Workshop on Mining and Learning with Graphs, 2007.]]Google ScholarGoogle Scholar
  19. G. Piatetsky-Shapiro. Discovery, analysis and presentation of strong rules. Knowledge Discovery in Databases, MIT press, pages 229--248, 1991.]]Google ScholarGoogle Scholar
  20. T. Scheffer and S. Wrobel. Finding the most interesting patterns in a database quickly by using sequential sampling. J. of Machine Learning Research, 3:833--862, 2002.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. R. Sokal and F. Rohlf. Biometry: the principles and practice of statistics in biological research. W. H. Freeman, New York, 1994.]]Google ScholarGoogle Scholar
  22. P. Tan, V. Kumar, and J. Srivastava. Selecting the right interestingness measure for association patterns. In Proc. of SIGKDD, pages 32 -- 41, 2002.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. N. Wale and G. Karypis. Comparison of descriptor spaces for chemical compound retrieval and classification. In Proc. of ICDM, pages 678--689, 2006.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. G. I. Webb. Opus: An efficient admissible algorithm for unordered search. J. of Artificial Intelligence Research, 3:431--465, 1995.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. G. I. Webb. Discovering significant patterns. Machine Learning, 68:1 -- 33, 2007.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. X. Yan and J. Han. gSpan: Graph-based substructure pattern mining. In Proc. of ICDM, 2002.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. X. Yan and J. Han. CloseGraph: Mining closed frequent graph patterns. In Proc. of SIGKDD, pages 286--295, 2003.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. X. Yan, P. S. Yu, and J. Han. Graph indexing: A frequent structure-based approach. In Proc. of SIGMOD, pages 335--346, 2004.]] Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Mining significant graph patterns by leap search

    Recommendations

    Comments

    Login options

    Check if you have access through your login credentials or your institution to get full access on this article.

    Sign in
    • Published in

      cover image ACM Conferences
      SIGMOD '08: Proceedings of the 2008 ACM SIGMOD international conference on Management of data
      June 2008
      1396 pages
      ISBN:9781605581026
      DOI:10.1145/1376616

      Copyright © 2008 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 9 June 2008

      Permissions

      Request permissions about this article.

      Request Permissions

      Check for updates

      Qualifiers

      • research-article

      Acceptance Rates

      Overall Acceptance Rate785of4,003submissions,20%

    PDF Format

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader