research-article

Mining significant graph patterns by leap search

Authors:
Xifeng Yan

IBM T. J. Watson Research Center, Westchester, NY, USA

IBM T. J. Watson Research Center, Westchester, NY, USA
View Profile

,
Hong Cheng

University of Illinois at Urbana-Champaign, Urbana, IL, USA

University of Illinois at Urbana-Champaign, Urbana, IL, USA
View Profile

,
Jiawei Han

University of Illinois at Urbana-Champaign, Urbana, IL, USA

University of Illinois at Urbana-Champaign, Urbana, IL, USA
View Profile

,
Philip S. Yu

University of Illinois at Chicago, Chicago, IL, USA

University of Illinois at Chicago, Chicago, IL, USA
View Profile

SIGMOD '08: Proceedings of the 2008 ACM SIGMOD international conference on Management of dataJune 2008Pages 433–444https://doi.org/10.1145/1376616.1376662

Published:09 June 2008Publication History

SIGMOD '08: Proceedings of the 2008 ACM SIGMOD international conference on Management of data

Pages 433–444

ABSTRACT

With ever-increasing amounts of graph data from disparate sources, there has been a strong need for exploiting significant graph patterns with user-specified objective functions. Most objective functions are not antimonotonic, which could fail all of frequency-centric graph mining algorithms. In this paper, we give the first comprehensive study on general mining method aiming to find most significant patterns directly. Our new mining framework, called LEAP (Descending Leap Mine), is developed to exploit the correlation between structural similarity and significance similarity in a way that the most significant pattern could be identified quickly by searching dissimilar graph patterns. Two novel concepts, structural leap search and frequency descending mining, are proposed to support leap search in graph pattern space. Our new mining method revealed that the widely adopted branch-and-bound search in data mining literature is indeed not the best, thus sketching a new picture on scalable graph pattern discovery. Empirical results show that LEAP achieves orders of magnitude speedup in comparison with the state-of-the-art method. Furthermore, graph classifiers built on mined patterns outperform the up-to-date graph kernel method in terms of efficiency and accuracy, demonstrating the high promise of such patterns.

References

R. Agrawal, T. Imielinski, and A. Swami. Mining association rules between sets of items in large databases. In Proc. of SIGMOD, pages 207--216, 1993.]] Google ScholarDigital Library
S. Bay and M. Pazzani. Detecting group differences: Mining contrast sets. Data Mining and Knowledge Discovery, 5:213--246, 2001.]] Google ScholarDigital Library
B. Bringmann and A. Zimmermann. Tree2 - decision trees for tree structured data. In Proc. of 2005 European Symp. Principle of Data Mining and Knowledge Discovery, pages 46--58, 2005.]]Google Scholar
C. Chang and C. Lin. LIBSVM: a library for support vector machines, 2001. Software available at http://www.csie.ntu.edu.tw/~ cjlin/libsvm.]] Google ScholarDigital Library
H. Cheng, X. Yan, J. Han, and C. Hsu. Discriminative frequent pattern analysis for e®ective classification. In Proc. of ICDE, pages 716--725, 2007.]]Google Scholar
J. Cheng, Y. Ke, W. Ng, and A. Lu. Fg-index: towards veri¯cation-free query processing on graph databases. In Proc. of SIGMOD, pages 857--868, 2007.]] Google ScholarDigital Library
M. Deshpande, M. Kuramochi, N. Wale, and G. Karypis. Frequent substructure-based approaches for classifying chemical compounds. IEEE Trans. on Knowledge and Data Engineering, 17:1036--1050, 2005.]] Google ScholarDigital Library
G. Dong and J. Li. Efficient mining of emerging patterns: Discovering trends and differences. In Proc. of SIGKDD, pages 15--18, 1999.]] Google ScholarDigital Library
H. Fröhlich, J. Wegner, F. Sieker, and A. Zell. Optimal assignment kernels for attributed molecular graphs. In Proc. of ICDM, pages 225--232, 2005.]] Google ScholarDigital Library
M. Hasan, V. Chaoji, S. Salem, J. Besson, and M. Zaki. ORIGAMI: Mining representative orthogonal graph patterns. In Proc. of ICDM, pages 153--162, 2007.]] Google ScholarDigital Library
H. He and A. Singh. Graphrank: Statistical modeling and mining of significant subgraphs in the feature space. In Proc. of ICDM, pages 885--890, 2006.]] Google ScholarDigital Library
A. Inokuchi, T. Washio, and H. Motoda. An apriori-based algorithm for mining frequent substructures from graph data. In Proc. of 2000 European Symp. Principle of Data Mining and Knowledge Discovery, pages 13--23, 2000.]] Google ScholarDigital Library
M. Kamber and R. Shinghal. Evaluating the interestingness of characteristic rules. In Proc. of SIGKDD, pages 263--266, 1996.]]Google Scholar
B. Kelley, R. Sharan, R. Karp, E. Sittler, D. Root, B. Stockwell, and T. Ideker. Conserved pathways within bacteria and yeast as revealed by global protein network alignment. Proc Natl Acad Sci U S A, 100:11394--9, 2003.]]Google ScholarCross Ref
S. Kramer, L. Raedt, and C. Helma. Molecular feature mining in HIV data. In Proc. of SIGKDD, pages 136--143, 2001.]] Google ScholarDigital Library
M. Kuramochi and G. Karypis. Frequent subgraph discovery. In Proc. of ICDM, pages 313--320, 2001.]] Google ScholarDigital Library
S. Morishita and J. Sese. Traversing itemset lattices with statistical metric pruning. In Proc. of SIGMOD, pages 226 -- 236, 2000.]] Google ScholarDigital Library
F. Pennerath and A. Napoli. Mining frequent most informative subgraphs. In the 5th Int. Workshop on Mining and Learning with Graphs, 2007.]]Google Scholar
G. Piatetsky-Shapiro. Discovery, analysis and presentation of strong rules. Knowledge Discovery in Databases, MIT press, pages 229--248, 1991.]]Google Scholar
T. Scheffer and S. Wrobel. Finding the most interesting patterns in a database quickly by using sequential sampling. J. of Machine Learning Research, 3:833--862, 2002.]] Google ScholarDigital Library
R. Sokal and F. Rohlf. Biometry: the principles and practice of statistics in biological research. W. H. Freeman, New York, 1994.]]Google Scholar
P. Tan, V. Kumar, and J. Srivastava. Selecting the right interestingness measure for association patterns. In Proc. of SIGKDD, pages 32 -- 41, 2002.]] Google ScholarDigital Library
N. Wale and G. Karypis. Comparison of descriptor spaces for chemical compound retrieval and classification. In Proc. of ICDM, pages 678--689, 2006.]] Google ScholarDigital Library
G. I. Webb. Opus: An efficient admissible algorithm for unordered search. J. of Artificial Intelligence Research, 3:431--465, 1995.]] Google ScholarDigital Library
G. I. Webb. Discovering significant patterns. Machine Learning, 68:1 -- 33, 2007.]] Google ScholarDigital Library
X. Yan and J. Han. gSpan: Graph-based substructure pattern mining. In Proc. of ICDM, 2002.]] Google ScholarDigital Library
X. Yan and J. Han. CloseGraph: Mining closed frequent graph patterns. In Proc. of SIGKDD, pages 286--295, 2003.]] Google ScholarDigital Library
X. Yan, P. S. Yu, and J. Han. Graph indexing: A frequent structure-based approach. In Proc. of SIGMOD, pages 335--346, 2004.]] Google ScholarDigital Library

Index Terms

Mining significant graph patterns by leap search
1. Information systems
  1. Information systems applications
    1. Data mining

Recommendations

Towards proximity pattern mining in large graphs
SIGMOD '10: Proceedings of the 2010 ACM SIGMOD International Conference on Management of data

Mining graph patterns in large networks is critical to a variety of applications such as malware detection and biological module discovery. However, frequent subgraphs are often ineffective to capture association existing in these applications, due to ...
Read More
Mining Frequent Subgraph Patterns from Uncertain Graph Data

In many real applications, graph data is subject to uncertainties due to incompleteness and imprecision of data. Mining such uncertain graph data is semantically different from and computationally more challenging than mining conventional exact graph ...
Read More
Mining hybrid sequential patterns and sequential rules

The problem addressed in this paper is to discover the frequently occurred sequential patterns from databases. Basically, the existing studies on finding sequential patterns can be roughly classified into two main categories. In the first category, the ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Published in
SIGMOD '08: Proceedings of the 2008 ACM SIGMOD international conference on Management of data
June 2008
1396 pages
ISBN:9781605581026
DOI:10.1145/1376616
General Chairs:
Laks V. S. Lakshmanan
University of British Columbia, Canada
,
Raymond T. Ng
University of British Columbia, Canada
,
Dennis Shasha
New York University, USA
Copyright © 2008 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 9 June 2008
Permissions
Request permissions about this article.
Request Permissions

Check for updates
Author Tags
classification
graph
optimality
pattern
Qualifiers
- research-article
Conference

Acceptance Rates
Overall Acceptance Rate785of4,003submissions,20%
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 231
  Total Citations
  View Citations
- 1,996
  Total Downloads
- Downloads (Last 12 months)81
- Downloads (Last 6 weeks)7
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Mining significant graph patterns by leap search

SIGMOD '08: Proceedings of the 2008 ACM SIGMOD international conference on Management of data

ABSTRACT

References

Cited By

Index Terms

Recommendations

Towards proximity pattern mining in large graphs

Mining Frequent Subgraph Patterns from Uncertain Graph Data

Mining hybrid sequential patterns and sequential rules