Mining graph patterns efficiently via randomized summaries

Authors:
Chen Chen, University of Illinois at Urbana-Champaign
Cindy X. Lin, University of Illinois at Urbana-Champaign
Matt Fredrikson, University of Wisconsin at Madison
Mihai Christodorescu, IBM T. J. Watson Research Center
Xifeng Yan, University of California at Santa Barbara
Jiawei Han, University of Illinois at Urbana-Champaign

Proceedings of the VLDB Endowment, Volume 2, Issue 1, pp. 742–753. https://doi.org/10.14778/1687627.1687711

Published: 01 August 2009

Abstract

Graphs are prevalent in many domains, such as bioinformatics, social networks, the Web, and cyber-security. Graph pattern mining has become an important tool for managing and analyzing complexly structured data, with applications that include indexing, clustering, and classification. Existing graph mining algorithms have achieved great success by exploiting various properties of the pattern space. Unfortunately, because subgraph isomorphism plays a fundamental role in these methods, they may all run into trouble when the cost of enumerating a huge set of isomorphic embeddings blows up, especially in large graphs.

The solution we propose for this problem is to reduce the data space. For each graph, we build a summary and mine this smaller graph instead. Compared to other data reduction techniques that either reduce the number of transactions or compress between transactions, this new framework, called Summarize-Mine, suggests a third path: compressing within transactions. Summarize-Mine is effective in cutting down the size of graphs, thus decreasing the embedding enumeration cost. However, compression may also lose patterns. We address this issue by generating randomized summaries and repeating the process for multiple rounds; the main idea is that true patterns are unlikely to be missed in all rounds. We provide strict probabilistic guarantees on the likelihood of pattern loss. Experiments on real malware trace data show that Summarize-Mine is very efficient and can find interesting malware fingerprints that were not revealed previously.
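The multi-round argument can be illustrated with a small calculation. The sketch below is not the paper's own bound, which the abstract does not state. It assumes, purely for illustration, that rounds are independent and that a true pattern is missed in any single round with probability at most `per_round_miss`. Under those assumptions, the chance of missing the pattern in every one of t rounds is at most `per_round_miss ** t`. The function names and the numeric values are hypothetical.

```python
import math


def miss_probability(per_round_miss: float, rounds: int) -> float:
    """Upper bound on missing a true pattern in ALL rounds.

    Assumes (for illustration only) that rounds are independent and that
    each round misses the pattern with probability at most per_round_miss.
    """
    if not 0.0 <= per_round_miss <= 1.0:
        raise ValueError("per_round_miss must lie in [0, 1]")
    if rounds < 1:
        raise ValueError("rounds must be at least 1")
    return per_round_miss ** rounds


def rounds_needed(per_round_miss: float, target: float) -> int:
    """Smallest t such that per_round_miss ** t <= target."""
    if not 0.0 < per_round_miss < 1.0:
        raise ValueError("per_round_miss must lie strictly between 0 and 1")
    if not 0.0 < target < 1.0:
        raise ValueError("target must lie strictly between 0 and 1")
    # Solve per_round_miss ** t <= target for t, then round up.
    t = max(1, math.ceil(math.log(target) / math.log(per_round_miss)))
    # Guard against floating-point rounding in the logarithms.
    while per_round_miss ** t > target:
        t += 1
    return t


if __name__ == "__main__":
    # Hypothetical numbers: a 50% per-round miss rate and a 1% overall target.
    t = rounds_needed(0.5, 0.01)
    print(t, miss_probability(0.5, t))
```

With these hypothetical inputs, a handful of rounds already pushes the overall miss bound below the target, which is the intuition behind repeating the summarize-and-mine process rather than relying on a single summary.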

Index Terms

1. Information systems
  1. Information retrieval
    1. Document representation
  2. Information systems applications
    1. Data mining
2. Mathematics of computing
  1. Discrete mathematics
    1. Graph theory

Published in
Proceedings of the VLDB Endowment, Volume 2, Issue 1
August 2009
1293 pages
ISSN: 2150-8097
Editors: Serge Abiteboul, Tova Milo, Jignesh Patel, Philippe Rigaux
Publisher: VLDB Endowment
Publication History
- Published: 1 August 2009
Qualifiers
- research-article
