research-article

Frequent subgraph mining on a single large graph using sampling techniques

Authors:
Ruoyu Zou

Washington State University, Pullman, WA

Washington State University, Pullman, WA
View Profile

,
Lawrence B. Holder

Washington State University, Pullman, WA

Washington State University, Pullman, WA
View Profile

MLG '10: Proceedings of the Eighth Workshop on Mining and Learning with GraphsJuly 2010Pages 171–178https://doi.org/10.1145/1830252.1830274

Published:24 July 2010Publication History

MLG '10: Proceedings of the Eighth Workshop on Mining and Learning with Graphs

Pages 171–178

ABSTRACT

Frequent subgraph mining has always been an important issue in data mining. Several frequent graph mining methods have been developed for mining graph transactions. However, these methods become less usable when the dataset is a single large graph. Also, when the graph is too large to fit in main memory, alternative techniques are necessary to efficiently find frequent subgraphs. We investigate the task of frequent subgraph mining on a single large graph using sampling approaches and find that sampling is a feasible approach for this task. We evaluate different sampling methods and provide a novel sampling method called 'random areas selection sampling', which produces better results than all the current graph sampling approaches with customized parameters.

References

R. Agrawal and R. Srikant. Mining Sequential Patterns. In ICDE, pp: 3--14, 1995 Google ScholarDigital Library
T. Asai, K. Abe, S. Kawasoe, H. Arimura, H. Satamoto, and S. Arkiwa. Efficient Substructure Discovery from Large Semi-Structured Data. In SIAM, 2002.Google Scholar
R. J. Bayardo. Efficiently Mining Long Patterns from Databases. In ACM SIGMOD, pp:85--93, 1998. Google ScholarDigital Library
I. Bordino, D. Donato, A. Gionis, S. Leonardi. Mining Large Networks with Subgraph Counting. ICDM, pp: 737--742, 2008 Google ScholarDigital Library
C. Borgelt and M. R. Berthold. Mining Molecular Fragments: Finding ReleVant Substructures of Molecules. In ICDM, pp: 51--58, 2002 Google ScholarDigital Library
D. Burdick, M. Calimlim, and J. Gehrke. MAFIA: A Maximal Frequent Itemset Algorithm for Transactional Databases. In ICDE, pp: 443--452, 2001. Google ScholarDigital Library
M. Coatney, S. Parthasarathy. MotifMiner: Efficient Discovery of Common Substructures in Biochemical Molecules. Knowledge and Information Systems, 7(2) pp: 202--223, 2005 Google ScholarDigital Library
C. Cong, L. Yi, B. Liu and K. Wang.Discovering Frequent Substructures from Hierarchical Semi-structured Data In SDM, pp: 175--192, 2002Google Scholar
The D Statistic. http://mste.illinois.edu/patel/chisquare/dstat. htmlGoogle Scholar
M. Fiedler and C. Borgelt. Support Computation for Mining Frequent Subgraphs in a Single Graph. http://mlg07.dsi.unifi.it/pdf/40_Borgelt.pdfGoogle Scholar
P. Hintsanen and H. Toivonen. Finding reliable subgraphs from large probabilistic graphs. In DMKD, 17(1) pp: 3--23, 2008 Google ScholarDigital Library
L. Holder, D. Cook and S. Djoko. Substructure Discovery in the Subdue System. In AAAI Workshop on Knowledge Discovery in Databases, pp:!169--180, 1994Google Scholar
J. Huan, W. Wang, J. Prins. Efficient Mining of Frequent Subgraphs in the Presence of Isomorphism. In ICDM, 2003 Google ScholarDigital Library
A. Inokuchi, T. Washio and H. Motoda. An Apriori-Based Algorithm for Mining Frequent Substructures from Graph Data. In PKDD, pp: 13--23, 2000. Google ScholarDigital Library
S. Jacquemont, F. Jacquenet and M. Sebban. A Lower Bound on the Sample Size Needed to Perform a Significant frequent Pattern Mining Task. In Pattern Recognition Letters, 30(11) pp: 960--967, 2009 Google ScholarDigital Library
D. Jensen and J. Neville. Correlation and Sampling in Relational Data Mining. In Proceedings of the 33^rd Symposium on the Interface of Computing Science and Statistics, 2001.Google Scholar
J. Han, J. Pei, and Y. Yin. Mining Frequent Patterns without Candidate Generation. In SIGMOD, pp: 1--21, 2000. Google ScholarDigital Library
X. Jiang, H. Xiong, C. Wang, and A. Tan, Mining Globally Distributed Frequent Subgraphs in A Single Labeled Graph, In Data & Knowledge Discovery, 68(10), pp: 1034--1058 Google ScholarDigital Library
N. Kashtan, S. Itzkovitz, R. Milo and U. Alon. Efficient Sampling algorithm for estimating subgraph concentrations and detecting network motifs. In Bioinformatics, 20(11): pp: 1746--1758, 2004 Google ScholarDigital Library
J. Kivinen and H. Mannila. The power of Sampling in Knowledge Discovery. Proceedings of the thirteenth ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems, pp: 77--85, 1994 Google ScholarDigital Library
S. Kramer, L. de Raedt, and C. Helma. Molecular Feature Mining in HIV Data. In ACM SIGKDD, pp: 136--143, 2001 Google ScholarDigital Library
M. Kuramochi and G. Karypis. Frequent Subgraph Discovery. In ICDM, pp: 313--320, 2001 Google ScholarDigital Library
M. Kuramochi and G. Karypis. Finding Frequent Patterns in a Large Sparse Graph. In Data Mining and Knowledge Discovery, 11(3) pp: 243--271, 2005 Google ScholarDigital Library
S. D. Lee, D. Cheung and B. Kao. Is Sampling Useful in Data Mining? A Case in the Maintenance of Discovered Association Rules. In Data Mining and Knowledge Discovery, pp: 233--262, Sep, 1998. Google ScholarDigital Library
J. Leskovec and C. Faloutsos. Sampling from large Graphs. In SIGKDD, PP: 631--636, 2006 Google ScholarDigital Library
H. Mannila, H. Toivonen, and A. I. VerKamo. Discovery of Frequent Episodes in Event Sequences. In Data Mining and Knowledge Discovery, pp: 259--289, 1997 Google ScholarDigital Library
S. Nijssen and J. N. Kok. The Gaston Tool for Frequent Subgraph Mining. In GraBaTs, pp: 77--87, 2004 Google ScholarDigital Library
C. R. Palmer, P. B. Gibbons and C. Faloutsos.ANF: A Fast and Scalable Tool for Data Mining in Massive Graphs. In SIGKDD, pp: 81--90, 2002. Google ScholarDigital Library
J. Pei, J. Han, B. Mortazavi-Asl, H. Pinto, Q. Chen, U. Dayal, and M.-C. Hsu. PrefixSpan: Mining Sequential Patterns Efficiently by Prefix-projected Pattern Growth. In ICDE, pp:215--224, 2001 Google ScholarDigital Library
M. P. H. Stumpf, C. Wiuf, and R. W. May. Subnets of scale free networks are not scale free: Sampling properties of networks. In PNAS, v 102, 2005Google Scholar
H. Toivonen. Sampling Large Databases for Association Rules. Proceedings of the 22th International Conference on Very Large Data Bases. pp: 134--145, 1996 Google ScholarDigital Library
C. Wang, W. Wang, J. Pei, Y. Zhu and B. Shi.Scalable Mining of Large Disk-based Graph Databases. In ACM SIGKDD, pp: 316--325, 2004. Google ScholarDigital Library
W. Wang, C. Wang, Y. Zhu, B. Shi, J. Pei, X. Yan and J. Han.GraphMiner: A Structural Pattern-Mining System for Large Disk-based Graph Databases and Its Applications. In ACM SIGMOD on management of data, pp: 879--881, 2005 Google ScholarDigital Library
T. Washio and H. Motoda. State of the Art of Graph- Based Data Mining. In ACM SIGKDD Explorations Newsletter, 5(1) pp: 59--68, 2003 Google ScholarDigital Library
X. Yan and J. Han. gSpan:Graph-Based Substructure Pattern Mining, In ICDM, pp: 721--724, 2002. Google ScholarDigital Library
M. Zaki, S. Parthasarathy, W. Li and M. Ogihara. Evaluation of Sampling for Data Mining of Association Rules. In RIDE, page 42, 1997. Google ScholarDigital Library
M. Zaki and C. J. Hsiao. CHARM: An Efficient Algorithm for Closed Itemset Mining. In SIAM, pp: 457--473, 2002.Google ScholarCross Ref
M. J. Zaki. Efficiently Mining Frequent Trees in A Forest. In ACM SIGKDD, pp: 71--80, 2002 Google ScholarDigital Library

Index Terms

Frequent subgraph mining on a single large graph using sampling techniques
1. Mathematics of computing
  1. Discrete mathematics
    1. Graph theory
      1. Graph algorithms

Recommendations

Mining Frequent Subgraph Patterns from Uncertain Graph Data

In many real applications, graph data is subject to uncertainties due to incompleteness and imprecision of data. Mining such uncertain graph data is semantically different from and computationally more challenging than mining conventional exact graph ...
Read More
Mining frequent cross-graph quasi-cliques

Joint mining of multiple datasets can often discover interesting, novel, and reliable patterns which cannot be obtained solely from any single source. For example, in bioinformatics, jointly mining multiple gene expression datasets obtained by different ...
Read More
Distributed frequent subgraph mining on evolving graph using SPARK

Within the graph mining context, frequent subgraph identification plays a key role in retrieving required information or patterns from the huge amount of data in a short period. The problem of finding frequent items in traditional mining changed to the ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Published in
MLG '10: Proceedings of the Eighth Workshop on Mining and Learning with Graphs
July 2010
185 pages
ISBN:9781450302142
DOI:10.1145/1830252
Conference Chairs:
Ulf Brefeld,
Lise Getoor,
Sofus A. Macskassy
Copyright © 2010 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 24 July 2010
Permissions
Request permissions about this article.
Request Permissions

Check for updates
Author Tags
graph mining
large graph
sampling
Qualifiers
- research-article
Conference
Upcoming Conference
KDD '24

Sponsor:

sigkdd

sigkdd

The 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining

August 25 - 29, 2024

Barcelona , Spain
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 23
  Total Citations
  View Citations
- 978
  Total Downloads
- Downloads (Last 12 months)16
- Downloads (Last 6 weeks)1
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Frequent subgraph mining on a single large graph using sampling techniques

MLG '10: Proceedings of the Eighth Workshop on Mining and Learning with Graphs

ABSTRACT

References

Cited By

Index Terms

Recommendations

Mining Frequent Subgraph Patterns from Uncertain Graph Data

Mining frequent cross-graph quasi-cliques

Distributed frequent subgraph mining on evolving graph using SPARK

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Check for updates

Author Tags

Qualifiers

Conference

Upcoming Conference

Funding Sources

Other Metrics

Article Metrics

Other Metrics

Cited By

PDF Format

eReader

Digital Edition

Caption

Frequent subgraph mining on a single large graph using sampling techniques

MLG '10: Proceedings of the Eighth Workshop on Mining and Learning with Graphs

ABSTRACT

References

Cited By

Index Terms

Recommendations

Mining Frequent Subgraph Patterns from Uncertain Graph Data

Mining frequent cross-graph quasi-cliques

Distributed frequent subgraph mining on evolving graph using SPARK

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Check for updates

Author Tags

Qualifiers

Conference

Upcoming Conference

Funding Sources

Article Metrics

Other Metrics

PDF Format

eReader

Digital Edition

Share this Publication link

Share on Social Media