ABSTRACT
Frequent subgraph mining has always been an important issue in data mining. Several frequent graph mining methods have been developed for mining graph transactions. However, these methods become less usable when the dataset is a single large graph. Also, when the graph is too large to fit in main memory, alternative techniques are necessary to efficiently find frequent subgraphs. We investigate the task of frequent subgraph mining on a single large graph using sampling approaches and find that sampling is a feasible approach for this task. We evaluate different sampling methods and provide a novel sampling method called 'random areas selection sampling', which produces better results than all the current graph sampling approaches with customized parameters.
- R. Agrawal and R. Srikant. Mining Sequential Patterns. In ICDE, pp: 3--14, 1995 Google ScholarDigital Library
- T. Asai, K. Abe, S. Kawasoe, H. Arimura, H. Satamoto, and S. Arkiwa. Efficient Substructure Discovery from Large Semi-Structured Data. In SIAM, 2002.Google Scholar
- R. J. Bayardo. Efficiently Mining Long Patterns from Databases. In ACM SIGMOD, pp:85--93, 1998. Google ScholarDigital Library
- I. Bordino, D. Donato, A. Gionis, S. Leonardi. Mining Large Networks with Subgraph Counting. ICDM, pp: 737--742, 2008 Google ScholarDigital Library
- C. Borgelt and M. R. Berthold. Mining Molecular Fragments: Finding ReleVant Substructures of Molecules. In ICDM, pp: 51--58, 2002 Google ScholarDigital Library
- D. Burdick, M. Calimlim, and J. Gehrke. MAFIA: A Maximal Frequent Itemset Algorithm for Transactional Databases. In ICDE, pp: 443--452, 2001. Google ScholarDigital Library
- M. Coatney, S. Parthasarathy. MotifMiner: Efficient Discovery of Common Substructures in Biochemical Molecules. Knowledge and Information Systems, 7(2) pp: 202--223, 2005 Google ScholarDigital Library
- C. Cong, L. Yi, B. Liu and K. Wang.Discovering Frequent Substructures from Hierarchical Semi-structured Data In SDM, pp: 175--192, 2002Google Scholar
- The D Statistic. http://mste.illinois.edu/patel/chisquare/dstat. htmlGoogle Scholar
- M. Fiedler and C. Borgelt. Support Computation for Mining Frequent Subgraphs in a Single Graph. http://mlg07.dsi.unifi.it/pdf/40_Borgelt.pdfGoogle Scholar
- P. Hintsanen and H. Toivonen. Finding reliable subgraphs from large probabilistic graphs. In DMKD, 17(1) pp: 3--23, 2008 Google ScholarDigital Library
- L. Holder, D. Cook and S. Djoko. Substructure Discovery in the Subdue System. In AAAI Workshop on Knowledge Discovery in Databases, pp:!169--180, 1994Google Scholar
- J. Huan, W. Wang, J. Prins. Efficient Mining of Frequent Subgraphs in the Presence of Isomorphism. In ICDM, 2003 Google ScholarDigital Library
- A. Inokuchi, T. Washio and H. Motoda. An Apriori-Based Algorithm for Mining Frequent Substructures from Graph Data. In PKDD, pp: 13--23, 2000. Google ScholarDigital Library
- S. Jacquemont, F. Jacquenet and M. Sebban. A Lower Bound on the Sample Size Needed to Perform a Significant frequent Pattern Mining Task. In Pattern Recognition Letters, 30(11) pp: 960--967, 2009 Google ScholarDigital Library
- D. Jensen and J. Neville. Correlation and Sampling in Relational Data Mining. In Proceedings of the 33rd Symposium on the Interface of Computing Science and Statistics, 2001.Google Scholar
- J. Han, J. Pei, and Y. Yin. Mining Frequent Patterns without Candidate Generation. In SIGMOD, pp: 1--21, 2000. Google ScholarDigital Library
- X. Jiang, H. Xiong, C. Wang, and A. Tan, Mining Globally Distributed Frequent Subgraphs in A Single Labeled Graph, In Data & Knowledge Discovery, 68(10), pp: 1034--1058 Google ScholarDigital Library
- N. Kashtan, S. Itzkovitz, R. Milo and U. Alon. Efficient Sampling algorithm for estimating subgraph concentrations and detecting network motifs. In Bioinformatics, 20(11): pp: 1746--1758, 2004 Google ScholarDigital Library
- J. Kivinen and H. Mannila. The power of Sampling in Knowledge Discovery. Proceedings of the thirteenth ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems, pp: 77--85, 1994 Google ScholarDigital Library
- S. Kramer, L. de Raedt, and C. Helma. Molecular Feature Mining in HIV Data. In ACM SIGKDD, pp: 136--143, 2001 Google ScholarDigital Library
- M. Kuramochi and G. Karypis. Frequent Subgraph Discovery. In ICDM, pp: 313--320, 2001 Google ScholarDigital Library
- M. Kuramochi and G. Karypis. Finding Frequent Patterns in a Large Sparse Graph. In Data Mining and Knowledge Discovery, 11(3) pp: 243--271, 2005 Google ScholarDigital Library
- S. D. Lee, D. Cheung and B. Kao. Is Sampling Useful in Data Mining? A Case in the Maintenance of Discovered Association Rules. In Data Mining and Knowledge Discovery, pp: 233--262, Sep, 1998. Google ScholarDigital Library
- J. Leskovec and C. Faloutsos. Sampling from large Graphs. In SIGKDD, PP: 631--636, 2006 Google ScholarDigital Library
- H. Mannila, H. Toivonen, and A. I. VerKamo. Discovery of Frequent Episodes in Event Sequences. In Data Mining and Knowledge Discovery, pp: 259--289, 1997 Google ScholarDigital Library
- S. Nijssen and J. N. Kok. The Gaston Tool for Frequent Subgraph Mining. In GraBaTs, pp: 77--87, 2004 Google ScholarDigital Library
- C. R. Palmer, P. B. Gibbons and C. Faloutsos.ANF: A Fast and Scalable Tool for Data Mining in Massive Graphs. In SIGKDD, pp: 81--90, 2002. Google ScholarDigital Library
- J. Pei, J. Han, B. Mortazavi-Asl, H. Pinto, Q. Chen, U. Dayal, and M.-C. Hsu. PrefixSpan: Mining Sequential Patterns Efficiently by Prefix-projected Pattern Growth. In ICDE, pp:215--224, 2001 Google ScholarDigital Library
- M. P. H. Stumpf, C. Wiuf, and R. W. May. Subnets of scale free networks are not scale free: Sampling properties of networks. In PNAS, v 102, 2005Google Scholar
- H. Toivonen. Sampling Large Databases for Association Rules. Proceedings of the 22th International Conference on Very Large Data Bases. pp: 134--145, 1996 Google ScholarDigital Library
- C. Wang, W. Wang, J. Pei, Y. Zhu and B. Shi.Scalable Mining of Large Disk-based Graph Databases. In ACM SIGKDD, pp: 316--325, 2004. Google ScholarDigital Library
- W. Wang, C. Wang, Y. Zhu, B. Shi, J. Pei, X. Yan and J. Han.GraphMiner: A Structural Pattern-Mining System for Large Disk-based Graph Databases and Its Applications. In ACM SIGMOD on management of data, pp: 879--881, 2005 Google ScholarDigital Library
- T. Washio and H. Motoda. State of the Art of Graph- Based Data Mining. In ACM SIGKDD Explorations Newsletter, 5(1) pp: 59--68, 2003 Google ScholarDigital Library
- X. Yan and J. Han. gSpan:Graph-Based Substructure Pattern Mining, In ICDM, pp: 721--724, 2002. Google ScholarDigital Library
- M. Zaki, S. Parthasarathy, W. Li and M. Ogihara. Evaluation of Sampling for Data Mining of Association Rules. In RIDE, page 42, 1997. Google ScholarDigital Library
- M. Zaki and C. J. Hsiao. CHARM: An Efficient Algorithm for Closed Itemset Mining. In SIAM, pp: 457--473, 2002.Google ScholarCross Ref
- M. J. Zaki. Efficiently Mining Frequent Trees in A Forest. In ACM SIGKDD, pp: 71--80, 2002 Google ScholarDigital Library
Index Terms
- Frequent subgraph mining on a single large graph using sampling techniques
Recommendations
Mining Frequent Subgraph Patterns from Uncertain Graph Data
In many real applications, graph data is subject to uncertainties due to incompleteness and imprecision of data. Mining such uncertain graph data is semantically different from and computationally more challenging than mining conventional exact graph ...
Mining frequent cross-graph quasi-cliques
Joint mining of multiple datasets can often discover interesting, novel, and reliable patterns which cannot be obtained solely from any single source. For example, in bioinformatics, jointly mining multiple gene expression datasets obtained by different ...
Distributed frequent subgraph mining on evolving graph using SPARK
Within the graph mining context, frequent subgraph identification plays a key role in retrieving required information or patterns from the huge amount of data in a short period. The problem of finding frequent items in traditional mining changed to the ...
Comments