Abstract
Recent interest in graph pattern mining has shifted from finding all frequent subgraphs to obtaining a small subset of frequent subgraphs that are representative, discriminative or significant. The main motivation behind that is to cope with the scalability problem that the graph mining algorithms suffer when mining databases of large graphs. Another motivation is to obtain a succinct output set that is informative and useful. In the same spirit, researchers also proposed sampling based algorithms that sample the output space of the frequent patterns to obtain representative subgraphs. In this work, we propose a generic sampling framework that is based on Metropolis-Hastings algorithm to sample the output space of frequent subgraphs. Our experiments on various sampling strategies show the versatility, utility and efficiency of the proposed sampling approach.
- C. Bilgin, C. Demir, C. Nagi, and B. Yener. Cell-graph mining for breast tissue modeling and analysis. In IEEE Engineering in Medicine and Biology Society, 2007.Google Scholar
- M. Boley and H. Grosskreutz. A randomized approach for approximating the number of frequent sets. In IEEE Int'l Conf. on Data Mining, 2008. Google ScholarDigital Library
- I. Bordino, D. Donato, A. Gionis, and S. Leonardi. Mining Large Networks with Subgraph Counting. In Proc. of ICDM, 2008. Google ScholarDigital Library
- B. Bringmann, A. Zimmermann, L. Raedt, and S. Nijssen. Don't be afraid of simpler pattern. In Proc. of PKDD Conference, 2006. Google ScholarDigital Library
- C. Chu, S. Kim, Y. Lin, Y. Yu, G. Bradski, A. Ng, and K. Olukotun. Map-reduce for machine learning on multicore. In In Proc. of Neural Information Processing Systems (NIPS), 2006.Google Scholar
- V. Chakravarthy, V. Pandit, and Y. Sabharwal. Analysis of Sampling Techniques for Association Rule Mining. In Proc. of 12th International Conf. on Database Theory, 2009. Google ScholarDigital Library
- V. Chaoji, M. Hasan, S. Salem, J. Besson, and M. Zaki. ORIGAMI: A Novel and Effective Approach for Mining Representative Orthogonal Graph Patterns. Statistical Analysis and Data Mining, 1(2):67--84, June 2008. Google ScholarDigital Library
- V. Chaoji, M. Hasan, S. Salem, and M. Zaki. An Integrated, Generic Approach to Pattern Mining: Data Mining Template Library. Data Mining and Knowledge Discovery Journal, 17(3):457--495, 2008. Google ScholarDigital Library
- B. Chen, P. Hass, and P. Scheuermann. A new Two-Phase Sampling based Algorithm for discovering Association Rules. In SIGKDD Proceedings, pages 462--468, 2002. Google ScholarDigital Library
- R. K. Chung. Spectral Graph Theory. Americal Mathematical Society, 1997.Google Scholar
- V. Guruswami. Rapidly mixing markov chains: A comparison of techniques. Technical report, MIT Laboratory of Computer Science, 2000.Google Scholar
- M. A. Hasan and M. Zaki. Musk: Uniform sampling of k maximal patterns. In SIAM Data Mining, 2009.Google ScholarCross Ref
- J. Huan, W. Wang, D. B, J. Snoeyink, J. Prins, and A. Tropsha. Mining Protein Family Specific Residue Packing Patterns from Protein Structure Graphs. In Proc. of RECOMB, 2004. Google ScholarDigital Library
- J. Huan, W. Wang, and J. Prins. Efficient mining of frequent subgraphs in the presence of isomorphism. In ICDM, 2003. Google ScholarDigital Library
- J. Huan, W. Wang, J. Prins, and J. Yang. SPIN: Mining Maximal Frequent Subgraphs from Graph Databases. In SIGKDD, 2004. Google ScholarDigital Library
- C. Hubler, H. Kriegel, K. Borgwardt, and Z. Ghahramani. Metropolis Algorithms for Representative Subgraph Sampling. In Proc. of ICDM, 2008. Google ScholarDigital Library
- R. Jin, M. Abu-Ata, Y. Xiang, and N. Ruan. Effective and efficient itemset pattern summarization: regression-based approaches. In KDD '08: Proc. of SIGKDD, pages 399--407, 2008. Google ScholarDigital Library
- S. Kramer, L. Raedt, and C. Helma. Molecular feature Mining in HIV data. In Proc. of SIGKDD, pages 136--143, 2001. Google ScholarDigital Library
- M. Kuramochi and G. Karypis. Frequent Subgraph Discovery. In ICDM, 2001. Google ScholarDigital Library
- L. Li, W. Fu, F. Guo, T. Mowry, and C. Faloutsos. Cut-And-Stitch: Efficient Parallel Learning of Linear Dynamical Systems on SMPs. In Proc. of SIGKDD, 2008. Google ScholarDigital Library
- S. Morishita and J. Sese. Traversing Itemset Lattice with Statistical Metric Pruning. PODS, pages 226--236, 2000. Google ScholarDigital Library
- R. Motwani and P. Raghavan. Randomized Algorithms. Cambridge University Press, 1995. Google ScholarDigital Library
- M. Thoma, H. Cheng, A. Gretton, J. Han, H. Kriegel, A. Smola, L. Song, P. Yu, X. Yan, and K. Borgwardt. Near-optimal supervised feature selection among frequent subgraphs. In SIAM Int'l Conf. on Data Mining, 2009.Google Scholar
- S. Nijssen and J. Kok. A quickstart in frequent structure mining can make a difference. In KDD Proceedings. ACM, 2004. Google ScholarDigital Library
- R. Y. Rubinstein and D. K. Kroese. Simulation and the Monte Carlo Method, 2nd Ed. John Wiley & Sons, 2008. Google ScholarDigital Library
- A. Sinclair. Algorithms for Random Generation and Counting. BirkHauser, 1992. Google ScholarDigital Library
- P.-N. Tan, V. Kumar, and J. Srivastava. Selecting the right interestingness measure for association patterns. In Proc. of SIGKDD, pages 32--41, 2002. Google ScholarDigital Library
- H. Toivonen. Sampling Large Databases for Association Rules. In VLDB Proceedings, pages 134--145, 1996. Google ScholarDigital Library
- J. R. Ullmann. An Algorithm for Subgraph Isomorphism. Journal of ACM, 23(1):31--42, 1976. Google ScholarDigital Library
- S. Vishwanathan, K. Borgwardt, and N. Schraudolph. Fast computation of graph kernels. In In Proc. of Neural Information Processing Systems (NIPS), 2006.Google Scholar
- C. Wang and S. Parthasarathy. Summarizing itemset patterns using probabilistic models. In KDD '06: Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 730--735. ACM, 2006. Google ScholarDigital Library
- D. Xin, J. Han, X. Yan, and H. Cheng. Mining compressed frequent-pattern sets. In VLDB '05: Proceedings of the 31st international conference on Very large data bases, pages 709--720. VLDB Endowment, 2005. Google ScholarDigital Library
- X. Yan, H. Cheng, J. Han, and D. Xin. Summarizing Itemset Patterns: A Profile-Based Approach. In SIGKDD, 2005. Google ScholarDigital Library
- X. Yan, H. Cheng, J. Han, and P. S. Yu. Mining Significant Graph Patterns by Leap Search. In SIGMOD Proceedings. ACM, 2008. Google ScholarDigital Library
- X. Yan and J. Han. gSpan: Graph-Based Substructure Pattern Mining. In ICDM, 2002. Google ScholarDigital Library
- X. Yan and J. Han. Closegraph: mining closed frequent graph patterns. In KDD '03: Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining, pages 286--295, New York, NY, USA, 2003. ACM. Google ScholarDigital Library
- S. Zhang, X. Wu, C. Zhang, and J. Lu. Computing the Minimum-Support for Mining Frequent Patterns. Knowledge and Information Systems, 15(2):233--257, 2008. Google ScholarDigital Library
Index Terms
Output space sampling for graph patterns
Recommendations
Frequent subgraph mining on a single large graph using sampling techniques
MLG '10: Proceedings of the Eighth Workshop on Mining and Learning with GraphsFrequent subgraph mining has always been an important issue in data mining. Several frequent graph mining methods have been developed for mining graph transactions. However, these methods become less usable when the dataset is a single large graph. Also,...
Unbiased Sampling of Bipartite Graph
CYBERC '11: Proceedings of the 2011 International Conference on Cyber-Enabled Distributed Computing and Knowledge DiscoveryIncreasing size of online social networks (OSNs) has given rise to sampling method studies that provide a relatively small but representative sample of large-scale OSNs so that the measurement and analysis burden can be affordable. So far, a number of ...
Comments