skip to main content
research-article

Output space sampling for graph patterns

Published:01 August 2009Publication History
Skip Abstract Section

Abstract

Recent interest in graph pattern mining has shifted from finding all frequent subgraphs to obtaining a small subset of frequent subgraphs that are representative, discriminative or significant. The main motivation behind that is to cope with the scalability problem that the graph mining algorithms suffer when mining databases of large graphs. Another motivation is to obtain a succinct output set that is informative and useful. In the same spirit, researchers also proposed sampling based algorithms that sample the output space of the frequent patterns to obtain representative subgraphs. In this work, we propose a generic sampling framework that is based on Metropolis-Hastings algorithm to sample the output space of frequent subgraphs. Our experiments on various sampling strategies show the versatility, utility and efficiency of the proposed sampling approach.

References

  1. C. Bilgin, C. Demir, C. Nagi, and B. Yener. Cell-graph mining for breast tissue modeling and analysis. In IEEE Engineering in Medicine and Biology Society, 2007.Google ScholarGoogle Scholar
  2. M. Boley and H. Grosskreutz. A randomized approach for approximating the number of frequent sets. In IEEE Int'l Conf. on Data Mining, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. I. Bordino, D. Donato, A. Gionis, and S. Leonardi. Mining Large Networks with Subgraph Counting. In Proc. of ICDM, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. B. Bringmann, A. Zimmermann, L. Raedt, and S. Nijssen. Don't be afraid of simpler pattern. In Proc. of PKDD Conference, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. C. Chu, S. Kim, Y. Lin, Y. Yu, G. Bradski, A. Ng, and K. Olukotun. Map-reduce for machine learning on multicore. In In Proc. of Neural Information Processing Systems (NIPS), 2006.Google ScholarGoogle Scholar
  6. V. Chakravarthy, V. Pandit, and Y. Sabharwal. Analysis of Sampling Techniques for Association Rule Mining. In Proc. of 12th International Conf. on Database Theory, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. V. Chaoji, M. Hasan, S. Salem, J. Besson, and M. Zaki. ORIGAMI: A Novel and Effective Approach for Mining Representative Orthogonal Graph Patterns. Statistical Analysis and Data Mining, 1(2):67--84, June 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. V. Chaoji, M. Hasan, S. Salem, and M. Zaki. An Integrated, Generic Approach to Pattern Mining: Data Mining Template Library. Data Mining and Knowledge Discovery Journal, 17(3):457--495, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. B. Chen, P. Hass, and P. Scheuermann. A new Two-Phase Sampling based Algorithm for discovering Association Rules. In SIGKDD Proceedings, pages 462--468, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. R. K. Chung. Spectral Graph Theory. Americal Mathematical Society, 1997.Google ScholarGoogle Scholar
  11. V. Guruswami. Rapidly mixing markov chains: A comparison of techniques. Technical report, MIT Laboratory of Computer Science, 2000.Google ScholarGoogle Scholar
  12. M. A. Hasan and M. Zaki. Musk: Uniform sampling of k maximal patterns. In SIAM Data Mining, 2009.Google ScholarGoogle ScholarCross RefCross Ref
  13. J. Huan, W. Wang, D. B, J. Snoeyink, J. Prins, and A. Tropsha. Mining Protein Family Specific Residue Packing Patterns from Protein Structure Graphs. In Proc. of RECOMB, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. J. Huan, W. Wang, and J. Prins. Efficient mining of frequent subgraphs in the presence of isomorphism. In ICDM, 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. J. Huan, W. Wang, J. Prins, and J. Yang. SPIN: Mining Maximal Frequent Subgraphs from Graph Databases. In SIGKDD, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. C. Hubler, H. Kriegel, K. Borgwardt, and Z. Ghahramani. Metropolis Algorithms for Representative Subgraph Sampling. In Proc. of ICDM, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. R. Jin, M. Abu-Ata, Y. Xiang, and N. Ruan. Effective and efficient itemset pattern summarization: regression-based approaches. In KDD '08: Proc. of SIGKDD, pages 399--407, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. S. Kramer, L. Raedt, and C. Helma. Molecular feature Mining in HIV data. In Proc. of SIGKDD, pages 136--143, 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. M. Kuramochi and G. Karypis. Frequent Subgraph Discovery. In ICDM, 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. L. Li, W. Fu, F. Guo, T. Mowry, and C. Faloutsos. Cut-And-Stitch: Efficient Parallel Learning of Linear Dynamical Systems on SMPs. In Proc. of SIGKDD, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. S. Morishita and J. Sese. Traversing Itemset Lattice with Statistical Metric Pruning. PODS, pages 226--236, 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. R. Motwani and P. Raghavan. Randomized Algorithms. Cambridge University Press, 1995. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. M. Thoma, H. Cheng, A. Gretton, J. Han, H. Kriegel, A. Smola, L. Song, P. Yu, X. Yan, and K. Borgwardt. Near-optimal supervised feature selection among frequent subgraphs. In SIAM Int'l Conf. on Data Mining, 2009.Google ScholarGoogle Scholar
  24. S. Nijssen and J. Kok. A quickstart in frequent structure mining can make a difference. In KDD Proceedings. ACM, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. R. Y. Rubinstein and D. K. Kroese. Simulation and the Monte Carlo Method, 2nd Ed. John Wiley & Sons, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. A. Sinclair. Algorithms for Random Generation and Counting. BirkHauser, 1992. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. P.-N. Tan, V. Kumar, and J. Srivastava. Selecting the right interestingness measure for association patterns. In Proc. of SIGKDD, pages 32--41, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. H. Toivonen. Sampling Large Databases for Association Rules. In VLDB Proceedings, pages 134--145, 1996. Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. J. R. Ullmann. An Algorithm for Subgraph Isomorphism. Journal of ACM, 23(1):31--42, 1976. Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. S. Vishwanathan, K. Borgwardt, and N. Schraudolph. Fast computation of graph kernels. In In Proc. of Neural Information Processing Systems (NIPS), 2006.Google ScholarGoogle Scholar
  31. C. Wang and S. Parthasarathy. Summarizing itemset patterns using probabilistic models. In KDD '06: Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 730--735. ACM, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. D. Xin, J. Han, X. Yan, and H. Cheng. Mining compressed frequent-pattern sets. In VLDB '05: Proceedings of the 31st international conference on Very large data bases, pages 709--720. VLDB Endowment, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. X. Yan, H. Cheng, J. Han, and D. Xin. Summarizing Itemset Patterns: A Profile-Based Approach. In SIGKDD, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  34. X. Yan, H. Cheng, J. Han, and P. S. Yu. Mining Significant Graph Patterns by Leap Search. In SIGMOD Proceedings. ACM, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  35. X. Yan and J. Han. gSpan: Graph-Based Substructure Pattern Mining. In ICDM, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  36. X. Yan and J. Han. Closegraph: mining closed frequent graph patterns. In KDD '03: Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining, pages 286--295, New York, NY, USA, 2003. ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  37. S. Zhang, X. Wu, C. Zhang, and J. Lu. Computing the Minimum-Support for Mining Frequent Patterns. Knowledge and Information Systems, 15(2):233--257, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Output space sampling for graph patterns

        Recommendations

        Comments

        Login options

        Check if you have access through your login credentials or your institution to get full access on this article.

        Sign in

        Full Access

        PDF Format

        View or Download as a PDF file.

        PDF

        eReader

        View online with eReader.

        eReader