skip to main content
10.1145/1830252.1830274acmconferencesArticle/Chapter ViewAbstractPublication PageskddConference Proceedingsconference-collections
research-article

Frequent subgraph mining on a single large graph using sampling techniques

Published:24 July 2010Publication History

ABSTRACT

Frequent subgraph mining has always been an important issue in data mining. Several frequent graph mining methods have been developed for mining graph transactions. However, these methods become less usable when the dataset is a single large graph. Also, when the graph is too large to fit in main memory, alternative techniques are necessary to efficiently find frequent subgraphs. We investigate the task of frequent subgraph mining on a single large graph using sampling approaches and find that sampling is a feasible approach for this task. We evaluate different sampling methods and provide a novel sampling method called 'random areas selection sampling', which produces better results than all the current graph sampling approaches with customized parameters.

References

  1. R. Agrawal and R. Srikant. Mining Sequential Patterns. In ICDE, pp: 3--14, 1995 Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. T. Asai, K. Abe, S. Kawasoe, H. Arimura, H. Satamoto, and S. Arkiwa. Efficient Substructure Discovery from Large Semi-Structured Data. In SIAM, 2002.Google ScholarGoogle Scholar
  3. R. J. Bayardo. Efficiently Mining Long Patterns from Databases. In ACM SIGMOD, pp:85--93, 1998. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. I. Bordino, D. Donato, A. Gionis, S. Leonardi. Mining Large Networks with Subgraph Counting. ICDM, pp: 737--742, 2008 Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. C. Borgelt and M. R. Berthold. Mining Molecular Fragments: Finding ReleVant Substructures of Molecules. In ICDM, pp: 51--58, 2002 Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. D. Burdick, M. Calimlim, and J. Gehrke. MAFIA: A Maximal Frequent Itemset Algorithm for Transactional Databases. In ICDE, pp: 443--452, 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. M. Coatney, S. Parthasarathy. MotifMiner: Efficient Discovery of Common Substructures in Biochemical Molecules. Knowledge and Information Systems, 7(2) pp: 202--223, 2005 Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. C. Cong, L. Yi, B. Liu and K. Wang.Discovering Frequent Substructures from Hierarchical Semi-structured Data In SDM, pp: 175--192, 2002Google ScholarGoogle Scholar
  9. The D Statistic. http://mste.illinois.edu/patel/chisquare/dstat. htmlGoogle ScholarGoogle Scholar
  10. M. Fiedler and C. Borgelt. Support Computation for Mining Frequent Subgraphs in a Single Graph. http://mlg07.dsi.unifi.it/pdf/40_Borgelt.pdfGoogle ScholarGoogle Scholar
  11. P. Hintsanen and H. Toivonen. Finding reliable subgraphs from large probabilistic graphs. In DMKD, 17(1) pp: 3--23, 2008 Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. L. Holder, D. Cook and S. Djoko. Substructure Discovery in the Subdue System. In AAAI Workshop on Knowledge Discovery in Databases, pp:!169--180, 1994Google ScholarGoogle Scholar
  13. J. Huan, W. Wang, J. Prins. Efficient Mining of Frequent Subgraphs in the Presence of Isomorphism. In ICDM, 2003 Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. A. Inokuchi, T. Washio and H. Motoda. An Apriori-Based Algorithm for Mining Frequent Substructures from Graph Data. In PKDD, pp: 13--23, 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. S. Jacquemont, F. Jacquenet and M. Sebban. A Lower Bound on the Sample Size Needed to Perform a Significant frequent Pattern Mining Task. In Pattern Recognition Letters, 30(11) pp: 960--967, 2009 Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. D. Jensen and J. Neville. Correlation and Sampling in Relational Data Mining. In Proceedings of the 33rd Symposium on the Interface of Computing Science and Statistics, 2001.Google ScholarGoogle Scholar
  17. J. Han, J. Pei, and Y. Yin. Mining Frequent Patterns without Candidate Generation. In SIGMOD, pp: 1--21, 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. X. Jiang, H. Xiong, C. Wang, and A. Tan, Mining Globally Distributed Frequent Subgraphs in A Single Labeled Graph, In Data & Knowledge Discovery, 68(10), pp: 1034--1058 Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. N. Kashtan, S. Itzkovitz, R. Milo and U. Alon. Efficient Sampling algorithm for estimating subgraph concentrations and detecting network motifs. In Bioinformatics, 20(11): pp: 1746--1758, 2004 Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. J. Kivinen and H. Mannila. The power of Sampling in Knowledge Discovery. Proceedings of the thirteenth ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems, pp: 77--85, 1994 Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. S. Kramer, L. de Raedt, and C. Helma. Molecular Feature Mining in HIV Data. In ACM SIGKDD, pp: 136--143, 2001 Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. M. Kuramochi and G. Karypis. Frequent Subgraph Discovery. In ICDM, pp: 313--320, 2001 Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. M. Kuramochi and G. Karypis. Finding Frequent Patterns in a Large Sparse Graph. In Data Mining and Knowledge Discovery, 11(3) pp: 243--271, 2005 Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. S. D. Lee, D. Cheung and B. Kao. Is Sampling Useful in Data Mining? A Case in the Maintenance of Discovered Association Rules. In Data Mining and Knowledge Discovery, pp: 233--262, Sep, 1998. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. J. Leskovec and C. Faloutsos. Sampling from large Graphs. In SIGKDD, PP: 631--636, 2006 Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. H. Mannila, H. Toivonen, and A. I. VerKamo. Discovery of Frequent Episodes in Event Sequences. In Data Mining and Knowledge Discovery, pp: 259--289, 1997 Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. S. Nijssen and J. N. Kok. The Gaston Tool for Frequent Subgraph Mining. In GraBaTs, pp: 77--87, 2004 Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. C. R. Palmer, P. B. Gibbons and C. Faloutsos.ANF: A Fast and Scalable Tool for Data Mining in Massive Graphs. In SIGKDD, pp: 81--90, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. J. Pei, J. Han, B. Mortazavi-Asl, H. Pinto, Q. Chen, U. Dayal, and M.-C. Hsu. PrefixSpan: Mining Sequential Patterns Efficiently by Prefix-projected Pattern Growth. In ICDE, pp:215--224, 2001 Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. M. P. H. Stumpf, C. Wiuf, and R. W. May. Subnets of scale free networks are not scale free: Sampling properties of networks. In PNAS, v 102, 2005Google ScholarGoogle Scholar
  31. H. Toivonen. Sampling Large Databases for Association Rules. Proceedings of the 22th International Conference on Very Large Data Bases. pp: 134--145, 1996 Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. C. Wang, W. Wang, J. Pei, Y. Zhu and B. Shi.Scalable Mining of Large Disk-based Graph Databases. In ACM SIGKDD, pp: 316--325, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. W. Wang, C. Wang, Y. Zhu, B. Shi, J. Pei, X. Yan and J. Han.GraphMiner: A Structural Pattern-Mining System for Large Disk-based Graph Databases and Its Applications. In ACM SIGMOD on management of data, pp: 879--881, 2005 Google ScholarGoogle ScholarDigital LibraryDigital Library
  34. T. Washio and H. Motoda. State of the Art of Graph- Based Data Mining. In ACM SIGKDD Explorations Newsletter, 5(1) pp: 59--68, 2003 Google ScholarGoogle ScholarDigital LibraryDigital Library
  35. X. Yan and J. Han. gSpan:Graph-Based Substructure Pattern Mining, In ICDM, pp: 721--724, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  36. M. Zaki, S. Parthasarathy, W. Li and M. Ogihara. Evaluation of Sampling for Data Mining of Association Rules. In RIDE, page 42, 1997. Google ScholarGoogle ScholarDigital LibraryDigital Library
  37. M. Zaki and C. J. Hsiao. CHARM: An Efficient Algorithm for Closed Itemset Mining. In SIAM, pp: 457--473, 2002.Google ScholarGoogle ScholarCross RefCross Ref
  38. M. J. Zaki. Efficiently Mining Frequent Trees in A Forest. In ACM SIGKDD, pp: 71--80, 2002 Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Frequent subgraph mining on a single large graph using sampling techniques

    Recommendations

    Comments

    Login options

    Check if you have access through your login credentials or your institution to get full access on this article.

    Sign in
    • Published in

      cover image ACM Conferences
      MLG '10: Proceedings of the Eighth Workshop on Mining and Learning with Graphs
      July 2010
      185 pages
      ISBN:9781450302142
      DOI:10.1145/1830252

      Copyright © 2010 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 24 July 2010

      Permissions

      Request permissions about this article.

      Request Permissions

      Check for updates

      Qualifiers

      • research-article

      Upcoming Conference

      KDD '24

    PDF Format

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader