Skip to main content
Erschienen in: Data Mining and Knowledge Discovery 3/2005

01.11.2005

Finding Frequent Patterns in a Large Sparse Graph*

verfasst von: Michihiro Kuramochi, George Karypis

Erschienen in: Data Mining and Knowledge Discovery | Ausgabe 3/2005

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

Graph-based modeling has emerged as a powerful abstraction capable of capturing in a single and unified framework many of the relational, spatial, topological, and other characteristics that are present in a variety of datasets and application areas. Computationally efficient algorithms that find patterns corresponding to frequently occurring subgraphs play an important role in developing data mining-driven methodologies for analyzing the graphs resulting from such datasets. This paper presents two algorithms, based on the horizontal and vertical pattern discovery paradigms, that find the connected subgraphs that have a sufficient number of edge-disjoint embeddings in a single large undirected labeled sparse graph. These algorithms use three different methods for determining the number of edge-disjoint embeddings of a subgraph and employ novel algorithms for candidate generation and frequency counting, which allow them to operate on datasets with different characteristics and to quickly prune unpromising subgraphs. Experimental evaluation on real datasets from various domains show that both algorithms achieve good performance, scale well to sparse input graphs with more than 120,000 vertices or 110,000 edges, and significantly outperform previously developed algorithms.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Fußnoten
1
The symbol v i in the figure is a vertex ID, not a vertex label, and blank elements in the adjacency matrix means there is no edge between the corresponding pair of vertices.
 
2
SiGraM stands for Single Graph Miner.
 
3
http://cygnus.uta.edu/subdue/databases/index.html
 
4
http://www.cs.cornell.edu/projects/kddcup/datasets.html
 
5
DTP 2D and 3D Structural Information. http://dtp.nci.nih.gov/docs/3d_database/structural_information/structural_data.html
 
6
http://dip.doe-mbi.ucla.edu/
 
7
http://vlsicad.cs.ucla.edu/~cheese/ispd98.html
 
8
http://www.google.com/programming-contest/
 
9
Although this version is not the latest one, it runs significantly faster than the current latest version, 5.0.8.
 
10
ftp://ftp.comlab.ox.ac.uk/pub/Packages/ILP/Datasets/carcinogenesis/progol/carcinogenesis.tar.Z
 
Literatur
Zurück zum Zitat Agrawal, R., Gehrke, J., Gunopulos, D., and Raghavan, P. 1998. Automatic subspace clustering of high dimensional data for data mining applications. In Proc. of 1998 ACM SIGMOD International Conference on Management of Data, pp. 94–105. Agrawal, R., Gehrke, J., Gunopulos, D., and Raghavan, P. 1998. Automatic subspace clustering of high dimensional data for data mining applications. In Proc. of 1998 ACM SIGMOD International Conference on Management of Data, pp. 94–105.
Zurück zum Zitat Agrawal, R. and Srikant, R. 1994. Fast algorithms for mining association rules. In Proc. of the 20th International Conference on Very Large Data Bases (VLDB), J.B. Bocca, M. Jarke, and C. Zaniolo (Eds.), Morgan Kaufmann, pp. 487–499. Agrawal, R. and Srikant, R. 1994. Fast algorithms for mining association rules. In Proc. of the 20th International Conference on Very Large Data Bases (VLDB), J.B. Bocca, M. Jarke, and C. Zaniolo (Eds.), Morgan Kaufmann, pp. 487–499.
Zurück zum Zitat Agrawal, R. and Srikant, R. 1995. Mining sequential patterns. In Proc. of the 11th International Conference on Data Engineering (ICDE), P.S. Yu and A.L.P. Chen (Eds.), IEEE Press, pp. 3–14. Agrawal, R. and Srikant, R. 1995. Mining sequential patterns. In Proc. of the 11th International Conference on Data Engineering (ICDE), P.S. Yu and A.L.P. Chen (Eds.), IEEE Press, pp. 3–14.
Zurück zum Zitat Berendt, B., Hotho, A., and Stumme, G. 2002. Towards semantic web mining. In International Semantic Web Conference (ISWC), pp. 264–278. Berendt, B., Hotho, A., and Stumme, G. 2002. Towards semantic web mining. In International Semantic Web Conference (ISWC), pp. 264–278.
Zurück zum Zitat Berman, H.M., Westbrook, J., Feng, Z., Gilliland, G., Bhat, T.N., Weissig, H. et al. 2000. The protein data bank. Nucleic Acids Research, 28:235–242.CrossRefPubMed Berman, H.M., Westbrook, J., Feng, Z., Gilliland, G., Bhat, T.N., Weissig, H. et al. 2000. The protein data bank. Nucleic Acids Research, 28:235–242.CrossRefPubMed
Zurück zum Zitat Berman, P. and Fujito, T. 1995. On the approximation properties of independent set problem in degree 3 graphs. In Proc. of Workshop on Algorithms and Data Structures, pp. 449–460. Berman, P. and Fujito, T. 1995. On the approximation properties of independent set problem in degree 3 graphs. In Proc. of Workshop on Algorithms and Data Structures, pp. 449–460.
Zurück zum Zitat Blake, C.L. and Merz, C.J. 1998. UCI Repository of Machine Learning Databases. Blake, C.L. and Merz, C.J. 1998. UCI Repository of Machine Learning Databases.
Zurück zum Zitat Borgelt, C. and Berthold, M.R. 2002. Mining molecular fragments: Finding relevant substructures of molecules. In Proc. of 2002 IEEE International Conference on Data Mining (ICDM), pp. 51–58. Borgelt, C. and Berthold, M.R. 2002. Mining molecular fragments: Finding relevant substructures of molecules. In Proc. of 2002 IEEE International Conference on Data Mining (ICDM), pp. 51–58.
Zurück zum Zitat Chew, L.P., Huttenlocher, D., Kedem, K., and Kleinberg, J. 1999. Fast detection of common geometric substructure in proteins. In Proc. of the 3rd ACM RECOMB International Conference on Computational Molecular Biology, pp. 104–114. Chew, L.P., Huttenlocher, D., Kedem, K., and Kleinberg, J. 1999. Fast detection of common geometric substructure in proteins. In Proc. of the 3rd ACM RECOMB International Conference on Computational Molecular Biology, pp. 104–114.
Zurück zum Zitat Cohen, M. and Gudes, E. 2004. Diagonally subgraphs pattern mining. In Proc. of the 9th ACM SIGMOD workshop on research issues in Data Mining and Knowledge Discovery DMKD '04, pp. 51–58. Cohen, M. and Gudes, E. 2004. Diagonally subgraphs pattern mining. In Proc. of the 9th ACM SIGMOD workshop on research issues in Data Mining and Knowledge Discovery DMKD '04, pp. 51–58.
Zurück zum Zitat Cook, D.J. and Holder, L.B. 1994. Substructure discovery using minimum description length and background knowledge. Journal of Artificial Intelligence Research, 1:231–255. Cook, D.J. and Holder, L.B. 1994. Substructure discovery using minimum description length and background knowledge. Journal of Artificial Intelligence Research, 1:231–255.
Zurück zum Zitat Cook, D.J. and Holder, L.B. 2000. Graph-based data mining. IEEE Intelligent Systems, 15(2):32–41.CrossRef Cook, D.J. and Holder, L.B. 2000. Graph-based data mining. IEEE Intelligent Systems, 15(2):32–41.CrossRef
Zurück zum Zitat Cook, D.J., Holder, L.B., and Djoko, S. 1995. Knowledge discovery from structural data. Journal of Intelligent Information Systems, 5(3):229–245.CrossRef Cook, D.J., Holder, L.B., and Djoko, S. 1995. Knowledge discovery from structural data. Journal of Intelligent Information Systems, 5(3):229–245.CrossRef
Zurück zum Zitat Dehaspe, L., Toivonen, H., and King, R.D. 1998. Finding frequent substructures in chemical compounds. In Proc. of the 4th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-98), R. Agrawal, P. Stolorz, and G. Piatetsky-Shapiro (Eds.), AAAI Press, pp. 30–36. Dehaspe, L., Toivonen, H., and King, R.D. 1998. Finding frequent substructures in chemical compounds. In Proc. of the 4th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-98), R. Agrawal, P. Stolorz, and G. Piatetsky-Shapiro (Eds.), AAAI Press, pp. 30–36.
Zurück zum Zitat De Raedt, L. and Kramer, S. 2001. The level-wise version space algorithm and its application to molecular fragment finding. In Proc. of the 17th International Joint Conference on Artificial Intelligence (IJCAI-01), pp. 853–862. De Raedt, L. and Kramer, S. 2001. The level-wise version space algorithm and its application to molecular fragment finding. In Proc. of the 17th International Joint Conference on Artificial Intelligence (IJCAI-01), pp. 853–862.
Zurück zum Zitat Deshpande, M., Kuramochi, M., and Karypis, G. 2002. Automated approaches for classifying structures. In Proc. of the 2nd Workshop on Data Mining in Bioinformatics (BIOKDD '02) pp. 11–18. Deshpande, M., Kuramochi, M., and Karypis, G. 2002. Automated approaches for classifying structures. In Proc. of the 2nd Workshop on Data Mining in Bioinformatics (BIOKDD '02) pp. 11–18.
Zurück zum Zitat Deshpande, M., Kuramochi, M., and Karypis, G. 2003. Frequent sub-structure based approaches for classifying chemical compounds. In Proc. of 2003 IEEE International Conference on Data Mining (ICDM), pp. 35–42. Deshpande, M., Kuramochi, M., and Karypis, G. 2003. Frequent sub-structure based approaches for classifying chemical compounds. In Proc. of 2003 IEEE International Conference on Data Mining (ICDM), pp. 35–42.
Zurück zum Zitat Feige, U., Goldwasser, S., Lovasz, L., Safra, S., and Szegedy, M. 1991. Approximating clique is almost NP-complete. In Proc. of the 32nd IEEE Symposium on Foundations of Computer Science (FOCS), pp. 2–12. Feige, U., Goldwasser, S., Lovasz, L., Safra, S., and Szegedy, M. 1991. Approximating clique is almost NP-complete. In Proc. of the 32nd IEEE Symposium on Foundations of Computer Science (FOCS), pp. 2–12.
Zurück zum Zitat Fortin, S. 1996. The Graph Isomorphism Problem (Tech. Rep. No. TR96-20). Department of Computing Science, University of Alberta. Fortin, S. 1996. The Graph Isomorphism Problem (Tech. Rep. No. TR96-20). Department of Computing Science, University of Alberta.
Zurück zum Zitat Garey, M.R. and Johnson, D.S. 1979. Computers and Intractability: A Guide to the Theory of np-completeness. New York: W. H. Freeman and Company.MATH Garey, M.R. and Johnson, D.S. 1979. Computers and Intractability: A Guide to the Theory of np-completeness. New York: W. H. Freeman and Company.MATH
Zurück zum Zitat Ghazizadeh, S. and Chawathe, S. 2002a. Discovering Freuqent Structures Using Summaries (Tech. Rep. No. CS-TR-4364). Department of Computer Science, University of Maryland. Ghazizadeh, S. and Chawathe, S. 2002a. Discovering Freuqent Structures Using Summaries (Tech. Rep. No. CS-TR-4364). Department of Computer Science, University of Maryland.
Zurück zum Zitat Ghazizadeh, S. and Chawathe, S. 2002b. SEuS: Structure extraction using summaries. In Proc. of the 5th International Conference on Discovery Science, pp. 71–85. Ghazizadeh, S. and Chawathe, S. 2002b. SEuS: Structure extraction using summaries. In Proc. of the 5th International Conference on Discovery Science, pp. 71–85.
Zurück zum Zitat Gonzalez, J., Holder, L.B. and Cook, D.J. 2001. Application of graph-based concept learning to the predictive toxicology domain. In Proc. of the Predictive Toxicology Challenge Workshop. Gonzalez, J., Holder, L.B. and Cook, D.J. 2001. Application of graph-based concept learning to the predictive toxicology domain. In Proc. of the Predictive Toxicology Challenge Workshop.
Zurück zum Zitat Grindley, H.M., Artymiuk, P.J., Rice, D.W., and Willett, P. 1993. Identification of tertiary structure resemblance in proteins using a maximal common subgraph isomorphism algorithm. Journal of Molecular Biology, 229:707–721.CrossRefPubMed Grindley, H.M., Artymiuk, P.J., Rice, D.W., and Willett, P. 1993. Identification of tertiary structure resemblance in proteins using a maximal common subgraph isomorphism algorithm. Journal of Molecular Biology, 229:707–721.CrossRefPubMed
Zurück zum Zitat Guralnik, V. and Karypis, G. 2001. A scalabale algorithm for clustering sequence datasets. In Proc. of 2001 IEEE International Conference on Data Mining (ICDM). Guralnik, V. and Karypis, G. 2001. A scalabale algorithm for clustering sequence datasets. In Proc. of 2001 IEEE International Conference on Data Mining (ICDM).
Zurück zum Zitat Halldórsson, M.M., and Radhakrishnan, J. 1997. Greed is good: Approximating independent sets in sparse and bounded-degree graphs. Algorithmica, 18(1):145–163.MATHCrossRefMathSciNet Halldórsson, M.M., and Radhakrishnan, J. 1997. Greed is good: Approximating independent sets in sparse and bounded-degree graphs. Algorithmica, 18(1):145–163.MATHCrossRefMathSciNet
Zurück zum Zitat Han, J., Pei, J., and Yin, Y. 2000. Mining frequent patterns without candidate generation. In Proc. of ACM SIGMOD International Conference on Management of Data Dallas, TX, pp. 1–12. Han, J., Pei, J., and Yin, Y. 2000. Mining frequent patterns without candidate generation. In Proc. of ACM SIGMOD International Conference on Management of Data Dallas, TX, pp. 1–12.
Zurück zum Zitat Hochbaum, D.S. 1983. Efficient bounds for the stable set, vertex cover, and set packing problems. Discrete Applied Mathematics, 6:243–254.CrossRefMATHMathSciNet Hochbaum, D.S. 1983. Efficient bounds for the stable set, vertex cover, and set packing problems. Discrete Applied Mathematics, 6:243–254.CrossRefMATHMathSciNet
Zurück zum Zitat Holder, L.B., Cook, D.J., and Djoko, S. 1994. Substructure discovery in the SUBDUE system. In Proc. of the AAAI Workshop on Knowledge Discovery in Databases, pp. 169–180. Holder, L.B., Cook, D.J., and Djoko, S. 1994. Substructure discovery in the SUBDUE system. In Proc. of the AAAI Workshop on Knowledge Discovery in Databases, pp. 169–180.
Zurück zum Zitat Hong, M., Zhou, H., Wang, W., and Shi, B. 2003. An efficient algorithm of frequent connected subgraph extraction. In Proc. of the 7th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD-03), Springer-Verlag, Vol. 2637, pp. 40–51. Hong, M., Zhou, H., Wang, W., and Shi, B. 2003. An efficient algorithm of frequent connected subgraph extraction. In Proc. of the 7th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD-03), Springer-Verlag, Vol. 2637, pp. 40–51.
Zurück zum Zitat Huan, J., Wang, W., and Prins, J. 2003. Efficient mining of frequent subgraph in the presence of isomophism. In Proc. of 2003 IEEE International Conference on Data Mining (ICDM'03), pp. 549–552. Huan, J., Wang, W., and Prins, J. 2003. Efficient mining of frequent subgraph in the presence of isomophism. In Proc. of 2003 IEEE International Conference on Data Mining (ICDM'03), pp. 549–552.
Zurück zum Zitat Inokuchi, A., Washio, T., and Motoda, H. 2000. An apriori-based algorithm for mining frequent substructures from graph data. In Proc. of the 4th European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD'00), Lyon, France, pp. 13–23. Inokuchi, A., Washio, T., and Motoda, H. 2000. An apriori-based algorithm for mining frequent substructures from graph data. In Proc. of the 4th European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD'00), Lyon, France, pp. 13–23.
Zurück zum Zitat Inokuchi, A., Washio, T., and Motoda, H. 2003. Complete mining of frequent patterns from graphs: Mining graph data. Machine Learning, 50(3):321–354.MATHCrossRef Inokuchi, A., Washio, T., and Motoda, H. 2003. Complete mining of frequent patterns from graphs: Mining graph data. Machine Learning, 50(3):321–354.MATHCrossRef
Zurück zum Zitat Inokuchi, A., Washio, T., Nishimura, K., and Motoda, H. 2002. A Fast Algorithm for Mining Frequent Connected Subgraphs (Tech. Rep. No. RT0448). IBM Research, Tokyo Research Laboratory. Inokuchi, A., Washio, T., Nishimura, K., and Motoda, H. 2002. A Fast Algorithm for Mining Frequent Connected Subgraphs (Tech. Rep. No. RT0448). IBM Research, Tokyo Research Laboratory.
Zurück zum Zitat Jensen, D. and Goldberg, H. (Eds.). 1998. Artificial Intelligence and Link Analysis Papers from the 1998 Fall Symposium. AAAI Press. Jensen, D. and Goldberg, H. (Eds.). 1998. Artificial Intelligence and Link Analysis Papers from the 1998 Fall Symposium. AAAI Press.
Zurück zum Zitat Jonyer, I., Cook, D.J., and Holder, L.B. 2001a. Discovery and evaluation of graph-based hierarchical conceptual clusters. Journal of Machine Learning Research, 2:19–43.CrossRef Jonyer, I., Cook, D.J., and Holder, L.B. 2001a. Discovery and evaluation of graph-based hierarchical conceptual clusters. Journal of Machine Learning Research, 2:19–43.CrossRef
Zurück zum Zitat Jonyer, I., Holder, L.B., and Cook, D.J. 2001b. Hierarchical conceptual structural clustering. International Journal on Artificial Intelligence Tools, 10(1–2):107–136.CrossRef Jonyer, I., Holder, L.B., and Cook, D.J. 2001b. Hierarchical conceptual structural clustering. International Journal on Artificial Intelligence Tools, 10(1–2):107–136.CrossRef
Zurück zum Zitat Khanna, S., Motwani, R., Sudan, M., and Vazirani, U.V. 1994. On syntactic versus computational views of approximability. In Proc. of IEEE Symposium on Foundations of Computer Science, pp. 819–830. Khanna, S., Motwani, R., Sudan, M., and Vazirani, U.V. 1994. On syntactic versus computational views of approximability. In Proc. of IEEE Symposium on Foundations of Computer Science, pp. 819–830.
Zurück zum Zitat Kleinberg, J.M., Kumar, R., Raghavan, P., Rajagopalan, S., and Tomkins, A.S. 1999. The Web as a graph: Measurements, models and methods. Lecture Notes in Computer Science, 1627:1–17.MathSciNet Kleinberg, J.M., Kumar, R., Raghavan, P., Rajagopalan, S., and Tomkins, A.S. 1999. The Web as a graph: Measurements, models and methods. Lecture Notes in Computer Science, 1627:1–17.MathSciNet
Zurück zum Zitat Ko, C. 2000. Logic induction of valid behavior specifications for intrusion detection. In IEEE Symposium on Security and Privacy (S&P), pp. 142–155. Ko, C. 2000. Logic induction of valid behavior specifications for intrusion detection. In IEEE Symposium on Security and Privacy (S&P), pp. 142–155.
Zurück zum Zitat Koch, I., Lengauer, T., and Wanke, E. 1996. An algorithm for finding maximal common subtopoloties in a set of protein structures. Journal of Computational Biology, 3(2):289–306.PubMedCrossRef Koch, I., Lengauer, T., and Wanke, E. 1996. An algorithm for finding maximal common subtopoloties in a set of protein structures. Journal of Computational Biology, 3(2):289–306.PubMedCrossRef
Zurück zum Zitat Kramer, S., De Raedt, L., and Helma, C. 2001. Molecular feature mining in HIV data. In Proc. of the 7th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-01), pp. 136–143. Kramer, S., De Raedt, L., and Helma, C. 2001. Molecular feature mining in HIV data. In Proc. of the 7th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-01), pp. 136–143.
Zurück zum Zitat Kuramochi, M., Deshpande, M., and Karypis, G. 2004. Data mining: Next generation challenges and future directions. In H. Kargupta, A. Joshi, K. Sivakumar, and Y. Yesha (eds.), AAAI Press (in press), pp. 315–334. Kuramochi, M., Deshpande, M., and Karypis, G. 2004. Data mining: Next generation challenges and future directions. In H. Kargupta, A. Joshi, K. Sivakumar, and Y. Yesha (eds.), AAAI Press (in press), pp. 315–334.
Zurück zum Zitat Kuramochi, M. and Karypis, G. 2001. Frequent subgraph discovery. In Proc. of 2001 IEEE International Conference on Data Mining (ICDM), pp. 313–320. Kuramochi, M. and Karypis, G. 2001. Frequent subgraph discovery. In Proc. of 2001 IEEE International Conference on Data Mining (ICDM), pp. 313–320.
Zurück zum Zitat Kuramochi, M. and Karypis, G. 2002. An Efficient Algorithm for Discovering Frequent Subgraphs (Tech. Rep. No. 02-026). University of Minnesota, Department of Computer Science. Kuramochi, M. and Karypis, G. 2002. An Efficient Algorithm for Discovering Frequent Subgraphs (Tech. Rep. No. 02-026). University of Minnesota, Department of Computer Science.
Zurück zum Zitat Kuramochi, M. and Karypis, G. 2004a. An efficient algorithm for discovering frequent subgraphs. IEEE Transactions on Knowledge and Data Engineering, 16(9):1038–1051.CrossRef Kuramochi, M. and Karypis, G. 2004a. An efficient algorithm for discovering frequent subgraphs. IEEE Transactions on Knowledge and Data Engineering, 16(9):1038–1051.CrossRef
Zurück zum Zitat Kuramochi, M. and Karypis, G. 2004b. Finding frequent patterns in a large sparse graph. In Proc. of the 2004 SIAM International Conference on Data Mining (SDM04). Kuramochi, M. and Karypis, G. 2004b. Finding frequent patterns in a large sparse graph. In Proc. of the 2004 SIAM International Conference on Data Mining (SDM04).
Zurück zum Zitat Lee, W. and Stolfo, S.J. 2000. A framework for constructing features and models for intrusion detection systems. ACM Transactions on Information and System Security, 3(4):227–261.CrossRef Lee, W. and Stolfo, S.J. 2000. A framework for constructing features and models for intrusion detection systems. ACM Transactions on Information and System Security, 3(4):227–261.CrossRef
Zurück zum Zitat Leibowitz, N., Fligelman, Z.Y., Nussinov, R., and Wolfson, H.J. 1999. Multiple structural alignment and core detection by geometric hashing. In Proc. of the 7th international conference on intelligent systems in molecular biology, Heidelberg: Germany, pp. 169–177. Leibowitz, N., Fligelman, Z.Y., Nussinov, R., and Wolfson, H.J. 1999. Multiple structural alignment and core detection by geometric hashing. In Proc. of the 7th international conference on intelligent systems in molecular biology, Heidelberg: Germany, pp. 169–177.
Zurück zum Zitat Leibowitz, N., Nussinov, R., and Wolfson, H.J. 2001. MUSTA—a general, efficient, automated method for multiple structure alignment and detection of common motifs: Application to proteins. Journal of Computational Biology, 8(2):93–121.CrossRefPubMed Leibowitz, N., Nussinov, R., and Wolfson, H.J. 2001. MUSTA—a general, efficient, automated method for multiple structure alignment and detection of common motifs: Application to proteins. Journal of Computational Biology, 8(2):93–121.CrossRefPubMed
Zurück zum Zitat Li, W., Han, J., and Pei, J. 2001. CMAR: Accurate and efficient classification based on multiple class-association rules. In Proc. of 2001 IEEE International Conference on Data Mining (ICDM), pp. 369–376. Li, W., Han, J., and Pei, J. 2001. CMAR: Accurate and efficient classification based on multiple class-association rules. In Proc. of 2001 IEEE International Conference on Data Mining (ICDM), pp. 369–376.
Zurück zum Zitat Liu, B., Hsu, W., and Ma, Y. 1998. Integrating classification and association rule mining. In Proc. of the 4th Internation Conference on Knowledge Discovery and Data Mining (kdd-98), pp. 80–86. Liu, B., Hsu, W., and Ma, Y. 1998. Integrating classification and association rule mining. In Proc. of the 4th Internation Conference on Knowledge Discovery and Data Mining (kdd-98), pp. 80–86.
Zurück zum Zitat McKay, B.D. 1981. Practical graph isomorphism. Congressus Numerantium, 30:45–87.MathSciNet McKay, B.D. 1981. Practical graph isomorphism. Congressus Numerantium, 30:45–87.MathSciNet
Zurück zum Zitat Mitchell, E.M., Artymiuk, P.J., Rice, D.W., and Willett, P. 1989. Use of techniques derived from graph theory to compare secondary structure motifs in proteins. Journal of Molecular Biology, 212:151–166.CrossRef Mitchell, E.M., Artymiuk, P.J., Rice, D.W., and Willett, P. 1989. Use of techniques derived from graph theory to compare secondary structure motifs in proteins. Journal of Molecular Biology, 212:151–166.CrossRef
Zurück zum Zitat Mooney, R.J., Melville, P., Tang, L.R., Shavlik, J., Castro Dutra, I. de, Page, D. et al. 2004. Relational data mining with inductive logic programming for link discovery. In AAAI Press/The MIT Press, pp. 239–255. Mooney, R.J., Melville, P., Tang, L.R., Shavlik, J., Castro Dutra, I. de, Page, D. et al. 2004. Relational data mining with inductive logic programming for link discovery. In AAAI Press/The MIT Press, pp. 239–255.
Zurück zum Zitat Muggleton, S.H. 1999. Scientific knowledge discovery using Inductive Logic Programming. Communications of the ACM, 42(11):42–46.CrossRef Muggleton, S.H. 1999. Scientific knowledge discovery using Inductive Logic Programming. Communications of the ACM, 42(11):42–46.CrossRef
Zurück zum Zitat Östergård, P.R.J. 2002. A fast algorithm for the maximum clique problem. Discrete Applied Mathematics, 120:195–205.CrossRef Östergård, P.R.J. 2002. A fast algorithm for the maximum clique problem. Discrete Applied Mathematics, 120:195–205.CrossRef
Zurück zum Zitat Palmer, C.R., Gibbons, P.B., and Faloutsos, C. 2002. ANF: A fast and scalable tool for data mining in massive graphs. In Proc. of the 8th ACM SIGKDD Internal Conference on Knowlege Discovery and Data Mining (KDD-2002) Edmonton, AB, Canada, pp. 81–90. Palmer, C.R., Gibbons, P.B., and Faloutsos, C. 2002. ANF: A fast and scalable tool for data mining in massive graphs. In Proc. of the 8th ACM SIGKDD Internal Conference on Knowlege Discovery and Data Mining (KDD-2002) Edmonton, AB, Canada, pp. 81–90.
Zurück zum Zitat Pennec, X. and Ayache, N. 1998. A geometric algorithm to find small but highly simialar 3D substructures in proteins. Bioinformatics, 14(6):516–522.CrossRefPubMed Pennec, X. and Ayache, N. 1998. A geometric algorithm to find small but highly simialar 3D substructures in proteins. Bioinformatics, 14(6):516–522.CrossRefPubMed
Zurück zum Zitat Raymond, J.W. 2002. Heuristics for similarity searching of chemical graphs using a maximum common edge subgraph algorithm. J. Chem. Inf. Comput. Sci., 42:305–316.CrossRefPubMed Raymond, J.W. 2002. Heuristics for similarity searching of chemical graphs using a maximum common edge subgraph algorithm. J. Chem. Inf. Comput. Sci., 42:305–316.CrossRefPubMed
Zurück zum Zitat Srinivasan, A., King, R.D., Muggleton, S.H., and Sternberg, M.J.E. 1997. Carcinogenesis predictions using ILP. In Proc. of the 7th International Workshop on Inductive Logic Programming S. Džeroski and N. Lavrač (Eds.), Springer-Verlag, (Vol. 1297, pp. 273–287). Srinivasan, A., King, R.D., Muggleton, S.H., and Sternberg, M.J.E. 1997. Carcinogenesis predictions using ILP. In Proc. of the 7th International Workshop on Inductive Logic Programming S. Džeroski and N. Lavrač (Eds.), Springer-Verlag, (Vol. 1297, pp. 273–287).
Zurück zum Zitat Vanetik, N., Gudes, E., and Shimony, S.E. 2002. Computing frequent graph patterns from semistructured data. In Proc. of 2002 IEEE International Conference on Data Mining (ICDM), pp. 458–465. Vanetik, N., Gudes, E., and Shimony, S.E. 2002. Computing frequent graph patterns from semistructured data. In Proc. of 2002 IEEE International Conference on Data Mining (ICDM), pp. 458–465.
Zurück zum Zitat Wang, X., Wang, J.T.L., Shasha, D., Shapiro, B.A., Rigoutsos, I., and Zhang, K. 2002. Finding patterns in three dimensional graphs: Algorithms and applications to scientific data mining. IEEE Transactions on Knowledge and Data Engineering, 14(4):731–749.CrossRef Wang, X., Wang, J.T.L., Shasha, D., Shapiro, B.A., Rigoutsos, I., and Zhang, K. 2002. Finding patterns in three dimensional graphs: Algorithms and applications to scientific data mining. IEEE Transactions on Knowledge and Data Engineering, 14(4):731–749.CrossRef
Zurück zum Zitat Wasserman, S., Faust, K., and Iacobucci, D. 1994. Social network analysis : Methods and applications. Cambridge University Press. Wasserman, S., Faust, K., and Iacobucci, D. 1994. Social network analysis : Methods and applications. Cambridge University Press.
Zurück zum Zitat Yan, X. and Han, J. 2002. gSpan: Graph-based substructure pattern mining. In Proc. of 2002 IEEE International Conference on Data Mining (ICDM), pp. 721–724. Yan, X. and Han, J. 2002. gSpan: Graph-based substructure pattern mining. In Proc. of 2002 IEEE International Conference on Data Mining (ICDM), pp. 721–724.
Zurück zum Zitat Yan, X. and Han, J. 2003. CloseGraph: Mining closed frequent graph patterns. In Proc. of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2003), pp. 286–295. Yan, X. and Han, J. 2003. CloseGraph: Mining closed frequent graph patterns. In Proc. of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2003), pp. 286–295.
Zurück zum Zitat Yoshida, K. and Motoda, H. 1995. CLIP: Concept learning from inference patterns. Artificial Intelligence, 75(1):63–92.CrossRef Yoshida, K. and Motoda, H. 1995. CLIP: Concept learning from inference patterns. Artificial Intelligence, 75(1):63–92.CrossRef
Zurück zum Zitat Yoshida, K., Motoda, H., and Indurkhya, N. 1994. Graph-based induction as a unified learning framework. Journal of Applied Intelligence, 4:297–328.CrossRef Yoshida, K., Motoda, H., and Indurkhya, N. 1994. Graph-based induction as a unified learning framework. Journal of Applied Intelligence, 4:297–328.CrossRef
Zurück zum Zitat Zaki, M.J. and Gouda, K. 2003. Fast vertical mining using diffsets. In Proc. of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2003), pp. 326–335. Zaki, M.J. and Gouda, K. 2003. Fast vertical mining using diffsets. In Proc. of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2003), pp. 326–335.
Metadaten
Titel
Finding Frequent Patterns in a Large Sparse Graph*
verfasst von
Michihiro Kuramochi
George Karypis
Publikationsdatum
01.11.2005
Verlag
Springer US
Erschienen in
Data Mining and Knowledge Discovery / Ausgabe 3/2005
Print ISSN: 1384-5810
Elektronische ISSN: 1573-756X
DOI
https://doi.org/10.1007/s10618-005-0003-9

Weitere Artikel der Ausgabe 3/2005

Data Mining and Knowledge Discovery 3/2005 Zur Ausgabe

Premium Partner