Skip to main content
Erschienen in: Knowledge and Information Systems 2/2015

01.08.2015 | Regular Paper

AGraP: an algorithm for mining frequent patterns in a single graph using inexact matching

verfasst von: Marisol Flores-Garrido, Jesús-Ariel Carrasco-Ochoa, José Fco. Martínez-Trinidad

Erschienen in: Knowledge and Information Systems | Ausgabe 2/2015

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

Frequent graph mining algorithms commonly use graph isomorphism to identify occurrences of a given pattern, but in the last years, a few works have focused on the case where a pattern could differ from its occurrences, which can be important to analyze noisy data. These later algorithms allow differences in labels and structural differences in edges, but to the best of our knowledge, none of them considers structural differences in vertices. How can we identify occurrences that differ by one (or several) nodes from the pattern they represent? Our work approaches the problem of frequent graph pattern mining with two main characteristics. First, we use inexact matching, allowing structural differences in both edges and vertices. Second, we focus on the problem of mining patterns in a single graph, a problem that has been less explored than the case in which patterns are mined from a graph collection. In this paper, we introduce two similarity functions to compare graphs using inexact matching and an algorithm, AGraP, able to identify patterns that can have structural differences with respect to their occurrences. Our experimental results show that AGraP is able to find patterns that cannot be found by other state-of-the-art algorithms. Additionally, we show that the patterns mined by AGraP are useful in classification tasks.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Literatur
1.
Zurück zum Zitat Acosta-Mendoza N, Gago-Alonso A, Medina-Pagola JE (2012) Frequent approximate subgraphs as features for graph-based image classification. Knowl-Based Syst 27:381–392CrossRef Acosta-Mendoza N, Gago-Alonso A, Medina-Pagola JE (2012) Frequent approximate subgraphs as features for graph-based image classification. Knowl-Based Syst 27:381–392CrossRef
2.
Zurück zum Zitat Aggarwal CC, Wang H (2010) Managing and mining graph data. Springer, BerlinCrossRef Aggarwal CC, Wang H (2010) Managing and mining graph data. Springer, BerlinCrossRef
3.
Zurück zum Zitat Alguliev R, Aliguliyev R, Ganjaliyev F (2011) Extracting a heterogeneous social network of academic researchers on the web based on information retrieved from multiple sources. Am J Oper Res 2(1):33–38CrossRef Alguliev R, Aliguliyev R, Ganjaliyev F (2011) Extracting a heterogeneous social network of academic researchers on the web based on information retrieved from multiple sources. Am J Oper Res 2(1):33–38CrossRef
4.
Zurück zum Zitat Anchuri P, Zaki MJ, Barkol O, Bergman R, Felder Y, Golan S, Sityon A (2012) Graph mining for discovering infrastructure patterns in configuration management databases. Knowl Inf Syst 33(3):491–522CrossRef Anchuri P, Zaki MJ, Barkol O, Bergman R, Felder Y, Golan S, Sityon A (2012) Graph mining for discovering infrastructure patterns in configuration management databases. Knowl Inf Syst 33(3):491–522CrossRef
5.
Zurück zum Zitat Borgelt C, Berthold MR (2002) Mining molecular fragments: finding relevant substructures of molecules. In: Proceedings of the 2002 IEEE international conference on data mining. Maebashi City, Japan, pp 51–58 Borgelt C, Berthold MR (2002) Mining molecular fragments: finding relevant substructures of molecules. In: Proceedings of the 2002 IEEE international conference on data mining. Maebashi City, Japan, pp 51–58
6.
Zurück zum Zitat Bringmann B, Nijssen S (2008) What is frequent in a single graph?. In: Washio T, Suzuki E, Ting K, Inokuchi A (eds) Advances in knowledge discovery and data mining, lecture notes in computer science, Springer, Berlin 5012:858–863 Bringmann B, Nijssen S (2008) What is frequent in a single graph?. In: Washio T, Suzuki E, Ting K, Inokuchi A (eds) Advances in knowledge discovery and data mining, lecture notes in computer science, Springer, Berlin 5012:858–863
7.
Zurück zum Zitat Bunke H, Shearer K (1998) A graph distance metric based on the maximal common subgraph. Pattern Recognit Lett 19(3–4):255–259CrossRef Bunke H, Shearer K (1998) A graph distance metric based on the maximal common subgraph. Pattern Recognit Lett 19(3–4):255–259CrossRef
8.
Zurück zum Zitat Bunke H, Riesen K (2011) Recent advances in graph-based pattern recognition with applications in document analysis. Pattern Recognit 44:1057–1067CrossRef Bunke H, Riesen K (2011) Recent advances in graph-based pattern recognition with applications in document analysis. Pattern Recognit 44:1057–1067CrossRef
9.
Zurück zum Zitat Chen C, Yan X, Zhu F, Han J (2007) gApprox: mining frequent approximate patterns from a massive network. In: ICDM, IEEE computer society, pp 445–450 Chen C, Yan X, Zhu F, Han J (2007) gApprox: mining frequent approximate patterns from a massive network. In: ICDM, IEEE computer society, pp 445–450
10.
Zurück zum Zitat Cook D, Holder L (2007) Mining graph data. Wiley-Interscience, New York Cook D, Holder L (2007) Mining graph data. Wiley-Interscience, New York
11.
Zurück zum Zitat Dehmer M, Emmert-Streib F (2007) Comparing large graphs efficiently by margins of feature vectors. Appl Math Comput 188(2):1699–1710CrossRef Dehmer M, Emmert-Streib F (2007) Comparing large graphs efficiently by margins of feature vectors. Appl Math Comput 188(2):1699–1710CrossRef
12.
Zurück zum Zitat Deshpande M, Kuramochi M, Wale N, Karypis G (2005) Frequent substructure-based approaches for classifying chemical compounds. IEEE Trans Knowl Data Eng 17(8):1036–1050CrossRef Deshpande M, Kuramochi M, Wale N, Karypis G (2005) Frequent substructure-based approaches for classifying chemical compounds. IEEE Trans Knowl Data Eng 17(8):1036–1050CrossRef
13.
Zurück zum Zitat Fiedler M, Borgelt C (2007) Support computation for mining frequent subgraphs in a single graph. In: 5th International workshop on mining and learning with graphs, pp 25–30 Fiedler M, Borgelt C (2007) Support computation for mining frequent subgraphs in a single graph. In: 5th International workshop on mining and learning with graphs, pp 25–30
14.
Zurück zum Zitat Gago-Alonso A, Medina-Pagola J, Carrasco-Ochoa J, Martnez-Trinidad J (2008) Mining frequent connected subgraphs reducing the number of candidates. In: Daelemans W, Goethals B, Morik K (eds) Machine learning and knowledge discovery in databases, lecture notes in computer science, Springer, Berlin, 5211:365–376 Gago-Alonso A, Medina-Pagola J, Carrasco-Ochoa J, Martnez-Trinidad J (2008) Mining frequent connected subgraphs reducing the number of candidates. In: Daelemans W, Goethals B, Morik K (eds) Machine learning and knowledge discovery in databases, lecture notes in computer science, Springer, Berlin, 5211:365–376
15.
Zurück zum Zitat Gao X, Xiao B, Tao D, Li X (2010) A survey of graph edit distance. Pattern Anal Appl 13(1):113–129CrossRef Gao X, Xiao B, Tao D, Li X (2010) A survey of graph edit distance. Pattern Anal Appl 13(1):113–129CrossRef
16.
Zurück zum Zitat Gärtner T (2002) Exponential an geometric kernels for graphs. In: NIPS*02 workshop on unreal data, principles of modeling nonvectorial data Gärtner T (2002) Exponential an geometric kernels for graphs. In: NIPS*02 workshop on unreal data, principles of modeling nonvectorial data
17.
Zurück zum Zitat Gärtner T, Flach P, Wrobel, S (2003) On graph kernels: hardness results and efficient alternatives. In: Conference on learning theory, pp 129–143 Gärtner T, Flach P, Wrobel, S (2003) On graph kernels: hardness results and efficient alternatives. In: Conference on learning theory, pp 129–143
18.
Zurück zum Zitat Hellal A, Romdhane LB (2013) Nodar: mining globally distributed substructures from a single labeled graph. J Intell Inf Syst 40(1):1–15CrossRef Hellal A, Romdhane LB (2013) Nodar: mining globally distributed substructures from a single labeled graph. J Intell Inf Syst 40(1):1–15CrossRef
19.
Zurück zum Zitat Hidovic D, Pelillo M (2004) Metrics for attributed graphs based on the maximal similarity common subgraph. In: IJPRAI, pp 299–313 Hidovic D, Pelillo M (2004) Metrics for attributed graphs based on the maximal similarity common subgraph. In: IJPRAI, pp 299–313
20.
Zurück zum Zitat Holder LB (1988) Substructure discovery in subdue. Technical Report UILU-ENG-88-2220, Department of Computer Science, University of Illinois, Urbana Holder LB (1988) Substructure discovery in subdue. Technical Report UILU-ENG-88-2220, Department of Computer Science, University of Illinois, Urbana
21.
Zurück zum Zitat Holder L, Cook D, Djoko S (1994) Substructure discovery in the SUBDUE system. In: Proceedings of the AAAI workshop on knowledge discovery in databases, pp 169–180 Holder L, Cook D, Djoko S (1994) Substructure discovery in the SUBDUE system. In: Proceedings of the AAAI workshop on knowledge discovery in databases, pp 169–180
22.
Zurück zum Zitat Huan J, Wang W, Bandyopadhyay D, Snoeyink J, Prins J, Tropsha A (2004), Mining spatial motifs from protein structure graphs. In: Proceedings of the 8th annual international conference on research in computational, molecular biology (RECOMB04), pp 308–315 Huan J, Wang W, Bandyopadhyay D, Snoeyink J, Prins J, Tropsha A (2004), Mining spatial motifs from protein structure graphs. In: Proceedings of the 8th annual international conference on research in computational, molecular biology (RECOMB04), pp 308–315
23.
Zurück zum Zitat Jia Y, Zhang J, Huan J (2011) An efficient graph-mining method for complicated and noisy data with real-world applications. Knowl Inf Syst 28(2):423–447CrossRef Jia Y, Zhang J, Huan J (2011) An efficient graph-mining method for complicated and noisy data with real-world applications. Knowl Inf Syst 28(2):423–447CrossRef
24.
Zurück zum Zitat Kondor R, Lafferty J (2002) Diffusion kernels on graphs and other discrete input spaces. In: International conference on machine learning (ICML) Kondor R, Lafferty J (2002) Diffusion kernels on graphs and other discrete input spaces. In: International conference on machine learning (ICML)
25.
Zurück zum Zitat Kuramochi M, Karypis G (2005) Finding frequent patterns in a large sparse graph. Data Min Knowl Discov 11(3):243–271CrossRef Kuramochi M, Karypis G (2005) Finding frequent patterns in a large sparse graph. Data Min Knowl Discov 11(3):243–271CrossRef
26.
Zurück zum Zitat Kuramochi M, Karypis G (2001) Frequent subgraph discovery. In Proceedings of the international conference data mining (ICDM01), pp 313–320 Kuramochi M, Karypis G (2001) Frequent subgraph discovery. In Proceedings of the international conference data mining (ICDM01), pp 313–320
27.
Zurück zum Zitat Kuramochi M, Karypis G (2004) GREW—a scalable frequent subgraph discovery algorithm. In: Proceedings of the fourth IEEE international conference on data mining, pp 439–442 Kuramochi M, Karypis G (2004) GREW—a scalable frequent subgraph discovery algorithm. In: Proceedings of the fourth IEEE international conference on data mining, pp 439–442
29.
Zurück zum Zitat Lin ZJ, Lyu MR, King I (2012) Matchsim: a novel similarity measure based on maximum neighborhood matching. Knowl Inf Syst 32(1):141–166CrossRef Lin ZJ, Lyu MR, King I (2012) Matchsim: a novel similarity measure based on maximum neighborhood matching. Knowl Inf Syst 32(1):141–166CrossRef
30.
Zurück zum Zitat López-Presa JL (2009) Efficient algorithms for graph isomorphism testing. PhD Thesis at the Universidad Rey Juan Carlos, Madrid, España López-Presa JL (2009) Efficient algorithms for graph isomorphism testing. PhD Thesis at the Universidad Rey Juan Carlos, Madrid, España
31.
Zurück zum Zitat Lu W, Janssen J, Milios E, Japkowicz N, Zhang Y (2007) Node similarity in the citation graph. Knowl Inf Syst 11(1):105–129CrossRef Lu W, Janssen J, Milios E, Japkowicz N, Zhang Y (2007) Node similarity in the citation graph. Knowl Inf Syst 11(1):105–129CrossRef
32.
Zurück zum Zitat Moody J (2001) Peer influence groups: identifying dense clusters in large networks. Soc Netw 23:261–283CrossRef Moody J (2001) Peer influence groups: identifying dense clusters in large networks. Soc Netw 23:261–283CrossRef
34.
Zurück zum Zitat Nijssen S, Kok JN (2004) A quickstart in frequent structure mining can make a difference. In: Proceedings of the 10th ACM SIGKDD international conference on knowledge discovery and data mining, pp 647–652 Nijssen S, Kok JN (2004) A quickstart in frequent structure mining can make a difference. In: Proceedings of the 10th ACM SIGKDD international conference on knowledge discovery and data mining, pp 647–652
35.
Zurück zum Zitat Ramon J, Gärtner T (2003) Expressivity versus efficiency of graph kernels. In: Proceedings of the first international workshop on mining graphs, trees and sequences, pp 65–74 Ramon J, Gärtner T (2003) Expressivity versus efficiency of graph kernels. In: Proceedings of the first international workshop on mining graphs, trees and sequences, pp 65–74
36.
Zurück zum Zitat Ranu S, Singh AK (2009) GraphSig: a scalable approach to mining significant subgraphs in large graph databases. In: IEEE 25th international conference on data, engineering, pp 844–855 Ranu S, Singh AK (2009) GraphSig: a scalable approach to mining significant subgraphs in large graph databases. In: IEEE 25th international conference on data, engineering, pp 844–855
37.
Zurück zum Zitat Riesen K, Bunke, H (2008) IAM graph database repository for graph based pattern recognition and machine learning. In: Proceedings of the international workshop on structural syntatctic and statistical pattern recognition, lecture notes in computer science pp 287–297 Riesen K, Bunke, H (2008) IAM graph database repository for graph based pattern recognition and machine learning. In: Proceedings of the international workshop on structural syntatctic and statistical pattern recognition, lecture notes in computer science pp 287–297
38.
Zurück zum Zitat Saeedy ME, Kalnis P (2011) GraMi: generalized frequent pattern mining in a single large graph. Technical Report at the Division of Mathematical and Computer Sciences and Engineering, King Abdullah University of Science and Technology Saeedy ME, Kalnis P (2011) GraMi: generalized frequent pattern mining in a single large graph. Technical Report at the Division of Mathematical and Computer Sciences and Engineering, King Abdullah University of Science and Technology
39.
Zurück zum Zitat Sanfeliu A, Fu KS (1983) A distance measure between attributed relational graphs for pattern recognition. IEEE Trans Syst Man Cybern 13(3):353–362CrossRef Sanfeliu A, Fu KS (1983) A distance measure between attributed relational graphs for pattern recognition. IEEE Trans Syst Man Cybern 13(3):353–362CrossRef
40.
Zurück zum Zitat Shervashidze N, Vishwanathan SVN, Petri T, Mehlhorn K, Borgwardt K (2009) Efficient graphlet kernels for large graph comparison. In: Proceedings of international conference on artificial intelligence and statistics Shervashidze N, Vishwanathan SVN, Petri T, Mehlhorn K, Borgwardt K (2009) Efficient graphlet kernels for large graph comparison. In: Proceedings of international conference on artificial intelligence and statistics
41.
Zurück zum Zitat Smola AJ, Kondor IR (2003) Kernels and regularization on graphs. In: Schölkopf B, Warmuth MK (eds) Proceedings of the annual conference computational learning theory, pp 144–158 Smola AJ, Kondor IR (2003) Kernels and regularization on graphs. In: Schölkopf B, Warmuth MK (eds) Proceedings of the annual conference computational learning theory, pp 144–158
42.
Zurück zum Zitat Tan PN, Steinbach M, Kumar V (2006) Introduction to data mining. Pearson International Edition, Pearson Addison Wesley Tan PN, Steinbach M, Kumar V (2006) Introduction to data mining. Pearson International Edition, Pearson Addison Wesley
43.
Zurück zum Zitat Vishwanathan SVN, Borgwardt K, Kondor I, Schraudolph N (2008) Graph kernels. J Mach Learn Res 9:1–41 Vishwanathan SVN, Borgwardt K, Kondor I, Schraudolph N (2008) Graph kernels. J Mach Learn Res 9:1–41
44.
Zurück zum Zitat Wale N, Watson IA, Karypis G (2008) Comparison of descriptor spaces for chemical compound retrieval and classification. Knowl Inf Syst 14(3):347–375CrossRef Wale N, Watson IA, Karypis G (2008) Comparison of descriptor spaces for chemical compound retrieval and classification. Knowl Inf Syst 14(3):347–375CrossRef
45.
Zurück zum Zitat Xiao Y, Dong H, Wu W, Xiong M, Wang W, Shi B (2008) Structure-based graph distance measures of high degree of precision. Pattern Recognit 41(12):3547–3561CrossRef Xiao Y, Dong H, Wu W, Xiong M, Wang W, Shi B (2008) Structure-based graph distance measures of high degree of precision. Pattern Recognit 41(12):3547–3561CrossRef
46.
Zurück zum Zitat Yan X, Han J (2002) gSpan: graph-based substructure pattern mining. In: Proceedings of the 2002 IEEE international conference on data mining (ICDM ’02) Yan X, Han J (2002) gSpan: graph-based substructure pattern mining. In: Proceedings of the 2002 IEEE international conference on data mining (ICDM ’02)
47.
Zurück zum Zitat Yan X, Yu PS, Han J (2004) Graph indexing: a frequent structure-based approach. In: Proceedings of the SIGMOD conference, pp 335–346 Yan X, Yu PS, Han J (2004) Graph indexing: a frequent structure-based approach. In: Proceedings of the SIGMOD conference, pp 335–346
48.
Zurück zum Zitat Zhang S, Yang J, Cheedella V (2007) Monkey: approximate graph mining based on spanning trees. In: Proceedings of the IEEE 23rd international conference on data engineering (ICDE 2007), pp 1247–1249 Zhang S, Yang J, Cheedella V (2007) Monkey: approximate graph mining based on spanning trees. In: Proceedings of the IEEE 23rd international conference on data engineering (ICDE 2007), pp 1247–1249
49.
Zurück zum Zitat Zou Z, Li J, Gao H, Zhang S (2009) Frequent subgraph pattern mining on uncertain graph data. In: Proceedings of the 18th ACM conference on information and knowledge management (CIKM ’09), pp 583–592 Zou Z, Li J, Gao H, Zhang S (2009) Frequent subgraph pattern mining on uncertain graph data. In: Proceedings of the 18th ACM conference on information and knowledge management (CIKM ’09), pp 583–592
Metadaten
Titel
AGraP: an algorithm for mining frequent patterns in a single graph using inexact matching
verfasst von
Marisol Flores-Garrido
Jesús-Ariel Carrasco-Ochoa
José Fco. Martínez-Trinidad
Publikationsdatum
01.08.2015
Verlag
Springer London
Erschienen in
Knowledge and Information Systems / Ausgabe 2/2015
Print ISSN: 0219-1377
Elektronische ISSN: 0219-3116
DOI
https://doi.org/10.1007/s10115-014-0747-x

Weitere Artikel der Ausgabe 2/2015

Knowledge and Information Systems 2/2015 Zur Ausgabe

Premium Partner