Skip to main content
Erschienen in: Knowledge and Information Systems 1/2018

08.11.2017 | Regular Paper

Resling: a scalable and generic framework to mine top-k representative subgraph patterns

verfasst von: Dheepikaa Natarajan, Sayan Ranu

Erschienen in: Knowledge and Information Systems | Ausgabe 1/2018

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

Mining subgraph patterns is an active area of research due to its wide-ranging applications. Examples include frequent subgraph mining, discriminative subgraph mining, statistically significant subgraphs. Existing research has primarily focused on mining all subgraph patterns in the database. However, due to the exponential subgraph search space, the number of patterns mined, typically, is too large for any human-mediated analysis. Consequently, deriving insights from the mined patterns is hard for domain scientists. In addition, subgraph pattern mining is posed in multiple forms: the function that models if a subgraph is a pattern varies based on the application and the database could be over multiple graphs or a single, large graph. In this paper, we ask the following question: Given a subgraph importance function and a budget k, which are the k subgraph patterns that best represent all other patterns of interest? We show that the problem is NP-hard, and propose a generic framework called Resling that adapts to arbitrary subgraph importance functions and generalizable to both transactional graph databases as well as single, large graphs. Resling derives its power by structuring the search space in the form of an edit map, where each subgraph is a node, and two subgraphs are connected if they have an edit distance of one. We rank nodes in the edit map through two random walk based algorithms: vertex-reinforced random walks ( Resling -VR) and negative-reinforced random walks( Resling -NR). Experiments show that Resling-VR is up to 20 times more representative of the pattern space and two orders of magnitude faster than the state-of-the-art techniques. Resling-NR further improves the running time while maintaining comparable or better performance in representative power.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Fußnoten
1
We slightly abuse the term “undirected graph” here. Although the edges are undirected, the sign of the edge weight depends on the direction.
 
Literatur
1.
Zurück zum Zitat Ranu S, Singh AK (2012) Indexing and mining topological patterns for drug discovery. In: EDBT, pp 562–565 Ranu S, Singh AK (2012) Indexing and mining topological patterns for drug discovery. In: EDBT, pp 562–565
2.
Zurück zum Zitat Ranu S, Hoang M, Singh A (2013) Mining discriminative subgraphs from global-state networks. In: KDD, pp 509–517 Ranu S, Hoang M, Singh A (2013) Mining discriminative subgraphs from global-state networks. In: KDD, pp 509–517
3.
Zurück zum Zitat Chaoji V, Ranu S, Rastogi R, Bhatt R (2012) Recommendations to boost content spread in social networks. In: WWW, pp 529–538 Chaoji V, Ranu S, Rastogi R, Bhatt R (2012) Recommendations to boost content spread in social networks. In: WWW, pp 529–538
4.
Zurück zum Zitat Banerjee P, Ranu S, Raghavan S (2014) Inferring uncertain trajectories from partial observations. In: ICDM, pp 30–39 Banerjee P, Ranu S, Raghavan S (2014) Inferring uncertain trajectories from partial observations. In: ICDM, pp 30–39
5.
Zurück zum Zitat Banerjee P, Yawalkar P, Ranu S (2016) Mantra: a scalable approach to mining temporally anomalous sub-trajectories. In: KDD, pp 1415–1424 Banerjee P, Yawalkar P, Ranu S (2016) Mantra: a scalable approach to mining temporally anomalous sub-trajectories. In: KDD, pp 1415–1424
6.
Zurück zum Zitat Yan X, Han J (2002) Gspan: graph-based substructure pattern mining. In: ICDM, p 721. ISBN: 0-7695-1754-4 Yan X, Han J (2002) Gspan: graph-based substructure pattern mining. In: ICDM, p 721. ISBN: 0-7695-1754-4
7.
Zurück zum Zitat Kuramochi M, Karypis G (2005) Finding frequent patterns in a large sparse graph*. Data Min Knowl Discov 11(3):243–271MathSciNetCrossRef Kuramochi M, Karypis G (2005) Finding frequent patterns in a large sparse graph*. Data Min Knowl Discov 11(3):243–271MathSciNetCrossRef
8.
Zurück zum Zitat Elseidy M, Abdelhamid E, Skiadopoulos S, Kalnis P (2014) Grami: frequent subgraph and pattern mining in a single large graph. PVLDB 7(7):517–528 Elseidy M, Abdelhamid E, Skiadopoulos S, Kalnis P (2014) Grami: frequent subgraph and pattern mining in a single large graph. PVLDB 7(7):517–528
9.
Zurück zum Zitat Gurukar S, Ranu S, Ravindran B (2015) Commit: A scalable approach to mining communication motifs from dynamic networks. In: SIGMOD, pp 475–489 Gurukar S, Ranu S, Ravindran B (2015) Commit: A scalable approach to mining communication motifs from dynamic networks. In: SIGMOD, pp 475–489
10.
Zurück zum Zitat Thoma M, Cheng H, Gretton A, Han J, Kriegel H-P, Smola A, Song L, Yu PS, Yan X, Borgwardt K (2009) Near-optimal supervised feature selection among frequent subgraphs. In: SDM 2009, pp 1076–1087 Thoma M, Cheng H, Gretton A, Han J, Kriegel H-P, Smola A, Song L, Yu PS, Yan X, Borgwardt K (2009) Near-optimal supervised feature selection among frequent subgraphs. In: SDM 2009, pp 1076–1087
11.
Zurück zum Zitat Hasan MA, Zaki MJ (2009) Output space sampling for graph patterns. PVLDB 2(1):730–741 Hasan MA, Zaki MJ (2009) Output space sampling for graph patterns. PVLDB 2(1):730–741
12.
Zurück zum Zitat Ranu S, Singh AK (2009) Graphsig: a scalable approach to mining significant subgraphs in large graph databases. In: ICDE Ranu S, Singh AK (2009) Graphsig: a scalable approach to mining significant subgraphs in large graph databases. In: ICDE
13.
Zurück zum Zitat Ranu S, Calhoun BT, Singh AK, Swamidass SJ (2011) Probabilistic substructure mining from small-molecule screens. Mol Inf 30(9):809–815CrossRef Ranu S, Calhoun BT, Singh AK, Swamidass SJ (2011) Probabilistic substructure mining from small-molecule screens. Mol Inf 30(9):809–815CrossRef
14.
Zurück zum Zitat Ranu S, Singh AK (2009) Mining statistically significant molecular substructures for efficient molecular classification. J Chem Inf Model 49:2537–2550CrossRef Ranu S, Singh AK (2009) Mining statistically significant molecular substructures for efficient molecular classification. J Chem Inf Model 49:2537–2550CrossRef
15.
Zurück zum Zitat Kuramochi M, Karypis G (2001) Frequent subgraph discovery. In: ICDM Kuramochi M, Karypis G (2001) Frequent subgraph discovery. In: ICDM
16.
Zurück zum Zitat Nijssen S, Kok JN (2004) The Gaston tool for frequent subgraph mining. In: Proceedings of the international workshop on graph-based tools Nijssen S, Kok JN (2004) The Gaston tool for frequent subgraph mining. In: Proceedings of the international workshop on graph-based tools
17.
Zurück zum Zitat Yan X, Cheng H, Han J, Yu PS (2008) Mining significant graph patterns by scalable leap search. In: SIGMOD Yan X, Cheng H, Han J, Yu PS (2008) Mining significant graph patterns by scalable leap search. In: SIGMOD
18.
Zurück zum Zitat Jin N, Young C, Wang W (2010) Gaia: graph classification using evolutionary computation. In: SIGMOD Jin N, Young C, Wang W (2010) Gaia: graph classification using evolutionary computation. In: SIGMOD
19.
Zurück zum Zitat Cheng H, Lo D, Zhou Y, Wang X, Yan X (2009) Identifying bug signatures using discriminative graph mining, In: Proceedings of the eighteenth international symposium on software testing and analysis, pp 141–152 Cheng H, Lo D, Zhou Y, Wang X, Yan X (2009) Identifying bug signatures using discriminative graph mining, In: Proceedings of the eighteenth international symposium on software testing and analysis, pp 141–152
20.
Zurück zum Zitat Dutkowski J, Ideker T (2011) Protein networks as logic functions in development and cancer. PLoS Comput Biol 7:e1002180CrossRef Dutkowski J, Ideker T (2011) Protein networks as logic functions in development and cancer. PLoS Comput Biol 7:e1002180CrossRef
21.
Zurück zum Zitat Hasan MA, Chaoji V, Salem S, Besson J, Zaki MJ (2007) Origami: mining representative orthogonal graph patterns. In: ICDM, pp 153–162 Hasan MA, Chaoji V, Salem S, Besson J, Zaki MJ (2007) Origami: mining representative orthogonal graph patterns. In: ICDM, pp 153–162
22.
Zurück zum Zitat Yan X, Han J (2003) Closegraph: mining closed frequent graph patterns. In: KDD, pp 286–295 Yan X, Han J (2003) Closegraph: mining closed frequent graph patterns. In: KDD, pp 286–295
23.
Zurück zum Zitat Zeng Z, Tung AKH, Wang J, Feng J, Zhou L (2009) Comparing stars: on approximating graph edit distance. PVLDB 2(1):25–36 Zeng Z, Tung AKH, Wang J, Feng J, Zhou L (2009) Comparing stars: on approximating graph edit distance. PVLDB 2(1):25–36
24.
Zurück zum Zitat Zhang S, Yang J, Li S (2009) Ring: an integrated method for frequent representative subgraph mining, In: ICDM, pp 1082–1087 Zhang S, Yang J, Li S (2009) Ring: an integrated method for frequent representative subgraph mining, In: ICDM, pp 1082–1087
25.
Zurück zum Zitat Natarajan D, Ranu S (2016) A scalable and generic framework to mine top-k representative subgraph patterns. In: 2016 IEEE 16th international conference on data mining (ICDM). IEEE, pp 370–379 Natarajan D, Ranu S (2016) A scalable and generic framework to mine top-k representative subgraph patterns. In: 2016 IEEE 16th international conference on data mining (ICDM). IEEE, pp 370–379
26.
Zurück zum Zitat Metwally A, Agrawal D, El Abbadi A (2005) Efficient computation of frequent and top-k elements in data streams. In: ICDT, pp 398–412 Metwally A, Agrawal D, El Abbadi A (2005) Efficient computation of frequent and top-k elements in data streams. In: ICDT, pp 398–412
27.
Zurück zum Zitat Ranu S, Hoang M, Singh A (2014) Answering top-k representative queries on graph databases. In: SIGMOD, pp 1163–1174 Ranu S, Hoang M, Singh A (2014) Answering top-k representative queries on graph databases. In: SIGMOD, pp 1163–1174
28.
Zurück zum Zitat Drosou M, Pitoura E (2012) Disc diversity: result diversification based on dissimilarity and coverage. PVLDB 6(1):13–24 Drosou M, Pitoura E (2012) Disc diversity: result diversification based on dissimilarity and coverage. PVLDB 6(1):13–24
29.
Zurück zum Zitat Cornuejols G, Fisher ML, Nemhauser GL (1977) Location of bank accounts to optimize float: an analytic study of exact and approximate algorithms. Manag Sci 23(8):789–810MathSciNetCrossRefMATH Cornuejols G, Fisher ML, Nemhauser GL (1977) Location of bank accounts to optimize float: an analytic study of exact and approximate algorithms. Manag Sci 23(8):789–810MathSciNetCrossRefMATH
30.
Zurück zum Zitat He H, Singh AK (2006) Closure-tree: an index structure for graph queries. In: ICDE He H, Singh AK (2006) Closure-tree: an index structure for graph queries. In: ICDE
31.
Zurück zum Zitat Page L, Brin S, Motwani R, Winograd T (1998) The pagerank citation ranking: bringing order to the web. In: WWW, pp 161–172 Page L, Brin S, Motwani R, Winograd T (1998) The pagerank citation ranking: bringing order to the web. In: WWW, pp 161–172
33.
Zurück zum Zitat Badrinath R, Madhavan CEV (2012) Diversity in ranking using negative reinforcement. In: Proceedings of the ACM SIGKDD workshop on mining data semantics, vol 11, no 1–11, p 6 Badrinath R, Madhavan CEV (2012) Diversity in ranking using negative reinforcement. In: Proceedings of the ACM SIGKDD workshop on mining data semantics, vol 11, no 1–11, p 6
34.
Zurück zum Zitat Mei Q, Guo J, Radev D (2010) Divrank: the interplay of prestige and diversity in information networks. In: KDD Mei Q, Guo J, Radev D (2010) Divrank: the interplay of prestige and diversity in information networks. In: KDD
35.
Zurück zum Zitat Huan J, Wang W, Prins J, Yang J (2004) Spin: mining maximal frequent subgraphs from graph databases. In: KDD, pp 581–586 Huan J, Wang W, Prins J, Yang J (2004) Spin: mining maximal frequent subgraphs from graph databases. In: KDD, pp 581–586
36.
Zurück zum Zitat Thomas L, Valluri S, Karlapalem K (2006) Margin: maximal frequent subgraph mining. In: ICDM, pp 1097–1101 Thomas L, Valluri S, Karlapalem K (2006) Margin: maximal frequent subgraph mining. In: ICDM, pp 1097–1101
37.
Zurück zum Zitat Krishnan A, Padmanabhan D, Ranu S, Mehta S (2016) Select, link and rank: diversified query expansion and entity ranking using wikipedia. In: International conference on web information systems engineering, pp 157–173 Krishnan A, Padmanabhan D, Ranu S, Mehta S (2016) Select, link and rank: diversified query expansion and entity ranking using wikipedia. In: International conference on web information systems engineering, pp 157–173
Metadaten
Titel
Resling: a scalable and generic framework to mine top-k representative subgraph patterns
verfasst von
Dheepikaa Natarajan
Sayan Ranu
Publikationsdatum
08.11.2017
Verlag
Springer London
Erschienen in
Knowledge and Information Systems / Ausgabe 1/2018
Print ISSN: 0219-1377
Elektronische ISSN: 0219-3116
DOI
https://doi.org/10.1007/s10115-017-1129-y

Weitere Artikel der Ausgabe 1/2018

Knowledge and Information Systems 1/2018 Zur Ausgabe