Skip to main content
Erschienen in: Data Mining and Knowledge Discovery 4/2020

05.05.2020

Counting frequent patterns in large labeled graphs: a hypergraph-based approach

verfasst von: Jinghan Meng, Napath Pitaksirianan, Yi-Cheng Tu

Erschienen in: Data Mining and Knowledge Discovery | Ausgabe 4/2020

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

In recent years, the popularity of graph databases has grown rapidly. This paper focuses on single-graph as an effective model to represent information and its related graph mining techniques. In frequent pattern mining in a single-graph setting, there are two main problems: support measure and search scheme. In this paper, we propose a novel framework for designing support measures that brings together existing minimum-image-based and overlap-graph-based support measures. Our framework is built on the concept of occurrence/instance hypergraphs. Based on such, we are able to design a series of new support measures: minimum instance (MI) measure, and minimum vertex cover (MVC) measure, that combine the advantages of existing measures. More importantly, we show that the existing minimum-image-based support measure is an upper bound of the MI measure, which is also linear-time computable and results in counts that are close to number of instances of a pattern. We show that not only most major existing support measures and new measures proposed in this paper can be mapped into the new framework, but also they occupy different locations of the frequency spectrum. By taking advantage of the new framework, we discover that MVC can be approximated to a constant factor (in terms of number of pattern nodes) in polynomial time. In contrast to common belief, we demonstrate that the state-of-the-art overlap-graph-based maximum independent set (MIS) measure also has constant approximation algorithms. We further show that using standard linear programming and semidefinite programming techniques, polynomial-time relaxations for both MVC and MIS measures can be developed and their counts stand between MVC and MIS. In addition, we point out that MVC, MIS, and their relaxations are bounded within constant factor. In summary, all major support measures are unified in the new hypergraph-based framework which helps reveal their bounding relations and hardness properties.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Fußnoten
1
For that, we use the words frequency and support interchangeably in this paper. We also use the word support and the phrase support measure in the same way.
 
2
In this paper, following conventions of this field, computing time of support measures does not include that for constructing the framework (e.g., overlap graph in the MIS case).
 
Literatur
Zurück zum Zitat Bringmann B, Nijssen S (2008) What is frequent in a single graph? In: Pacific-Asia conference on knowledge discovery and data mining. Springer, pp 858–863 Bringmann B, Nijssen S (2008) What is frequent in a single graph? In: Pacific-Asia conference on knowledge discovery and data mining. Springer, pp 858–863
Zurück zum Zitat Calders T, Ramon J, Van yck D (2008) Anti-monotonic overlap-graph support measures. In: 2008 eighth IEEE international conference on data mining. IEEE, pp 73–82 Calders T, Ramon J, Van yck D (2008) Anti-monotonic overlap-graph support measures. In: 2008 eighth IEEE international conference on data mining. IEEE, pp 73–82
Zurück zum Zitat Chan YH, Lau LC (2010) On linear and semidefinite programming relaxations for hypergraph matching. In: Proceedings of the twenty-first annual ACM-SIAM symposium on Discrete Algorithms. Society for Industrial and Applied Mathematics, pp 1500–1511 Chan YH, Lau LC (2010) On linear and semidefinite programming relaxations for hypergraph matching. In: Proceedings of the twenty-first annual ACM-SIAM symposium on Discrete Algorithms. Society for Industrial and Applied Mathematics, pp 1500–1511
Zurück zum Zitat Cygan M (2013) Improved approximation for 3-dimensional matching via bounded pathwidth local search. In: 2013 IEEE 54th annual symposium on foundations of computer science (FOCS). IEEE, pp 509–518 Cygan M (2013) Improved approximation for 3-dimensional matching via bounded pathwidth local search. In: 2013 IEEE 54th annual symposium on foundations of computer science (FOCS). IEEE, pp 509–518
Zurück zum Zitat Elseidy M, Abdelhamid E, Skiadopoulos S, Kalnis P (2014) Grami: frequent subgraph and pattern mining in a single large graph. Proc VLDB Endow 7(7):517–528CrossRef Elseidy M, Abdelhamid E, Skiadopoulos S, Kalnis P (2014) Grami: frequent subgraph and pattern mining in a single large graph. Proc VLDB Endow 7(7):517–528CrossRef
Zurück zum Zitat Fiedler M, Borgelt C (2007) Support computation for mining frequent subgraphs in a single graph. In: MLG, Citeseer Fiedler M, Borgelt C (2007) Support computation for mining frequent subgraphs in a single graph. In: MLG, Citeseer
Zurück zum Zitat Füredi Z, Kahn J, Seymour PD (1993) On the fractional matching polytope of a hypergraph. Combinatorica 13(2):167–180MathSciNetCrossRef Füredi Z, Kahn J, Seymour PD (1993) On the fractional matching polytope of a hypergraph. Combinatorica 13(2):167–180MathSciNetCrossRef
Zurück zum Zitat Hong M, Zhou H, Wang W, Shi B (2003) An efficient algorithm of frequent connected subgraph extraction. In: Pacific-Asia conference on knowledge discovery and data mining. Springer, pp 40–51 Hong M, Zhou H, Wang W, Shi B (2003) An efficient algorithm of frequent connected subgraph extraction. In: Pacific-Asia conference on knowledge discovery and data mining. Springer, pp 40–51
Zurück zum Zitat Huan J, Wang W, Prins J (2003) Efficient mining of frequent subgraphs in the presence of isomorphism. In: Third IEEE international conference on data mining, 2003. ICDM 2003. IEEE, pp 549–552 Huan J, Wang W, Prins J (2003) Efficient mining of frequent subgraphs in the presence of isomorphism. In: Third IEEE international conference on data mining, 2003. ICDM 2003. IEEE, pp 549–552
Zurück zum Zitat IBM (2011) IBM ILOG CPLEX optimization studio CPLEX user’s manual IBM (2011) IBM ILOG CPLEX optimization studio CPLEX user’s manual
Zurück zum Zitat Inokuchi A, Washio T, Motoda H (2003) Complete mining of frequent patterns from graphs: mining graph data. Mach Learn 50(3):321–354CrossRef Inokuchi A, Washio T, Motoda H (2003) Complete mining of frequent patterns from graphs: mining graph data. Mach Learn 50(3):321–354CrossRef
Zurück zum Zitat Karp RM (1972) Reducibility among combinatorial problems. In: Miller R (ed) Complexity of computer computations. Springer, New York, pp 85–103CrossRef Karp RM (1972) Reducibility among combinatorial problems. In: Miller R (ed) Complexity of computer computations. Springer, New York, pp 85–103CrossRef
Zurück zum Zitat Kuramochi M, Karypis G (2004a) An efficient algorithm for discovering frequent subgraphs. IEEE Trans Knowl Data Eng 16(9):1038–1051CrossRef Kuramochi M, Karypis G (2004a) An efficient algorithm for discovering frequent subgraphs. IEEE Trans Knowl Data Eng 16(9):1038–1051CrossRef
Zurück zum Zitat Kuramochi M, Karypis G (2005) Finding frequent patterns in a large sparse graph. Data Min Knowl Discov 11(3):243–271MathSciNetCrossRef Kuramochi M, Karypis G (2005) Finding frequent patterns in a large sparse graph. Data Min Knowl Discov 11(3):243–271MathSciNetCrossRef
Zurück zum Zitat Kuramochi M, Karypis G (2004b) Grew-a scalable frequent subgraph discovery algorithm. In: Fourth IEEE international conference on data mining, 2004, ICDM’04. IEEE, pp 439–442 Kuramochi M, Karypis G (2004b) Grew-a scalable frequent subgraph discovery algorithm. In: Fourth IEEE international conference on data mining, 2004, ICDM’04. IEEE, pp 439–442
Zurück zum Zitat Meng J, Tu Yc (2017) Flexible and feasible support measures for mining frequent patterns in large labeled graphs. In: Proceedings of the 2017 ACM international conference on management of data. ACM, New York, SIGMOD ’17, pp 391–402. https://doi.org/10.1145/3035918.3035936 Meng J, Tu Yc (2017) Flexible and feasible support measures for mining frequent patterns in large labeled graphs. In: Proceedings of the 2017 ACM international conference on management of data. ACM, New York, SIGMOD ’17, pp 391–402. https://​doi.​org/​10.​1145/​3035918.​3035936
Zurück zum Zitat Pach J, Agarwal PK (2011) Combinatorial geometry, vol 37. Wiley, New YorkMATH Pach J, Agarwal PK (2011) Combinatorial geometry, vol 37. Wiley, New YorkMATH
Zurück zum Zitat Talukder N, Zaki MJ (2016) A distributed approach for graph mining in massive networks. Data Min Knowl Discov 30(5):1024–1052MathSciNetCrossRef Talukder N, Zaki MJ (2016) A distributed approach for graph mining in massive networks. Data Min Knowl Discov 30(5):1024–1052MathSciNetCrossRef
Zurück zum Zitat Vanetik N, Gudes E, Shimony SE (2002) Computing frequent graph patterns from semistructured data. In: Proceedings of the 2002 IEEE international conference on data mining. IEEE Computer Society, Washington, ICDM ’02, pp 458–465 Vanetik N, Gudes E, Shimony SE (2002) Computing frequent graph patterns from semistructured data. In: Proceedings of the 2002 IEEE international conference on data mining. IEEE Computer Society, Washington, ICDM ’02, pp 458–465
Zurück zum Zitat Wang Y, Ramon J, Fannes T (2013) An efficiently computable subgraph pattern support measure: counting independent observations. Data Min Knowl Discov 27(3):444–477MathSciNetCrossRef Wang Y, Ramon J, Fannes T (2013) An efficiently computable subgraph pattern support measure: counting independent observations. Data Min Knowl Discov 27(3):444–477MathSciNetCrossRef
Zurück zum Zitat Wang Y, Ramon J (2012) An efficiently computable support measure for frequent subgraph pattern mining. In: Joint European conference on machine learning and knowledge discovery in databases. Springer, pp 362–377 Wang Y, Ramon J (2012) An efficiently computable support measure for frequent subgraph pattern mining. In: Joint European conference on machine learning and knowledge discovery in databases. Springer, pp 362–377
Zurück zum Zitat Yan X, Han J (2003) Closegraph: mining closed frequent graph patterns. In: Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, pp 286–295 Yan X, Han J (2003) Closegraph: mining closed frequent graph patterns. In: Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, pp 286–295
Metadaten
Titel
Counting frequent patterns in large labeled graphs: a hypergraph-based approach
verfasst von
Jinghan Meng
Napath Pitaksirianan
Yi-Cheng Tu
Publikationsdatum
05.05.2020
Verlag
Springer US
Erschienen in
Data Mining and Knowledge Discovery / Ausgabe 4/2020
Print ISSN: 1384-5810
Elektronische ISSN: 1573-756X
DOI
https://doi.org/10.1007/s10618-020-00686-9

Weitere Artikel der Ausgabe 4/2020

Data Mining and Knowledge Discovery 4/2020 Zur Ausgabe

Premium Partner