Skip to main content
Erschienen in: Data Mining and Knowledge Discovery 5/2016

01.09.2016

A distributed approach for graph mining in massive networks

verfasst von: N. Talukder, M. J. Zaki

Erschienen in: Data Mining and Knowledge Discovery | Ausgabe 5/2016

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

We propose a novel distributed algorithm for mining frequent subgraphs from a single, very large, labeled network. Our approach is the first distributed method to mine a massive input graph that is too large to fit in the memory of any individual compute node. The input graph thus has to be partitioned among the nodes, which can lead to potential false negatives. Furthermore, for scalable performance it is crucial to minimize the communication among the compute nodes. Our algorithm, DistGraph, ensures that there are no false negatives, and uses a set of optimizations and efficient collective communication operations to minimize information exchange. To our knowledge DistGraph is the first approach demonstrated to scale to graphs with over a billion vertices and edges. Scalability results on up to 2048 IBM Blue Gene/Q compute nodes, with 16 cores each, show very good speedup.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Literatur
Zurück zum Zitat Afrati FN, Fotakis D, Ullman JD (2013) Enumerating subgraph instances using map-reduce. In: IEEE international conference on data engineering Afrati FN, Fotakis D, Ullman JD (2013) Enumerating subgraph instances using map-reduce. In: IEEE international conference on data engineering
Zurück zum Zitat Bhuiyan M, Al Hasan M (2015) An iterative mapreduce based frequent subgraph mining algorithm. IEEE Trans Knowl Data Eng 27(3):608–620CrossRef Bhuiyan M, Al Hasan M (2015) An iterative mapreduce based frequent subgraph mining algorithm. IEEE Trans Knowl Data Eng 27(3):608–620CrossRef
Zurück zum Zitat Bringmann B, Nijssen S (2008) What is frequent in a single graph? In: Pacific-Asia conference on advances in knowledge discovery and data mining Bringmann B, Nijssen S (2008) What is frequent in a single graph? In: Pacific-Asia conference on advances in knowledge discovery and data mining
Zurück zum Zitat Buehrer G, Parthasarathy S, Chen Y-K (2006) Adaptive parallel graph mining for cmp architectures. In: IEEE international conference on data mining Buehrer G, Parthasarathy S, Chen Y-K (2006) Adaptive parallel graph mining for cmp architectures. In: IEEE international conference on data mining
Zurück zum Zitat Elseidy M, Abdelhamid E, Skiadopoulos S, Kalnis P (2014) Grami: frequent subgraph and pattern mining in a single large graph. Proc VLDB Endow 7:517–528CrossRef Elseidy M, Abdelhamid E, Skiadopoulos S, Kalnis P (2014) Grami: frequent subgraph and pattern mining in a single large graph. Proc VLDB Endow 7:517–528CrossRef
Zurück zum Zitat Fatta GD, Berthold MR (2006) Dynamic load balancing for the distributed mining of molecular structures. IEEE Trans Parallel Distrib Syst 17(8):773–785CrossRef Fatta GD, Berthold MR (2006) Dynamic load balancing for the distributed mining of molecular structures. IEEE Trans Parallel Distrib Syst 17(8):773–785CrossRef
Zurück zum Zitat Hill S, Srichandan B, Sunderraman R (2012) An iterative mapreduce approach to frequent subgraph mining in biological datasets. In: ACM conference on bioinformatics, computational biology and biomedicine Hill S, Srichandan B, Sunderraman R (2012) An iterative mapreduce approach to frequent subgraph mining in biological datasets. In: ACM conference on bioinformatics, computational biology and biomedicine
Zurück zum Zitat Holder LB, Cook DJ (1993) Discovery of inexact concepts from structural data. IEEE Trans Knowl Data Eng 5(6):992–994CrossRef Holder LB, Cook DJ (1993) Discovery of inexact concepts from structural data. IEEE Trans Knowl Data Eng 5(6):992–994CrossRef
Zurück zum Zitat Huan J, Wang W, Prins J(2003) Efficient mining of frequent subgraphs in the presence of isomorphism. In: IEEE international conference on data mining Huan J, Wang W, Prins J(2003) Efficient mining of frequent subgraphs in the presence of isomorphism. In: IEEE international conference on data mining
Zurück zum Zitat Inokuchi A, Washio T, Motoda H (2000) An apriori-based algorithm for mining frequent substructures from graph data. In: Principles of data mining and knowledge discovery. LNCS vol. 1910. Springer, pp 13–23 Inokuchi A, Washio T, Motoda H (2000) An apriori-based algorithm for mining frequent substructures from graph data. In: Principles of data mining and knowledge discovery. LNCS vol. 1910. Springer, pp 13–23
Zurück zum Zitat Karypis G, Kumar V (1998) A fast and high quality multilevel scheme for partitioning irregular graphs. SIAM J Sci Comput 20(1):359–392MathSciNetCrossRefMATH Karypis G, Kumar V (1998) A fast and high quality multilevel scheme for partitioning irregular graphs. SIAM J Sci Comput 20(1):359–392MathSciNetCrossRefMATH
Zurück zum Zitat Kessl R, Talukder N, Anchuri P, Zaki MJ (2014) Parallel graph mining with GPUs. Proceedings of the BigMine workshop (ACM SIGKDD), Journal of Machine Learning Research: conference and workshop proceedings, pp 36:1–16 Kessl R, Talukder N, Anchuri P, Zaki MJ (2014) Parallel graph mining with GPUs. Proceedings of the BigMine workshop (ACM SIGKDD), Journal of Machine Learning Research: conference and workshop proceedings, pp 36:1–16
Zurück zum Zitat Kimelfeld B, Kolaitis PG (2014) The complexity of mining maximal frequent subgraphs. ACM Trans Database Syst (TODS) 39(4):32MathSciNetCrossRef Kimelfeld B, Kolaitis PG (2014) The complexity of mining maximal frequent subgraphs. ACM Trans Database Syst (TODS) 39(4):32MathSciNetCrossRef
Zurück zum Zitat Kuramochi M, Karypis G (2001) Frequent subgraph discovery. In: IEEE international conference on data mining Kuramochi M, Karypis G (2001) Frequent subgraph discovery. In: IEEE international conference on data mining
Zurück zum Zitat Kuramochi M, Karypis G (2005) Finding frequent patterns in a large sparse graph. Data Min Knowl Discov 11(3):243–271MathSciNetCrossRef Kuramochi M, Karypis G (2005) Finding frequent patterns in a large sparse graph. Data Min Knowl Discov 11(3):243–271MathSciNetCrossRef
Zurück zum Zitat Lin W, Xiao X, Ghinita G (2014) Large-scale frequent subgraph mining in mapreduce. In: IEEE international conference on data engineering Lin W, Xiao X, Ghinita G (2014) Large-scale frequent subgraph mining in mapreduce. In: IEEE international conference on data engineering
Zurück zum Zitat Liu Y, Jiang X, Chen H, Ma J, Zhang X (2009) Mapreduce-based pattern finding algorithm applied in motif detection for prescription compatibility network. In: Advanced parallel processing technologies, LNCS vol. 5737. Springer, pp 341–355 Liu Y, Jiang X, Chen H, Ma J, Zhang X (2009) Mapreduce-based pattern finding algorithm applied in motif detection for prescription compatibility network. In: Advanced parallel processing technologies, LNCS vol. 5737. Springer, pp 341–355
Zurück zum Zitat Lu W, Chen G, Tung A, Zhao F(2013) Efficiently extracting frequent subgraphs using mapreduce. In: IEEE international conference on big data Lu W, Chen G, Tung A, Zhao F(2013) Efficiently extracting frequent subgraphs using mapreduce. In: IEEE international conference on big data
Zurück zum Zitat Meinl T, Wörlein M, Fischer I, Philippsen M (2006) Mining molecular datasets on symmetric multiprocessor systems. In: IEEE international conference on systems, man and cybernetics, vol 2 Meinl T, Wörlein M, Fischer I, Philippsen M (2006) Mining molecular datasets on symmetric multiprocessor systems. In: IEEE international conference on systems, man and cybernetics, vol 2
Zurück zum Zitat Reinhardt S, Karypis G (2007) A multi-level parallel implementation of a program for finding frequent patterns in a large sparse graph. In: IEEE international parallel and distributed processing symposium Reinhardt S, Karypis G (2007) A multi-level parallel implementation of a program for finding frequent patterns in a large sparse graph. In: IEEE international parallel and distributed processing symposium
Zurück zum Zitat Shahrivari S, Jalili S (2015) Distributed discovery of frequent subgraphs of a network using MapReduce. Computing 97(11):1101–1120MathSciNetCrossRefMATH Shahrivari S, Jalili S (2015) Distributed discovery of frequent subgraphs of a network using MapReduce. Computing 97(11):1101–1120MathSciNetCrossRefMATH
Zurück zum Zitat Shao Y, Cui B, Chen L, Ma L, Yao J, Xu N (2014) Parallel subgraph listing in a large-scale graph. In: ACM SIGMOD international conference on management of data Shao Y, Cui B, Chen L, Ma L, Yao J, Xu N (2014) Parallel subgraph listing in a large-scale graph. In: ACM SIGMOD international conference on management of data
Zurück zum Zitat Sun Z, Wang H, Wang H, Shao B, Li J (2012) Efficient subgraph matching on billion node graphs. Proc VLDB Endow 5(9):788–799CrossRef Sun Z, Wang H, Wang H, Shao B, Li J (2012) Efficient subgraph matching on billion node graphs. Proc VLDB Endow 5(9):788–799CrossRef
Zurück zum Zitat Teixeira CHC, Fonseca AJ, Serafini M, Siganos G, Zaki MJ, Aboulnaga A (2015) Arabesque: a system for distributed graph pattern mining. In: 25th ACM symposium on operating systems principles Teixeira CHC, Fonseca AJ, Serafini M, Siganos G, Zaki MJ, Aboulnaga A (2015) Arabesque: a system for distributed graph pattern mining. In: 25th ACM symposium on operating systems principles
Zurück zum Zitat Ucar D, Asur S, Catalyurek U, Parthasarathy S (2006) Improving functional modularity in protein–protein interactions graphs using hub-induced subgraphs. In: Fürnkranz J, Scheffer T, Spiliopoulou M (eds) Knowledge discovery in databases: PKDD 2006. Springer, Berlin, pp 371–382CrossRef Ucar D, Asur S, Catalyurek U, Parthasarathy S (2006) Improving functional modularity in protein–protein interactions graphs using hub-induced subgraphs. In: Fürnkranz J, Scheffer T, Spiliopoulou M (eds) Knowledge discovery in databases: PKDD 2006. Springer, Berlin, pp 371–382CrossRef
Zurück zum Zitat Wu B, Bai Y (2010) An efficient distributed subgraph mining algorithm in extreme large graphs. In: International conference on artificial intelligence and computational intelligence: part I Wu B, Bai Y (2010) An efficient distributed subgraph mining algorithm in extreme large graphs. In: International conference on artificial intelligence and computational intelligence: part I
Zurück zum Zitat Yan X, Han J (2002) gspan: Graph-based substructure pattern mining. In: IEEE international conference on data mining Yan X, Han J (2002) gspan: Graph-based substructure pattern mining. In: IEEE international conference on data mining
Zurück zum Zitat Yang G (2004) The complexity of mining maximal frequent itemsets and maximal frequent patterns. In: Proceedings of the tenth ACM SIGKDD international conference on knowledge discovery and data mining. ACM, pp 344–353 Yang G (2004) The complexity of mining maximal frequent itemsets and maximal frequent patterns. In: Proceedings of the tenth ACM SIGKDD international conference on knowledge discovery and data mining. ACM, pp 344–353
Metadaten
Titel
A distributed approach for graph mining in massive networks
verfasst von
N. Talukder
M. J. Zaki
Publikationsdatum
01.09.2016
Verlag
Springer US
Erschienen in
Data Mining and Knowledge Discovery / Ausgabe 5/2016
Print ISSN: 1384-5810
Elektronische ISSN: 1573-756X
DOI
https://doi.org/10.1007/s10618-016-0466-x

Weitere Artikel der Ausgabe 5/2016

Data Mining and Knowledge Discovery 5/2016 Zur Ausgabe

Premium Partner