06-10-2017 | Regular Paper

Mining frequent subgraphs from tremendous amount of small graphs using MapReduce

Authors: Zhe Peng, Tongtong Wang, Wei Lu, Hao Huang, Xiaoyong Du, Feng Zhao, Anthony K. H. Tung

Published in: Knowledge and Information Systems | Issue 3/2018

Abstract

Frequent subgraph mining from a tremendous amount of small graphs is a primitive operation for many data mining applications. Existing approaches mainly focus on centralized systems and suffer from scalability issues. Given the increasing volume of graph data and the memory-intensive nature of frequent subgraph mining, it is difficult to tackle this problem efficiently on a centralized machine. In this paper, we therefore propose an efficient and scalable solution, called MRFSE, using MapReduce. MRFSE adopts the breadth-first search strategy to iteratively extract frequent subgraphs, i.e., all frequent subgraphs with \(i+1\) edges are generated from frequent subgraphs with \(i\) edges at the \(i\)th iteration. In our design, existing frequent subgraph mining techniques for centralized systems can be easily extended and integrated. More importantly, new frequent subgraphs are generated without performing any isomorphism test, which is costly yet imperative in existing frequent subgraph mining techniques. In addition, various optimization techniques are proposed to further reduce the communication and I/O cost. Extensive experiments conducted on our in-house clusters demonstrate the superiority of our proposed solution in terms of both scalability and efficiency.
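
The level-wise strategy described in the abstract can be illustrated with a toy, single-process sketch. This is not MRFSE itself. It assumes that vertex labels are unique within each graph, so a pattern is simply a set of labeled edges and the question of isomorphism never arises. The function names (`edge`, `map_extend`, `reduce_support`, `mine`), the `min_support` parameter, and the sample database are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def edge(u, v):
    """Undirected edge between two vertex labels, stored in sorted order."""
    return (u, v) if u <= v else (v, u)


def map_extend(graph_id, edges, frequent_patterns):
    """Map step: grow each frequent i-edge pattern found in this graph by one edge."""
    for pattern in frequent_patterns:
        if not pattern <= edges:
            continue  # the pattern does not occur in this graph
        vertices = {v for e in pattern for v in e}
        for e in edges - pattern:
            # Only add edges that touch the pattern, so it stays connected.
            if e[0] in vertices or e[1] in vertices:
                yield pattern | {e}, graph_id


def reduce_support(pairs, min_support):
    """Reduce step: keep patterns supported by at least min_support distinct graphs."""
    supporters = defaultdict(set)
    for pattern, graph_id in pairs:
        supporters[pattern].add(graph_id)
    return {p for p, ids in supporters.items() if len(ids) >= min_support}


def mine(database, min_support):
    """Return a list whose element at index i-1 is the set of frequent i-edge patterns."""
    # Start from the frequent single edges.
    pairs = [(frozenset([e]), gid) for gid, edges in database.items() for e in edges]
    level = reduce_support(pairs, min_support)
    levels = []
    while level:
        levels.append(level)
        # Iteration i: build (i+1)-edge candidates from the frequent i-edge patterns.
        pairs = [kv for gid, edges in database.items()
                 for kv in map_extend(gid, edges, level)]
        level = reduce_support(pairs, min_support)
    return levels


# Hypothetical database of three small graphs (assumption: unique vertex labels per graph).
database = {
    "g1": frozenset({edge("C", "N"), edge("C", "O"), edge("N", "H")}),
    "g2": frozenset({edge("C", "N"), edge("C", "O")}),
    "g3": frozenset({edge("C", "N"), edge("N", "H")}),
}
levels = mine(database, min_support=2)
for i, patterns in enumerate(levels, start=1):
    print(i, sorted(sorted(p) for p in patterns))
```

With a support threshold of 2, this toy database yields three frequent 1-edge patterns and two frequent 2-edge patterns; the single 3-edge candidate occurs only in `g1` and is pruned. In a real MapReduce deployment the map and reduce functions would run on separate workers, and general labeled graphs would need a mechanism such as the embeddings mentioned in the paper's footnotes rather than the unique-label shortcut used here.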

In the remainder of this paper, an \(i\)-subgraph refers to a subgraph with \(i\) edges.

http://dtp.nci.nih.gov/docs/aids/aids_data.html.

http://www.cas.org.

For every \(g \in D\), we maintain all frequent \(i\)-subgraphs together with their corresponding embeddings.

https://github.com/apache/giraph.

http://pubchem.ncbi.nlm.nih.gov.

Title: Mining frequent subgraphs from tremendous amount of small graphs using MapReduce
Authors: Zhe Peng, Tongtong Wang, Wei Lu, Hao Huang, Xiaoyong Du, Feng Zhao, Anthony K. H. Tung
Publication date: 06-10-2017
Publisher: Springer London
Published in: Knowledge and Information Systems / Issue 3/2018
Print ISSN: 0219-1377
Electronic ISSN: 0219-3116
DOI: https://doi.org/10.1007/s10115-017-1104-7