Skip to main content
Top

2019 | OriginalPaper | Chapter

Prefix Tree Based MapReduce Approach for Mining Frequent Subgraphs

Authors : Supriya Movva, Saketh Prata, Sai Sampath, R. G. Gayathri

Published in: Ubiquitous Communications and Network Computing

Publisher: Springer International Publishing

Activate our intelligent search to find suitable subject content or patents.

search-config
loading …

Abstract

The frequent subgraphs are the subgraphs which appear in a number, more than or equal to a user-defined threshold. Many algorithms assume that the apriori based approach yields an efficient result for finding frequent subgraphs, but in our research, we found out that Apriori algorithm lacks scalability with the main memory. Frequent subgraph mining using Apriori algorithm with FS tree uses adjacency list representation. FS tree is a prefix tree data structure. It implements the algorithm in two phases. In the first phase, it uses the Apriori algorithm to find frequent two edge subgraphs. In the second phase, it uses FS-tree algorithm to search all the frequent subgraphs from frequent two edge subgraphs. Scanning the dataset for every candidate is the drawback of the Apriori algorithm, so the Apriori algorithm with FS-tree is used to overcome the multiple scanning. This algorithm is also implemented in an assumption that the data set fits well in memory. In this paper, we propose parallel map-reduce based frequent subgraph mining technique performed in a distributed environment on the Hadoop framework. The experiments validate the efficiency of the algorithm for generating frequent subgraphs in large graph datasets.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Literature
1.
go back to reference Barabási, A., Oltvai, Z.: Network biology: understanding the cell’s functional organization. Nat. Rev. Genet. 5, 101–113 (2004)CrossRef Barabási, A., Oltvai, Z.: Network biology: understanding the cell’s functional organization. Nat. Rev. Genet. 5, 101–113 (2004)CrossRef
2.
go back to reference Lacroix, V., Fernandes, C., Sagot, M.-F.: Motif search in graphs: pplication to metabolic networks. Trans. Comput. Biol. Bioinform. 3, 360–368 (2006)CrossRef Lacroix, V., Fernandes, C., Sagot, M.-F.: Motif search in graphs: pplication to metabolic networks. Trans. Comput. Biol. Bioinform. 3, 360–368 (2006)CrossRef
3.
go back to reference Borgelt, C., Berhold, M.R.: Mining molecular fragments: finding relevant substructures of molecules. In: Proceedings of International Conference on Data Mining 2002 (2002) Borgelt, C., Berhold, M.R.: Mining molecular fragments: finding relevant substructures of molecules. In: Proceedings of International Conference on Data Mining 2002 (2002)
4.
go back to reference Handcock, M., Raftery, A., Tantrum, J.: Model-based clustering for social networks. J. R. Stat. Soc. Ser. (Stat. Soc.) 170(2), 301–354 (2007)MathSciNetCrossRef Handcock, M., Raftery, A., Tantrum, J.: Model-based clustering for social networks. J. R. Stat. Soc. Ser. (Stat. Soc.) 170(2), 301–354 (2007)MathSciNetCrossRef
5.
go back to reference Kuramochi,M., Karypis, G.: Frequent subgraph discovery. In: ICDM01. FSM (2001) Kuramochi,M., Karypis, G.: Frequent subgraph discovery. In: ICDM01. FSM (2001)
6.
go back to reference Cook, D.J., Holder, L.B.: Substructure discovery using minimum description length and background knowledge. J. Artif. Intell. Res. 1, 231–255 (1994). 3rd edCrossRef Cook, D.J., Holder, L.B.: Substructure discovery using minimum description length and background knowledge. J. Artif. Intell. Res. 1, 231–255 (1994). 3rd edCrossRef
7.
go back to reference Praveena, A., Anitha, B., Rohini, R.: An efficient parallel iterative mapreduce based frequent subgraph mining algorithm. Middle-East J. Sci. Res. 24 (Tech. Algorithms Emerg. Technol.), 524–531 (2016) Praveena, A., Anitha, B., Rohini, R.: An efficient parallel iterative mapreduce based frequent subgraph mining algorithm. Middle-East J. Sci. Res. 24 (Tech. Algorithms Emerg. Technol.), 524–531 (2016)
9.
go back to reference Vanetik, N., et al.: Computing frequent graph patterns from semi structured data. In: Proceedings 2002 IEEE International Conference on Data Mining, ICDM-2002 (2002) Vanetik, N., et al.: Computing frequent graph patterns from semi structured data. In: Proceedings 2002 IEEE International Conference on Data Mining, ICDM-2002 (2002)
10.
go back to reference Huan, J., Wang, W., Prins, J.: Efficient mining of frequent subgraph in the presence of isomorphism. UNC computer science Technique report TR03-021 (2003). FFSM Huan, J., Wang, W., Prins, J.: Efficient mining of frequent subgraph in the presence of isomorphism. UNC computer science Technique report TR03-021 (2003). FFSM
11.
go back to reference Nguyen, S.N., Orlowska, M.E., Li, X.: Graph mining based on a data partitioning. In: Nineteenth Australasian Database Conference (ADC 2008) (2008) Nguyen, S.N., Orlowska, M.E., Li, X.: Graph mining based on a data partitioning. In: Nineteenth Australasian Database Conference (ADC 2008) (2008)
12.
go back to reference Bhuvaneswari, M., Rohini, R., Preetha, B.: A survey on privacy preserving public auditing for secure data storage. Int. J. Eng. Res. Technol. (2013) Bhuvaneswari, M., Rohini, R., Preetha, B.: A survey on privacy preserving public auditing for secure data storage. Int. J. Eng. Res. Technol. (2013)
13.
go back to reference Huan, J., Wang, W., Prins, J., Yang, J.: Spin: mining maximal frequent subgraphs from graph databases. In: Proceedings of the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 581–586 (2004) Huan, J., Wang, W., Prins, J., Yang, J.: Spin: mining maximal frequent subgraphs from graph databases. In: Proceedings of the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 581–586 (2004)
14.
go back to reference Hsieh, H.-P., Li, C.-T.: Mining temporal subgraph patterns in heterogeneous information networks. In: IEEE International Conference on Social Computing/IEEE International Conference on Privacy, Security, Risk and Trust (2010) Hsieh, H.-P., Li, C.-T.: Mining temporal subgraph patterns in heterogeneous information networks. In: IEEE International Conference on Social Computing/IEEE International Conference on Privacy, Security, Risk and Trust (2010)
15.
go back to reference Thomas, S., Nair, J.J.: Improvised Apriori with frequent subgraph tree for extracting frequent subgraphs. J. Intell. Fuzzy Syst. 32(4), 3209–3219 (2017)CrossRef Thomas, S., Nair, J.J.: Improvised Apriori with frequent subgraph tree for extracting frequent subgraphs. J. Intell. Fuzzy Syst. 32(4), 3209–3219 (2017)CrossRef
16.
go back to reference Yan, X., Han, J.: gSpan: graph based sustructure pattern mining. In: Proceedings of 2nd IEEE International Conference on Data Mining, ICDM 2002 (2002) Yan, X., Han, J.: gSpan: graph based sustructure pattern mining. In: Proceedings of 2nd IEEE International Conference on Data Mining, ICDM 2002 (2002)
17.
go back to reference Thomas, S., Nair, J.J.: A survey on extracting frequent subgraphs. In: International Conference on Advances in Computing, Communications and Informatics (ICACCI-2016) (2016) Thomas, S., Nair, J.J.: A survey on extracting frequent subgraphs. In: International Conference on Advances in Computing, Communications and Informatics (ICACCI-2016) (2016)
18.
go back to reference Jeong, B.S., Choi, H.J., Hossain, M.A., Rashid, M.M., Karim, M.R.: A MapReduce framework for mining maximal contiguous frequent patterns in large DNA sequence datasets. IETE Tech. Rev. 29, 162–168 (2012)CrossRef Jeong, B.S., Choi, H.J., Hossain, M.A., Rashid, M.M., Karim, M.R.: A MapReduce framework for mining maximal contiguous frequent patterns in large DNA sequence datasets. IETE Tech. Rev. 29, 162–168 (2012)CrossRef
19.
go back to reference Hill, S., Srichandan, B., Sunderraman, R.: An iterative mapreduce approach to frequent subgraph mining in biological datasets. In: Proceedings of the ACM Conference on Bioinformatics, Computational Biology and Biomedicine (2012) Hill, S., Srichandan, B., Sunderraman, R.: An iterative mapreduce approach to frequent subgraph mining in biological datasets. In: Proceedings of the ACM Conference on Bioinformatics, Computational Biology and Biomedicine (2012)
21.
go back to reference Gayathri, S., Radhika, N.: Greedy hop algorithm for detecting shortest path in vehicular networks. Int. J. Control. Theory Appl. 9, 1125–1133 (2016) Gayathri, S., Radhika, N.: Greedy hop algorithm for detecting shortest path in vehicular networks. Int. J. Control. Theory Appl. 9, 1125–1133 (2016)
22.
23.
go back to reference Di Fatta, G., Berthold, M.: Dynamic load balancing for the distributed mining of molecular structures. IEEE Trans. Parallel Distrib. Syst. 17, 773–785 (2006)CrossRef Di Fatta, G., Berthold, M.: Dynamic load balancing for the distributed mining of molecular structures. IEEE Trans. Parallel Distrib. Syst. 17, 773–785 (2006)CrossRef
24.
go back to reference Lin, J., Dyer, C.: Data-intensive text processing with MapReduce (2010) Lin, J., Dyer, C.: Data-intensive text processing with MapReduce (2010)
25.
go back to reference Gayathri, R., Nair, J.J.: ex-FTCD: a novel mapreduce model for distributed multi source shortest path problem. J. Intell. Fuzzy Syst. 34(3), 16431652 (2018) Gayathri, R., Nair, J.J.: ex-FTCD: a novel mapreduce model for distributed multi source shortest path problem. J. Intell. Fuzzy Syst. 34(3), 16431652 (2018)
Metadata
Title
Prefix Tree Based MapReduce Approach for Mining Frequent Subgraphs
Authors
Supriya Movva
Saketh Prata
Sai Sampath
R. G. Gayathri
Copyright Year
2019
DOI
https://doi.org/10.1007/978-3-030-20615-4_17

Premium Partner