Skip to main content
Top
Published in: Social Network Analysis and Mining 4/2013

01-12-2013 | Original Article

Mining and generating large-scaled social networks via MapReduce

Authors: Yi-Chen Lo, Hung-Che Lai, Cheng-Te Li, Shou-De Lin

Published in: Social Network Analysis and Mining | Issue 4/2013

Log in

Activate our intelligent search to find suitable subject content or patents.

search-config
loading …

Abstract

The computational efficiency is usually a concern when dealing with large-scale social network mining tasks containing billions of entities. Cloud computing is widely regarded as a feasible solution to this problem. In this work, we present an open-source graph mining library called the MapReduce Graph Mining Framework (MGMF) to be a robust and efficient MapReduce-based graph mining tool. We start from dividing graph mining algorithms into several categories and design a MapReduce framework for algorithms in each category. The experimental results show that MGMF is 3–20 times more efficient than PEGASUS, a state-of-the-art library for graph mining on MapReduce. Moreover, it provides broader coverage of a variety of graph mining algorithms. Furthermore, we designed a model to generate large-scale social networks capturing the power-law degree distribution property by parallelizing the mechanism of preferential attachment so that it is possible to produce billion-sized scale-free network in minutes. Our implemented open-source library can be downloaded from http://​mslab.​csie.​ntu.​edu.​tw/​~noahsark/​MGMF/​.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Footnotes
1
Note that, the distributed implementation of R-MAT in parallel BGL is not described in the corresponding paper. We obtained knowledge about their parallelization schema by tracing their open-source code.
 
Literature
go back to reference Albert R, Barabási A-L (2002) Statistical mechanics of complex networks. Rev Mod Phys 74:47–97CrossRefMATH Albert R, Barabási A-L (2002) Statistical mechanics of complex networks. Rev Mod Phys 74:47–97CrossRefMATH
go back to reference Brandes U (2001) A faster algorithm for betweenness centrality. J Math Sociol 25(2):163–177CrossRefMATH Brandes U (2001) A faster algorithm for betweenness centrality. J Math Sociol 25(2):163–177CrossRefMATH
go back to reference Chakrabarti D, Zhan Y, Faloutsos C (2004) R-mat: a recursive model for graph mining. In: Proceedings of SDM, 2004 Chakrabarti D, Zhan Y, Faloutsos C (2004) R-mat: a recursive model for graph mining. In: Proceedings of SDM, 2004
go back to reference Chen R, Weng X, He B, Yang M, Choi B, Li X (2010) On the Efficiency and Programmability of Large Graph Processing in the Cloud, Microsoft Research TechReport Chen R, Weng X, He B, Yang M, Choi B, Li X (2010) On the Efficiency and Programmability of Large Graph Processing in the Cloud, Microsoft Research TechReport
go back to reference Cohen J (2009) Graph twiddling in a mapreduce world. Comput Sci Eng 11(4):29–41CrossRef Cohen J (2009) Graph twiddling in a mapreduce world. Comput Sci Eng 11(4):29–41CrossRef
go back to reference Cormen T (2001) Introduction to algorithms. The MIT press, MassachusettsMATH Cormen T (2001) Introduction to algorithms. The MIT press, MassachusettsMATH
go back to reference Dean J, Ghemawat S (2004) Mapreduce: Simplified data processing on large clusters. In: Proceedings of OSDI, 2004, pp 137–150 Dean J, Ghemawat S (2004) Mapreduce: Simplified data processing on large clusters. In: Proceedings of OSDI, 2004, pp 137–150
go back to reference Dean J, Ghemawat S (2008) Mapreduce: simplified data processing on large clusters. Commun ACM 51(1):107–113CrossRef Dean J, Ghemawat S (2008) Mapreduce: simplified data processing on large clusters. Commun ACM 51(1):107–113CrossRef
go back to reference Erdős P, Rényi A (1960) On the evolution of random graphs. Publication of the Mathematical Institute of the Hungarian Academy of Science, Hungary, pp 17–61 Erdős P, Rényi A (1960) On the evolution of random graphs. Publication of the Mathematical Institute of the Hungarian Academy of Science, Hungary, pp 17–61
go back to reference Ghemawat S, Gobioff H, Leung S-T (2003) The google file system. In: Proceedings of the nineteenth ACM symposium on Operating systems principles (SOSP ‘03), ACM, New York, 2003, pp 29–43 Ghemawat S, Gobioff H, Leung S-T (2003) The google file system. In: Proceedings of the nineteenth ACM symposium on Operating systems principles (SOSP ‘03), ACM, New York, 2003, pp 29–43
go back to reference Gregor D, Lumsdaine A (2005) The parallel BGL: a generic library for distributed graph computations. In: Parallel Object-Oriented Scientific Computing (POOSC) Gregor D, Lumsdaine A (2005) The parallel BGL: a generic library for distributed graph computations. In: Parallel Object-Oriented Scientific Computing (POOSC)
go back to reference Kang U, Tsourakakis CE, Faloutsos C (2009) Pegasus: A peta-scale graph mining system. In: Proceedings of IEEE International Conference on Data Mining, pp 229–238 Kang U, Tsourakakis CE, Faloutsos C (2009) Pegasus: A peta-scale graph mining system. In: Proceedings of IEEE International Conference on Data Mining, pp 229–238
go back to reference Kang U, Tsourakakis C, Appel AP, Faloutsos C, Leskovec J (2011) Radius plots for mining tera-byte scale graphs: algorithms, patterns, and observations. In: Proceedings of SIAM International Conference on Data Mining Kang U, Tsourakakis C, Appel AP, Faloutsos C, Leskovec J (2011) Radius plots for mining tera-byte scale graphs: algorithms, patterns, and observations. In: Proceedings of SIAM International Conference on Data Mining
go back to reference Kwak H, Lee C, Park H, Moon S (2010) What is Twitter, a social network or a news media? In: Proceedings of the 19th international conference on World wide web, ACM, 2010, pp 591–600 Kwak H, Lee C, Park H, Moon S (2010) What is Twitter, a social network or a news media? In: Proceedings of the 19th international conference on World wide web, ACM, 2010, pp 591–600
go back to reference Leskovec J, Chakrabarti D, Kleinberg J, Faloutsos C (2005) Realistic, mathematically tractable graph generation and evolution, using Kronecker multiplication. In: Proceedings of Knowledge Discovery in Databases: PKDD 2005, pp 133–145 Leskovec J, Chakrabarti D, Kleinberg J, Faloutsos C (2005) Realistic, mathematically tractable graph generation and evolution, using Kronecker multiplication. In: Proceedings of Knowledge Discovery in Databases: PKDD 2005, pp 133–145
go back to reference Lin J, Dyer C (2009) Data-intensive text processing with MapReduce. In: Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Companion Volume: Tutorial Abstracts, Association for Computational Linguistics, pp 1–2 Lin J, Dyer C (2009) Data-intensive text processing with MapReduce. In: Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Companion Volume: Tutorial Abstracts, Association for Computational Linguistics, pp 1–2
go back to reference Malewicz G, Austern MH, Bik AJ, Dehnert JC, Horn I, Leiser N, Czajkowski G (2010) Pregel: A system for large-scale graph processing. In: Proceedings of the 36th SIGMOD international conference on Management of data (SIGMOD’10), ACM, New York, 2010 Malewicz G, Austern MH, Bik AJ, Dehnert JC, Horn I, Leiser N, Czajkowski G (2010) Pregel: A system for large-scale graph processing. In: Proceedings of the 36th SIGMOD international conference on Management of data (SIGMOD’10), ACM, New York, 2010
go back to reference Page L, Brin S, Motwani R, Winograd T (1999) The pagerank citation ranking: Bringing order to the web., Technical Report 1999-66, Stanford InfoLab Page L, Brin S, Motwani R, Winograd T (1999) The pagerank citation ranking: Bringing order to the web., Technical Report 1999-66, Stanford InfoLab
go back to reference Pearl J (1988) Probabilistic reasoning in intelligent systems: networks of plausible inference. Morgan Kaufmann, San Mateo Pearl J (1988) Probabilistic reasoning in intelligent systems: networks of plausible inference. Morgan Kaufmann, San Mateo
go back to reference Romik D (2000) Stirling’s approximation for n!: the ultimate short proof? Am Math Mon 107(6):556–557 Romik D (2000) Stirling’s approximation for n!: the ultimate short proof? Am Math Mon 107(6):556–557
go back to reference Schank T (2007) Algorithmic aspects of triangle-based network analysis. Ph. d. thesis, University Karlsruhe, Karlsruhe Schank T (2007) Algorithmic aspects of triangle-based network analysis. Ph. d. thesis, University Karlsruhe, Karlsruhe
go back to reference Scott J (2000) Social network analysis: A Handbook, 2nd edn. Sage Publications, New York Scott J (2000) Social network analysis: A Handbook, 2nd edn. Sage Publications, New York
go back to reference Tong H, Faloutsos C, Pan J-Y (2006) Fast random walk with restart and its applications. In: Proceedings of the Sixth International Conference on Data Mining (ICDM ‘06), IEEE Computer Society, Washington, 2006, pp 613–622 Tong H, Faloutsos C, Pan J-Y (2006) Fast random walk with restart and its applications. In: Proceedings of the Sixth International Conference on Data Mining (ICDM ‘06), IEEE Computer Society, Washington, 2006, pp 613–622
go back to reference Wallis J (1969) Computation of π by successive interpolations. In: Struik DJ (ed) A source book in mathematics, 1200–1800. Harvard University Press, Cambridge, MA, pp 224–253 Wallis J (1969) Computation of π by successive interpolations. In: Struik DJ (ed) A source book in mathematics, 1200–1800. Harvard University Press, Cambridge, MA, pp 224–253
go back to reference Watts D, Strogatz S (1998) Collective dynamics of small-world networks. Nature 363:202–204 Watts D, Strogatz S (1998) Collective dynamics of small-world networks. Nature 363:202–204
go back to reference Yoo A, Henderson KW (2010) Parallel generation of massive scale-free graphs. In: Proceedings of CoRR, 2010 Yoo A, Henderson KW (2010) Parallel generation of massive scale-free graphs. In: Proceedings of CoRR, 2010
Metadata
Title
Mining and generating large-scaled social networks via MapReduce
Authors
Yi-Chen Lo
Hung-Che Lai
Cheng-Te Li
Shou-De Lin
Publication date
01-12-2013
Publisher
Springer Vienna
Published in
Social Network Analysis and Mining / Issue 4/2013
Print ISSN: 1869-5450
Electronic ISSN: 1869-5469
DOI
https://doi.org/10.1007/s13278-013-0124-x

Other articles of this Issue 4/2013

Social Network Analysis and Mining 4/2013 Go to the issue

Premium Partner