Skip to main content
Top
Published in: Cluster Computing 3/2015

01-09-2015

Community structure mining in big data social media networks with MapReduce

Authors: Songchang Jin, Wangqun Lin, Hong Yin, Shuqiang Yang, Aiping Li, Bo Deng

Published in: Cluster Computing | Issue 3/2015

Log in

Activate our intelligent search to find suitable subject content or patents.

search-config
loading …

Abstract

Social media networks are playing increasingly prominent role in people’s daily life. Community structure is one of the salient features of social media network and has been applied to practical applications, such as recommendation system and network marketing. With the rapid expansion of social media size and surge of tremendous amount of information, how to identify the communities in big data scenarios has become a challenge. Based on our previous work and the map equation (an equation from information theory for community mining), we develop a novel distributed community structure mining framework. In the framework, (1) we propose a new link information update method to try to avoid data writing related operations and try to speedup the process. (2) We use the local information from the nodes and their neighbors, instead of the pagerank, to calculate the probability distribution of the nodes. (3) We exclude the network partitioning process from our previous work and try to run the map equation directly on MapReduce. Empirical results on real-world social media networks and artificial networks show that the new framework outperforms our previous work and some well-known algorithms, such as Radetal, FastGN, in accuracy, velocity and scalability.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Literature
3.
go back to reference Cambria, E., Rajagopal, D., Olsher, D., Das, D.: Big social data analysis. Big Data Comput. 401–414 (2013) Cambria, E., Rajagopal, D., Olsher, D., Das, D.: Big social data analysis. Big Data Comput. 401–414 (2013)
4.
go back to reference Chen, Y., Huang, C., Zhai, K.: Scalable community detection algorithm with MapReduce. Commun. ACM 53, 359–366 (2009) Chen, Y., Huang, C., Zhai, K.: Scalable community detection algorithm with MapReduce. Commun. ACM 53, 359–366 (2009)
5.
go back to reference Clauset, A., Newman, M.E., Moore, C.: Finding community structure in very large networks. Phys. Rev. E 70(6), 066111 (2004)CrossRef Clauset, A., Newman, M.E., Moore, C.: Finding community structure in very large networks. Phys. Rev. E 70(6), 066111 (2004)CrossRef
7.
go back to reference Dean, J., Ghemawat, S.: MapReduce: simplified data processing on large clusters. Commun. ACM 51(1), 107–113 (2008)CrossRef Dean, J., Ghemawat, S.: MapReduce: simplified data processing on large clusters. Commun. ACM 51(1), 107–113 (2008)CrossRef
9.
10.
go back to reference Gleiser, P.M., Danon, L.: Community structure in jazz. Adv. Complex Syst. 6(04), 565–573 (2003)CrossRef Gleiser, P.M., Danon, L.: Community structure in jazz. Adv. Complex Syst. 6(04), 565–573 (2003)CrossRef
11.
go back to reference Huffman, D.A.: A method for the construction of minimum redundancy codes. Proc. IRE 40(9), 1098–1101 (1952)CrossRef Huffman, D.A.: A method for the construction of minimum redundancy codes. Proc. IRE 40(9), 1098–1101 (1952)CrossRef
12.
13.
go back to reference Jin, S., Li, A., Yang, S., Lin, W., Deng, B., Li, S.: A MapReduce and information compression based social community structure mining method, IEEE 16th International Conference on Computational Science and Engineering (CSE), 2013, pp. 971–980. (2013) Jin, S., Li, A., Yang, S., Lin, W., Deng, B., Li, S.: A MapReduce and information compression based social community structure mining method, IEEE 16th International Conference on Computational Science and Engineering (CSE), 2013, pp. 971–980. (2013)
15.
go back to reference Kalyanaraman, R.A.: An efficient MapReduce algorithm for parallelizing large-scale graph clustering,In: ParGraph—Workshop on Parallel Algorithms and Software for Analysis of Massive Graphs, Held in conjunction with HiPC’11. Bengaluru, India (2011) Kalyanaraman, R.A.: An efficient MapReduce algorithm for parallelizing large-scale graph clustering,In: ParGraph—Workshop on Parallel Algorithms and Software for Analysis of Massive Graphs, Held in conjunction with HiPC’11. Bengaluru, India (2011)
16.
go back to reference Kernighan, B.W., Lin, S.: An efficient Heuristic procedure for partitioning graphs. Bell Syst. Tech. J. 49(2), 291–308 (1970)MATHCrossRef Kernighan, B.W., Lin, S.: An efficient Heuristic procedure for partitioning graphs. Bell Syst. Tech. J. 49(2), 291–308 (1970)MATHCrossRef
17.
go back to reference Lancichinetti, A., Fortunato, S., Radicchi, F.: Benchmark graphs for testing community detection algorithms. Phys. Rev. E 78(4), 046110 (2008)CrossRef Lancichinetti, A., Fortunato, S., Radicchi, F.: Benchmark graphs for testing community detection algorithms. Phys. Rev. E 78(4), 046110 (2008)CrossRef
18.
go back to reference Lancichinetti, A., Fortunato, S.: Community detection algorithms: a comparative analysis. Phys. Rev. E 80(5), 056117 (2009)CrossRef Lancichinetti, A., Fortunato, S.: Community detection algorithms: a comparative analysis. Phys. Rev. E 80(5), 056117 (2009)CrossRef
19.
go back to reference Leskovec, J., Lang, K. J., & Mahoney, M.: Empirical comparison of algorithms for network community detection. In: Proceedings of the 19th international conference on World wide web, 631–640 (2010) Leskovec, J., Lang, K. J., & Mahoney, M.: Empirical comparison of algorithms for network community detection. In: Proceedings of the 19th international conference on World wide web, 631–640 (2010)
20.
go back to reference MacQueen, J.: Some methods for classification and analysis of multivariate observations, In: Proceedings of the fifth Berkeley symposium on mathematical statistics and probability. 1(14), 281–297 (1967) MacQueen, J.: Some methods for classification and analysis of multivariate observations, In: Proceedings of the fifth Berkeley symposium on mathematical statistics and probability. 1(14), 281–297 (1967)
21.
go back to reference Newman, M.E.: Fast algorithm for detecting community structure in networks. Phys. Rev. E 69(6), 066133 (2004)CrossRef Newman, M.E.: Fast algorithm for detecting community structure in networks. Phys. Rev. E 69(6), 066133 (2004)CrossRef
22.
go back to reference Orman, G.K., Labatut, V., Cherifi, H.: Comparative evaluation of community detection algorithms: a topological approach. J. Stat. Mech. Theory Exp. 2012(08), P08001 (2012)CrossRef Orman, G.K., Labatut, V., Cherifi, H.: Comparative evaluation of community detection algorithms: a topological approach. J. Stat. Mech. Theory Exp. 2012(08), P08001 (2012)CrossRef
23.
go back to reference Pasco, R.C.: Source coding algorithms for fast data compression. Stanford University, Ph.D. dissertation (1976) Pasco, R.C.: Source coding algorithms for fast data compression. Stanford University, Ph.D. dissertation (1976)
24.
go back to reference Plantié, M., Michel, C.: Survey on Social Community Detection, Social Media Retrieval. Springer, London (2013) Plantié, M., Michel, C.: Survey on Social Community Detection, Social Media Retrieval. Springer, London (2013)
25.
go back to reference Pothen, A., Simon, H.D., Liou, K.P.: Partitioning sparse matrices with eigenvectors of graphs. SIAM J. Matrix Anal. Appl. 11(3), 430 (1990)MATHMathSciNetCrossRef Pothen, A., Simon, H.D., Liou, K.P.: Partitioning sparse matrices with eigenvectors of graphs. SIAM J. Matrix Anal. Appl. 11(3), 430 (1990)MATHMathSciNetCrossRef
26.
go back to reference Radicchi, F., Castellano, C., Cecconi, F., Loreto, V., Parisi, D.: Defining and identifying communities in networks. Proc. Natl. Acad. Sci. USA 101(9), 2658–2663 (2004)CrossRef Radicchi, F., Castellano, C., Cecconi, F., Loreto, V., Parisi, D.: Defining and identifying communities in networks. Proc. Natl. Acad. Sci. USA 101(9), 2658–2663 (2004)CrossRef
27.
go back to reference Riedy, E.J., Meyerhenke, H., Ediger, D., Bader, D.A.: Parallel community detection for massive graphs. In: Parallel Processing and Applied Mathematics, pp. 286–296. Springer, Berlin Heidelberg (2012) Riedy, E.J., Meyerhenke, H., Ediger, D., Bader, D.A.: Parallel community detection for massive graphs. In: Parallel Processing and Applied Mathematics, pp. 286–296. Springer, Berlin Heidelberg (2012)
28.
go back to reference Rosvall, M., Esquivel, A., Lancichinetti, A., West, J., Lambiotte, R.: Memory in network flows and its effects on spreading dynamics and community detection. Nat. Commun. 5, 2014, doi:10.1038/ncomms5630 Rosvall, M., Esquivel, A., Lancichinetti, A., West, J., Lambiotte, R.: Memory in network flows and its effects on spreading dynamics and community detection. Nat. Commun. 5, 2014, doi:10.​1038/​ncomms5630
29.
go back to reference Rosvall, M., Bergstrom, C.T.: An information-theoretic framework for resolving community structure in complex networks. Proc. Natl. Acad. Sci. 104(18), 7327–7331 (2007)CrossRef Rosvall, M., Bergstrom, C.T.: An information-theoretic framework for resolving community structure in complex networks. Proc. Natl. Acad. Sci. 104(18), 7327–7331 (2007)CrossRef
30.
go back to reference Rosvall, M., Bergstrom, C.T.: The map equation. Eur. Phys. J. Spec. Top. 178(1), 13–23 (2009)CrossRef Rosvall, M., Bergstrom, C.T.: The map equation. Eur. Phys. J. Spec. Top. 178(1), 13–23 (2009)CrossRef
31.
go back to reference Shannon, C.E.: A mathematical theory of communication. ACM SIGMOBILE Mobile Comput. Commun. Rev. 5(1), 3–55 (2001)CrossRef Shannon, C.E.: A mathematical theory of communication. ACM SIGMOBILE Mobile Comput. Commun. Rev. 5(1), 3–55 (2001)CrossRef
32.
33.
go back to reference Yang, B., Liu, D., Liu, J.: Discovering communities from social networks: methodologies and applications. In: Furht, B. (ed.) Handbook of Social Network Technologies and Applications, pp. 331–346. Springer, New York, USA (2010)CrossRef Yang, B., Liu, D., Liu, J.: Discovering communities from social networks: methodologies and applications. In: Furht, B. (ed.) Handbook of Social Network Technologies and Applications, pp. 331–346. Springer, New York, USA (2010)CrossRef
Metadata
Title
Community structure mining in big data social media networks with MapReduce
Authors
Songchang Jin
Wangqun Lin
Hong Yin
Shuqiang Yang
Aiping Li
Bo Deng
Publication date
01-09-2015
Publisher
Springer US
Published in
Cluster Computing / Issue 3/2015
Print ISSN: 1386-7857
Electronic ISSN: 1573-7543
DOI
https://doi.org/10.1007/s10586-015-0452-x

Other articles of this Issue 3/2015

Cluster Computing 3/2015 Go to the issue

Premium Partner