Skip to main content
Erschienen in: Cluster Computing 3/2015

01.09.2015

Community structure mining in big data social media networks with MapReduce

verfasst von: Songchang Jin, Wangqun Lin, Hong Yin, Shuqiang Yang, Aiping Li, Bo Deng

Erschienen in: Cluster Computing | Ausgabe 3/2015

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

Social media networks are playing increasingly prominent role in people’s daily life. Community structure is one of the salient features of social media network and has been applied to practical applications, such as recommendation system and network marketing. With the rapid expansion of social media size and surge of tremendous amount of information, how to identify the communities in big data scenarios has become a challenge. Based on our previous work and the map equation (an equation from information theory for community mining), we develop a novel distributed community structure mining framework. In the framework, (1) we propose a new link information update method to try to avoid data writing related operations and try to speedup the process. (2) We use the local information from the nodes and their neighbors, instead of the pagerank, to calculate the probability distribution of the nodes. (3) We exclude the network partitioning process from our previous work and try to run the map equation directly on MapReduce. Empirical results on real-world social media networks and artificial networks show that the new framework outperforms our previous work and some well-known algorithms, such as Radetal, FastGN, in accuracy, velocity and scalability.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Literatur
3.
Zurück zum Zitat Cambria, E., Rajagopal, D., Olsher, D., Das, D.: Big social data analysis. Big Data Comput. 401–414 (2013) Cambria, E., Rajagopal, D., Olsher, D., Das, D.: Big social data analysis. Big Data Comput. 401–414 (2013)
4.
Zurück zum Zitat Chen, Y., Huang, C., Zhai, K.: Scalable community detection algorithm with MapReduce. Commun. ACM 53, 359–366 (2009) Chen, Y., Huang, C., Zhai, K.: Scalable community detection algorithm with MapReduce. Commun. ACM 53, 359–366 (2009)
5.
Zurück zum Zitat Clauset, A., Newman, M.E., Moore, C.: Finding community structure in very large networks. Phys. Rev. E 70(6), 066111 (2004)CrossRef Clauset, A., Newman, M.E., Moore, C.: Finding community structure in very large networks. Phys. Rev. E 70(6), 066111 (2004)CrossRef
7.
Zurück zum Zitat Dean, J., Ghemawat, S.: MapReduce: simplified data processing on large clusters. Commun. ACM 51(1), 107–113 (2008)CrossRef Dean, J., Ghemawat, S.: MapReduce: simplified data processing on large clusters. Commun. ACM 51(1), 107–113 (2008)CrossRef
9.
Zurück zum Zitat Girvan, M., Newman, M.E.: Community structure in social and biological networks. Proc. Natl. Acad. Sci. 99(12), 7821–7826 (2002)MATHMathSciNetCrossRef Girvan, M., Newman, M.E.: Community structure in social and biological networks. Proc. Natl. Acad. Sci. 99(12), 7821–7826 (2002)MATHMathSciNetCrossRef
10.
Zurück zum Zitat Gleiser, P.M., Danon, L.: Community structure in jazz. Adv. Complex Syst. 6(04), 565–573 (2003)CrossRef Gleiser, P.M., Danon, L.: Community structure in jazz. Adv. Complex Syst. 6(04), 565–573 (2003)CrossRef
11.
Zurück zum Zitat Huffman, D.A.: A method for the construction of minimum redundancy codes. Proc. IRE 40(9), 1098–1101 (1952)CrossRef Huffman, D.A.: A method for the construction of minimum redundancy codes. Proc. IRE 40(9), 1098–1101 (1952)CrossRef
12.
Zurück zum Zitat Ihara, S.: Information Theory for Continuous Systems. World Scientific, Singapore (1993)MATHCrossRef Ihara, S.: Information Theory for Continuous Systems. World Scientific, Singapore (1993)MATHCrossRef
13.
Zurück zum Zitat Jin, S., Li, A., Yang, S., Lin, W., Deng, B., Li, S.: A MapReduce and information compression based social community structure mining method, IEEE 16th International Conference on Computational Science and Engineering (CSE), 2013, pp. 971–980. (2013) Jin, S., Li, A., Yang, S., Lin, W., Deng, B., Li, S.: A MapReduce and information compression based social community structure mining method, IEEE 16th International Conference on Computational Science and Engineering (CSE), 2013, pp. 971–980. (2013)
15.
Zurück zum Zitat Kalyanaraman, R.A.: An efficient MapReduce algorithm for parallelizing large-scale graph clustering,In: ParGraph—Workshop on Parallel Algorithms and Software for Analysis of Massive Graphs, Held in conjunction with HiPC’11. Bengaluru, India (2011) Kalyanaraman, R.A.: An efficient MapReduce algorithm for parallelizing large-scale graph clustering,In: ParGraph—Workshop on Parallel Algorithms and Software for Analysis of Massive Graphs, Held in conjunction with HiPC’11. Bengaluru, India (2011)
16.
Zurück zum Zitat Kernighan, B.W., Lin, S.: An efficient Heuristic procedure for partitioning graphs. Bell Syst. Tech. J. 49(2), 291–308 (1970)MATHCrossRef Kernighan, B.W., Lin, S.: An efficient Heuristic procedure for partitioning graphs. Bell Syst. Tech. J. 49(2), 291–308 (1970)MATHCrossRef
17.
Zurück zum Zitat Lancichinetti, A., Fortunato, S., Radicchi, F.: Benchmark graphs for testing community detection algorithms. Phys. Rev. E 78(4), 046110 (2008)CrossRef Lancichinetti, A., Fortunato, S., Radicchi, F.: Benchmark graphs for testing community detection algorithms. Phys. Rev. E 78(4), 046110 (2008)CrossRef
18.
Zurück zum Zitat Lancichinetti, A., Fortunato, S.: Community detection algorithms: a comparative analysis. Phys. Rev. E 80(5), 056117 (2009)CrossRef Lancichinetti, A., Fortunato, S.: Community detection algorithms: a comparative analysis. Phys. Rev. E 80(5), 056117 (2009)CrossRef
19.
Zurück zum Zitat Leskovec, J., Lang, K. J., & Mahoney, M.: Empirical comparison of algorithms for network community detection. In: Proceedings of the 19th international conference on World wide web, 631–640 (2010) Leskovec, J., Lang, K. J., & Mahoney, M.: Empirical comparison of algorithms for network community detection. In: Proceedings of the 19th international conference on World wide web, 631–640 (2010)
20.
Zurück zum Zitat MacQueen, J.: Some methods for classification and analysis of multivariate observations, In: Proceedings of the fifth Berkeley symposium on mathematical statistics and probability. 1(14), 281–297 (1967) MacQueen, J.: Some methods for classification and analysis of multivariate observations, In: Proceedings of the fifth Berkeley symposium on mathematical statistics and probability. 1(14), 281–297 (1967)
21.
Zurück zum Zitat Newman, M.E.: Fast algorithm for detecting community structure in networks. Phys. Rev. E 69(6), 066133 (2004)CrossRef Newman, M.E.: Fast algorithm for detecting community structure in networks. Phys. Rev. E 69(6), 066133 (2004)CrossRef
22.
Zurück zum Zitat Orman, G.K., Labatut, V., Cherifi, H.: Comparative evaluation of community detection algorithms: a topological approach. J. Stat. Mech. Theory Exp. 2012(08), P08001 (2012)CrossRef Orman, G.K., Labatut, V., Cherifi, H.: Comparative evaluation of community detection algorithms: a topological approach. J. Stat. Mech. Theory Exp. 2012(08), P08001 (2012)CrossRef
23.
Zurück zum Zitat Pasco, R.C.: Source coding algorithms for fast data compression. Stanford University, Ph.D. dissertation (1976) Pasco, R.C.: Source coding algorithms for fast data compression. Stanford University, Ph.D. dissertation (1976)
24.
Zurück zum Zitat Plantié, M., Michel, C.: Survey on Social Community Detection, Social Media Retrieval. Springer, London (2013) Plantié, M., Michel, C.: Survey on Social Community Detection, Social Media Retrieval. Springer, London (2013)
25.
Zurück zum Zitat Pothen, A., Simon, H.D., Liou, K.P.: Partitioning sparse matrices with eigenvectors of graphs. SIAM J. Matrix Anal. Appl. 11(3), 430 (1990)MATHMathSciNetCrossRef Pothen, A., Simon, H.D., Liou, K.P.: Partitioning sparse matrices with eigenvectors of graphs. SIAM J. Matrix Anal. Appl. 11(3), 430 (1990)MATHMathSciNetCrossRef
26.
Zurück zum Zitat Radicchi, F., Castellano, C., Cecconi, F., Loreto, V., Parisi, D.: Defining and identifying communities in networks. Proc. Natl. Acad. Sci. USA 101(9), 2658–2663 (2004)CrossRef Radicchi, F., Castellano, C., Cecconi, F., Loreto, V., Parisi, D.: Defining and identifying communities in networks. Proc. Natl. Acad. Sci. USA 101(9), 2658–2663 (2004)CrossRef
27.
Zurück zum Zitat Riedy, E.J., Meyerhenke, H., Ediger, D., Bader, D.A.: Parallel community detection for massive graphs. In: Parallel Processing and Applied Mathematics, pp. 286–296. Springer, Berlin Heidelberg (2012) Riedy, E.J., Meyerhenke, H., Ediger, D., Bader, D.A.: Parallel community detection for massive graphs. In: Parallel Processing and Applied Mathematics, pp. 286–296. Springer, Berlin Heidelberg (2012)
28.
Zurück zum Zitat Rosvall, M., Esquivel, A., Lancichinetti, A., West, J., Lambiotte, R.: Memory in network flows and its effects on spreading dynamics and community detection. Nat. Commun. 5, 2014, doi:10.1038/ncomms5630 Rosvall, M., Esquivel, A., Lancichinetti, A., West, J., Lambiotte, R.: Memory in network flows and its effects on spreading dynamics and community detection. Nat. Commun. 5, 2014, doi:10.​1038/​ncomms5630
29.
Zurück zum Zitat Rosvall, M., Bergstrom, C.T.: An information-theoretic framework for resolving community structure in complex networks. Proc. Natl. Acad. Sci. 104(18), 7327–7331 (2007)CrossRef Rosvall, M., Bergstrom, C.T.: An information-theoretic framework for resolving community structure in complex networks. Proc. Natl. Acad. Sci. 104(18), 7327–7331 (2007)CrossRef
30.
Zurück zum Zitat Rosvall, M., Bergstrom, C.T.: The map equation. Eur. Phys. J. Spec. Top. 178(1), 13–23 (2009)CrossRef Rosvall, M., Bergstrom, C.T.: The map equation. Eur. Phys. J. Spec. Top. 178(1), 13–23 (2009)CrossRef
31.
Zurück zum Zitat Shannon, C.E.: A mathematical theory of communication. ACM SIGMOBILE Mobile Comput. Commun. Rev. 5(1), 3–55 (2001)CrossRef Shannon, C.E.: A mathematical theory of communication. ACM SIGMOBILE Mobile Comput. Commun. Rev. 5(1), 3–55 (2001)CrossRef
32.
Zurück zum Zitat Staudt, C.L., Meyerhenke, H.: Engineering parallel algorithms for community detection in massive networks. arXiv:1304.4453 (2014) Staudt, C.L., Meyerhenke, H.: Engineering parallel algorithms for community detection in massive networks. arXiv:​1304.​4453 (2014)
33.
Zurück zum Zitat Yang, B., Liu, D., Liu, J.: Discovering communities from social networks: methodologies and applications. In: Furht, B. (ed.) Handbook of Social Network Technologies and Applications, pp. 331–346. Springer, New York, USA (2010)CrossRef Yang, B., Liu, D., Liu, J.: Discovering communities from social networks: methodologies and applications. In: Furht, B. (ed.) Handbook of Social Network Technologies and Applications, pp. 331–346. Springer, New York, USA (2010)CrossRef
Metadaten
Titel
Community structure mining in big data social media networks with MapReduce
verfasst von
Songchang Jin
Wangqun Lin
Hong Yin
Shuqiang Yang
Aiping Li
Bo Deng
Publikationsdatum
01.09.2015
Verlag
Springer US
Erschienen in
Cluster Computing / Ausgabe 3/2015
Print ISSN: 1386-7857
Elektronische ISSN: 1573-7543
DOI
https://doi.org/10.1007/s10586-015-0452-x

Weitere Artikel der Ausgabe 3/2015

Cluster Computing 3/2015 Zur Ausgabe