Skip to main content
Erschienen in: Cluster Computing 1/2016

01.03.2016

SIMPLE: a simplifying-ensembling framework for parallel community detection from large networks

verfasst von: Zhiang Wu, Guangliang Gao, Zhan Bu, Jie Cao

Erschienen in: Cluster Computing | Ausgabe 1/2016

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

Community detection is a classic and very difficult task in complex network analysis. As the increasingly explosion of social media, scaling community detection methods to large networks has attracted considerable recent interests. In this paper, we propose a novel SIMPLifying and Ensembling (SIMPLE) framework for parallel community detection. It employs the random link sampling to simplify the network and obtain basic partitionings on every sampled graphs. Then, the K-means-based Consensus Clustering is used to ensemble a number of basic partitionings to get high-quality community structures. All of phases in SIMPLE, including random sampling, sampled graph partitioning, and consensus clustering, are encapsulated into MapReduce for parallel execution. Experiments on six real-world social networks analyze key parameters and factors inside SIMPLE, and demonstrate both effectiveness and efficiency of the SIMPLE.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Literatur
2.
Zurück zum Zitat Bernard, T., Bui, A., Pilard, L., Sohier, D.: A distributed clustering algorithm for large-scale dynamic networks. Clust. Comput. 15(4), 335–350 (2012)CrossRef Bernard, T., Bui, A., Pilard, L., Sohier, D.: A distributed clustering algorithm for large-scale dynamic networks. Clust. Comput. 15(4), 335–350 (2012)CrossRef
3.
Zurück zum Zitat Blondel, V.D., Guillaume, J.L., Lambiotte, R., Lefebvre, E.: Fast unfolding of communities in large networks. J. Stat. Mech. 2008(10), P10008 (2008)CrossRef Blondel, V.D., Guillaume, J.L., Lambiotte, R., Lefebvre, E.: Fast unfolding of communities in large networks. J. Stat. Mech. 2008(10), P10008 (2008)CrossRef
4.
Zurück zum Zitat Chernoff, H.: A measure of asymptotic efficiency for tests of a hypothesis based on the sum of observations. Ann. Math. Stat. 493–507 (1952) Chernoff, H.: A measure of asymptotic efficiency for tests of a hypothesis based on the sum of observations. Ann. Math. Stat. 493–507 (1952)
5.
Zurück zum Zitat Cho, E., Myers, S.A., Leskovec, J.: Friendship and mobility: user movement in location-based social networks. In: Proceedings of KDD, pp. 621–628 (2011) Cho, E., Myers, S.A., Leskovec, J.: Friendship and mobility: user movement in location-based social networks. In: Proceedings of KDD, pp. 621–628 (2011)
6.
Zurück zum Zitat Clauset, A.: Finding local community structure in networks. Phys. Rev. E 72, 026132 (2005)CrossRef Clauset, A.: Finding local community structure in networks. Phys. Rev. E 72, 026132 (2005)CrossRef
8.
Zurück zum Zitat Gregori, E., Lenzini, L., Mainardi, S.: Parallel k-clique community detection on large-scale networks. IEEE Trans. Parallel Distrib. Syst. 24(8), 1651–1660 (2013)CrossRef Gregori, E., Lenzini, L., Mainardi, S.: Parallel k-clique community detection on large-scale networks. IEEE Trans. Parallel Distrib. Syst. 24(8), 1651–1660 (2013)CrossRef
9.
Zurück zum Zitat Hubler, C., Kriegel, H.P., Borgwardt, K., Ghahramani, Z.: Metropolis algorithms for representative subgraph sampling. In: Proceedings of the 2008 IEEE 7th International Conference on Data Mining, pp. 283–292. IEEE (2008) Hubler, C., Kriegel, H.P., Borgwardt, K., Ghahramani, Z.: Metropolis algorithms for representative subgraph sampling. In: Proceedings of the 2008 IEEE 7th International Conference on Data Mining, pp. 283–292. IEEE (2008)
10.
Zurück zum Zitat Hui, P., Yoneki, E., Chan, S.Y., Crowcroft, J.: Distributed community detection in delay tolerant networks. In: Proceedings of 2nd ACM/IEEE International Workshop on Mobility in the Evolving Internet Architecture, p. 7. ACM (2007) Hui, P., Yoneki, E., Chan, S.Y., Crowcroft, J.: Distributed community detection in delay tolerant networks. In: Proceedings of 2nd ACM/IEEE International Workshop on Mobility in the Evolving Internet Architecture, p. 7. ACM (2007)
11.
Zurück zum Zitat Karypis, G., Kumar, V.: Multilevel k-way hypergraph partitioning. In: Proceedings of the 36th Conference on Design Automation, pp. 343–348 (1999) Karypis, G., Kumar, V.: Multilevel k-way hypergraph partitioning. In: Proceedings of the 36th Conference on Design Automation, pp. 343–348 (1999)
12.
Zurück zum Zitat LaSalle, D., Karypis, G.: Multi-threaded modularity based graph clustering using the multilevel paradigm. J. Parallel Distrib. Comput. 76, 66–80 (2015)CrossRef LaSalle, D., Karypis, G.: Multi-threaded modularity based graph clustering using the multilevel paradigm. J. Parallel Distrib. Comput. 76, 66–80 (2015)CrossRef
13.
Zurück zum Zitat Leskovec, J., Faloutsos, C.: Sampling from large graphs. In: Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 631–636. ACM (2006) Leskovec, J., Faloutsos, C.: Sampling from large graphs. In: Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 631–636. ACM (2006)
14.
Zurück zum Zitat Leskovec, J., Kleinberg, J., Faloutsos, C.: Graphs over time: densification laws, shrinking diameters and possible explanations. In: Proceedings of the Eleventh ACM SIGKDD International Conference on Knowledge Discovery in Data Mining, KDD ’05, pp. 177–187. ACM (2005) Leskovec, J., Kleinberg, J., Faloutsos, C.: Graphs over time: densification laws, shrinking diameters and possible explanations. In: Proceedings of the Eleventh ACM SIGKDD International Conference on Knowledge Discovery in Data Mining, KDD ’05, pp. 177–187. ACM (2005)
15.
Zurück zum Zitat Leskovec, J., Mcauley, J.J.: Learning to discover social circles in ego networks. In: Advances in Neural Information Processing Systems, pp. 539–547 (2012) Leskovec, J., Mcauley, J.J.: Learning to discover social circles in ego networks. In: Advances in Neural Information Processing Systems, pp. 539–547 (2012)
16.
Zurück zum Zitat Li, F., Ooi, B.C., Ozsu, M., Wu, S.: Distributed data management using mapreduce. ACM Comput. Surv. 46(3), 31 (2013) Li, F., Ooi, B.C., Ozsu, M., Wu, S.: Distributed data management using mapreduce. ACM Comput. Surv. 46(3), 31 (2013)
17.
Zurück zum Zitat Moon, S., Lee, J.G., Kang, M.: Scalable community detection from networks by computing edge betweenness on mapreduce. In: International Conference on Big Data and Smart Computing (BIGCOMP), pp. 145–148. IEEE (2014) Moon, S., Lee, J.G., Kang, M.: Scalable community detection from networks by computing edge betweenness on mapreduce. In: International Conference on Big Data and Smart Computing (BIGCOMP), pp. 145–148. IEEE (2014)
18.
Zurück zum Zitat Newman, M.E.: Fast algorithm for detecting community structure in networks. Phys. Rev. E 69(6), 66–113 (2004)CrossRef Newman, M.E.: Fast algorithm for detecting community structure in networks. Phys. Rev. E 69(6), 66–113 (2004)CrossRef
19.
Zurück zum Zitat Newman, M.E.: Modularity and community structure in networks. Proc. Natl. Acad. Sci. 103(23), 8577–8582 (2006)CrossRef Newman, M.E.: Modularity and community structure in networks. Proc. Natl. Acad. Sci. 103(23), 8577–8582 (2006)CrossRef
20.
21.
Zurück zum Zitat Palla, G., Derenyi, I., Farkas, I., Vicsek, T.: Uncovering the overlapping community structure of complex networks in nature and society. Nature 435, 814–818 (2005)CrossRef Palla, G., Derenyi, I., Farkas, I., Vicsek, T.: Uncovering the overlapping community structure of complex networks in nature and society. Nature 435, 814–818 (2005)CrossRef
22.
Zurück zum Zitat Papadopoulos, S., Kompatsiaris, Y., Vakali, A., Spyridonos, P.: Community detection in social media. Data Min. Knowl. Discov. 24(3), 515–554 (2012) Papadopoulos, S., Kompatsiaris, Y., Vakali, A., Spyridonos, P.: Community detection in social media. Data Min. Knowl. Discov. 24(3), 515–554 (2012)
23.
Zurück zum Zitat Prat-Pérez, A., Dominguez-Sal, D., Brunat, J.M., Larriba-Pey, J.L.: Shaping communities out of triangles. In: Proceedings of the 21st ACM International Conference on Information and Knowledge Management, pp. 1677–1681. ACM (2012) Prat-Pérez, A., Dominguez-Sal, D., Brunat, J.M., Larriba-Pey, J.L.: Shaping communities out of triangles. In: Proceedings of the 21st ACM International Conference on Information and Knowledge Management, pp. 1677–1681. ACM (2012)
24.
Zurück zum Zitat Prat-Pérez, A., Dominguez-Sal, D., Larriba-Pey, J.: High quality, scalable and parallel community detection for large real graphs. In: 23rd International World Wide Web Conference, WWW ’14, Seoul, 7–11 April, pp. 225–236 (2014) Prat-Pérez, A., Dominguez-Sal, D., Larriba-Pey, J.: High quality, scalable and parallel community detection for large real graphs. In: 23rd International World Wide Web Conference, WWW ’14, Seoul, 7–11 April, pp. 225–236 (2014)
25.
Zurück zum Zitat Raghavan, U.N., Albert, R., Kumara, S.: Near linear time algorithm to detect community structures in large-scale networks. Phys. Rev. E 76(3), 036–106 (2007)CrossRef Raghavan, U.N., Albert, R., Kumara, S.: Near linear time algorithm to detect community structures in large-scale networks. Phys. Rev. E 76(3), 036–106 (2007)CrossRef
26.
Zurück zum Zitat Serrano, M.Á., Boguñá, M., Vespignani, A.: Extracting the multiscale backbone of complex weighted networks. Proc. Natl. Acad. Sci. 106(16), 6483–6488 (2009)CrossRef Serrano, M.Á., Boguñá, M., Vespignani, A.: Extracting the multiscale backbone of complex weighted networks. Proc. Natl. Acad. Sci. 106(16), 6483–6488 (2009)CrossRef
27.
Zurück zum Zitat Shi, J., Xue, W., Wang, W., Zhang, Y., Yang, B., Li, J.: Scalable community detection in massive social networks using mapreduce. IBM J. Res. Dev. 57(3/4), 12-1 (2013) Shi, J., Xue, W., Wang, W., Zhang, Y., Yang, B., Li, J.: Scalable community detection in massive social networks using mapreduce. IBM J. Res. Dev. 57(3/4), 12-1 (2013)
28.
Zurück zum Zitat Tang, L., Liu, H.: Community Detection and Mining in Social Media. Morgan & Claypool Publishers, San Rafael (2010) Tang, L., Liu, H.: Community Detection and Mining in Social Media. Morgan & Claypool Publishers, San Rafael (2010)
29.
Zurück zum Zitat Traud, A.L., Kelsic, E.D., Mucha, P.J., Porter, M.A.: Comparing community structure to characteristics in online collegiate social networks. SIAM Rev. 53(3), 526–543 (2011)MathSciNetCrossRef Traud, A.L., Kelsic, E.D., Mucha, P.J., Porter, M.A.: Comparing community structure to characteristics in online collegiate social networks. SIAM Rev. 53(3), 526–543 (2011)MathSciNetCrossRef
30.
Zurück zum Zitat White, T.: Hadoop: The Definitive Guide: The Definitive Guide. O’Reilly Media (2009) White, T.: Hadoop: The Definitive Guide: The Definitive Guide. O’Reilly Media (2009)
31.
Zurück zum Zitat Wu, J., Liu, H., Xiong, H., Cao, J.: A theoretic framework of k-means-based consensus clustering. In: Proceedings of the Twenty-Third International Joint Conference on Artificial Intelligence, pp. 1799–1805. AAAI Press (2013) Wu, J., Liu, H., Xiong, H., Cao, J.: A theoretic framework of k-means-based consensus clustering. In: Proceedings of the Twenty-Third International Joint Conference on Artificial Intelligence, pp. 1799–1805. AAAI Press (2013)
32.
Zurück zum Zitat Wu, Z., Cao, J., Wu, J., Wang, Y., Liu, C.: Detecting genuine communities from large-scale social networks: a pattern-based method. Comput. J. 57(9), 1343–1357 (2014)CrossRef Wu, Z., Cao, J., Wu, J., Wang, Y., Liu, C.: Detecting genuine communities from large-scale social networks: a pattern-based method. Comput. J. 57(9), 1343–1357 (2014)CrossRef
33.
Zurück zum Zitat Yang, J., Leskovec, J.: Defining and evaluating network communities based on ground-truth. In: Proceedings of ICDM, pp. 745–754 (2012) Yang, J., Leskovec, J.: Defining and evaluating network communities based on ground-truth. In: Proceedings of ICDM, pp. 745–754 (2012)
34.
Zurück zum Zitat Yang, J., Leskovec, J.: Overlapping community detection at scale: a nonnegative matrix factorization approach. In: Proceedings of the sixth ACM international conference on Web search and data mining, pp. 587–596. ACM (2013) Yang, J., Leskovec, J.: Overlapping community detection at scale: a nonnegative matrix factorization approach. In: Proceedings of the sixth ACM international conference on Web search and data mining, pp. 587–596. ACM (2013)
35.
Zurück zum Zitat Zaharia, M., Chowdhury, M., Das, T., Dave, A., Ma, J., McCauley, M., Franklin, M.J., Shenker, S., Stoica, I.: Resilient distributed datasets: a fault-tolerant abstraction for in-memory cluster computing. In: Proceedings of the 9th USENIX Conference on Networked Systems Design and Implementation. NSDI’12, pp. 2–2. USENIX Association, Berkeley, CA (2012) Zaharia, M., Chowdhury, M., Das, T., Dave, A., Ma, J., McCauley, M., Franklin, M.J., Shenker, S., Stoica, I.: Resilient distributed datasets: a fault-tolerant abstraction for in-memory cluster computing. In: Proceedings of the 9th USENIX Conference on Networked Systems Design and Implementation. NSDI’12, pp. 2–2. USENIX Association, Berkeley, CA (2012)
36.
Zurück zum Zitat Zaharia, M., Chowdhury, M., Franklin, M.J., Shenker, S., Stoica, I.: Spark: cluster computing with working sets. In: Proceedings of the 2nd USENIX Conference on Hot topics in Cloud Computing, vol. 10, p. 10 (2010) Zaharia, M., Chowdhury, M., Franklin, M.J., Shenker, S., Stoica, I.: Spark: cluster computing with working sets. In: Proceedings of the 2nd USENIX Conference on Hot topics in Cloud Computing, vol. 10, p. 10 (2010)
37.
Zurück zum Zitat Zhang, Y., Wang, J., Wang, Y., Zhou, L.: Parallel community detection on large networks with propinquity dynamics. In: Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 997–1006. ACM (2009) Zhang, Y., Wang, J., Wang, Y., Zhou, L.: Parallel community detection on large networks with propinquity dynamics. In: Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 997–1006. ACM (2009)
38.
Zurück zum Zitat Zhao, W., Ma, H., He, Q.: Parallel k-means clustering based on mapreduce. In: Proceedings of the 1st International Conference on Cloud Computing, CloudCom ’09, pp. 674–679. Springer (2009) Zhao, W., Ma, H., He, Q.: Parallel k-means clustering based on mapreduce. In: Proceedings of the 1st International Conference on Cloud Computing, CloudCom ’09, pp. 674–679. Springer (2009)
Metadaten
Titel
SIMPLE: a simplifying-ensembling framework for parallel community detection from large networks
verfasst von
Zhiang Wu
Guangliang Gao
Zhan Bu
Jie Cao
Publikationsdatum
01.03.2016
Verlag
Springer US
Erschienen in
Cluster Computing / Ausgabe 1/2016
Print ISSN: 1386-7857
Elektronische ISSN: 1573-7543
DOI
https://doi.org/10.1007/s10586-015-0504-2

Weitere Artikel der Ausgabe 1/2016

Cluster Computing 1/2016 Zur Ausgabe