Skip to main content
Erschienen in: International Journal of Machine Learning and Cybernetics 2/2017

22.02.2015 | Original Article

Efficiently detecting overlapping communities using seeding and semi-supervised learning

verfasst von: Changxing Shang, Shengzhong Feng, Zhongying Zhao, Jianping Fan

Erschienen in: International Journal of Machine Learning and Cybernetics | Ausgabe 2/2017

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

A common scheme for discovering overlapping communities in a network is to use a seeding process followed by an expansion process. Most seeding methods are either too complex to scale to large networks or too simple to select high-quality seeds. Additionally, the non-principled functions used by most expansion methods lead to poor performances when applied to diverse networks. This paper proposes a new method that transforms a network into a corpus. Each edge is treated as a document, and all the network nodes are treated as terms of the corpus. We propose an effective seeding method that selects seeds as a training set, and a principled expansion method based on semi-supervised learning that classifies the edges. We compared our new algorithm with four other community detection algorithms on a wide range of synthetic and empirical networks. Our experimental results show that the new algorithm significantly improved the clustering performance in most cases. Furthermore, the time complexity of the new algorithm is linear with respect to the number of edges, which means that the technique can be scaled to large networks.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Weitere Produktempfehlungen anzeigen
Fußnoten
1
Borgatti [13] proposed the Jaccard similarity matrix, but the Jaccard matrix in this paper is different from Borgatti’s Jaccard similarity matrix. Using the definitions given by Borgatti [13], the Jaccard matrix is a 2-mode matrix derived from a 1-mode adjacent matrix. Borgatti’s Jaccard similarity matrix is a 1-mode matrix derived from a 2-mode women-by-events matrix.
 
3
All experiments were conducted on a workstation that has eight Intel Xeon 2.27GHz cores and 12G RAM. ITEM and SMC were parallelized running on eight cores using openMP. The other algorithms were not parallelized and ran on one core.
 
Literatur
2.
Zurück zum Zitat Borgs C, Chayes J, Mahdian M, Saberi A (2004) Exploring the community structure of newsgroups. In: Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining, ACM, pp 783–787, 2004 Borgs C, Chayes J, Mahdian M, Saberi A (2004) Exploring the community structure of newsgroups. In: Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining, ACM, pp 783–787, 2004
3.
Zurück zum Zitat Lee C, Reid F, McDaid A, Hurley N (2010) Detecting highly overlapping community structure by greedy clique expansion. In: SNA-KDD’10: Proceedings of the 4th Workshop on Social Network Mining and Analysis, 2010 Lee C, Reid F, McDaid A, Hurley N (2010) Detecting highly overlapping community structure by greedy clique expansion. In: SNA-KDD’10: Proceedings of the 4th Workshop on Social Network Mining and Analysis, 2010
4.
Zurück zum Zitat Lancichinetti A, Fortunato S, Kertész J (2009) Detecting the overlapping and hierarchical community structure in complex networks. New J Phys 11(3):033015CrossRef Lancichinetti A, Fortunato S, Kertész J (2009) Detecting the overlapping and hierarchical community structure in complex networks. New J Phys 11(3):033015CrossRef
5.
Zurück zum Zitat Lee C, Reid F, McDaid A, Hurley N (2011) Seeding for pervasively overlapping communities. Phys Rev E 83(6):066107CrossRef Lee C, Reid F, McDaid A, Hurley N (2011) Seeding for pervasively overlapping communities. Phys Rev E 83(6):066107CrossRef
6.
Zurück zum Zitat Lancichinetti A, Fortunato S (2009) Community detection algorithms: a comparative analysis. Phys Rev E 80(5):056117CrossRef Lancichinetti A, Fortunato S (2009) Community detection algorithms: a comparative analysis. Phys Rev E 80(5):056117CrossRef
7.
Zurück zum Zitat Newman ME, Leicht EA (2007) Mixture models and exploratory analysis in networks. Proc Natl Acad Sci 104(23):9564CrossRefMATH Newman ME, Leicht EA (2007) Mixture models and exploratory analysis in networks. Proc Natl Acad Sci 104(23):9564CrossRefMATH
8.
Zurück zum Zitat Shen H, Cheng X, Cai K, Hu MB (2009) Detect overlapping and hierarchical community structure in networks. Phys A Stat Mech Appl 388(8):1706CrossRef Shen H, Cheng X, Cai K, Hu MB (2009) Detect overlapping and hierarchical community structure in networks. Phys A Stat Mech Appl 388(8):1706CrossRef
9.
Zurück zum Zitat Bron C, Kerbosch J (1973) Algorithm 457: finding all cliques of an undirected graph. Commun ACM 16(9):575CrossRefMATH Bron C, Kerbosch J (1973) Algorithm 457: finding all cliques of an undirected graph. Commun ACM 16(9):575CrossRefMATH
10.
Zurück zum Zitat Baumes J, Goldberg MK, Krishnamoorthy MS, Magdon-Ismail M, Preston N (2005) Finding communities by clustering a graph into overlapping subgraphs. IADIS AC 5:97 Baumes J, Goldberg MK, Krishnamoorthy MS, Magdon-Ismail M, Preston N (2005) Finding communities by clustering a graph into overlapping subgraphs. IADIS AC 5:97
11.
Zurück zum Zitat Baumes J, Goldberg M, Magdon-Ismail M (2005) Intelligence and security informatics. Springer, New York, pp 27–36 Baumes J, Goldberg M, Magdon-Ismail M (2005) Intelligence and security informatics. Springer, New York, pp 27–36
12.
Zurück zum Zitat Yang J, Leskovec J (2012) Proceedings of the ACM SIGKDD Workshop on mining data semantics, ACM, p 3, 2012 Yang J, Leskovec J (2012) Proceedings of the ACM SIGKDD Workshop on mining data semantics, ACM, p 3, 2012
13.
Zurück zum Zitat Borgatti SP (2012) Computational complexity—theory, techniques, and applications. In: Meyers RA (ed). Springer, New York, pp 2912–2924 Borgatti SP (2012) Computational complexity—theory, techniques, and applications. In: Meyers RA (ed). Springer, New York, pp 2912–2924
14.
Zurück zum Zitat Berry MW, Castellanos M (2004) Survey of text mining. Springer, New York Berry MW, Castellanos M (2004) Survey of text mining. Springer, New York
15.
Zurück zum Zitat Koller D, Sahami M (1997) Proceedings of ICML-97, 14th International Conference on machine learning. Morgan Kaufmann Publishers, Burlington, pp 170–178 Koller D, Sahami M (1997) Proceedings of ICML-97, 14th International Conference on machine learning. Morgan Kaufmann Publishers, Burlington, pp 170–178
16.
Zurück zum Zitat Robertson S (2004) Understanding inverse document frequency: on theoretical arguments for IDF. J Doc 60(5):503CrossRef Robertson S (2004) Understanding inverse document frequency: on theoretical arguments for IDF. J Doc 60(5):503CrossRef
17.
Zurück zum Zitat Zhu X (2006) Semi-supervised learning literature survey. Comput Sci Univ Wis Madison 2:3 Zhu X (2006) Semi-supervised learning literature survey. Comput Sci Univ Wis Madison 2:3
18.
Zurück zum Zitat Jiang J, Yan X, Yu Z, Guo J, Tian W (2014) A Chinese expert disambiguation method based on semi-supervised graph clustering. Int J Mac Learn Cybern :1–8 (2014) Jiang J, Yan X, Yu Z, Guo J, Tian W (2014) A Chinese expert disambiguation method based on semi-supervised graph clustering. Int J Mac Learn Cybern :1–8 (2014)
19.
Zurück zum Zitat Maulik U, Chakraborty D (2012) A novel semisupervised SVM for pixel classification of remote sensing imagery. Int J Mac Learn Cybern 3(3):247CrossRef Maulik U, Chakraborty D (2012) A novel semisupervised SVM for pixel classification of remote sensing imagery. Int J Mac Learn Cybern 3(3):247CrossRef
20.
Zurück zum Zitat Chen WJ, Shao YH, Hong N (2014) Laplacian smooth twin support vector machine for semi-supervised classification. Int J Mac Learn Cybern 5(3):459CrossRef Chen WJ, Shao YH, Hong N (2014) Laplacian smooth twin support vector machine for semi-supervised classification. Int J Mac Learn Cybern 5(3):459CrossRef
21.
Zurück zum Zitat Tanha J, van Someren M, Afsarmanesh H (2015) Semi-supervised self-training for decision tree classifiers. Int J Mac Learn Cybern :1–16 Tanha J, van Someren M, Afsarmanesh H (2015) Semi-supervised self-training for decision tree classifiers. Int J Mac Learn Cybern :1–16
22.
Zurück zum Zitat Domingos P, Pazzani M (1997) On the optimality of the simple Bayesian classifier under zero-one loss. Mach Learn 29(2–3):103CrossRefMATH Domingos P, Pazzani M (1997) On the optimality of the simple Bayesian classifier under zero-one loss. Mach Learn 29(2–3):103CrossRefMATH
23.
Zurück zum Zitat Shang C, Li M, Feng S, Jiang Q, Fan J (2013) Feature selection via maximizing global information gain for text classification. Knowl Based Syst 54:298CrossRef Shang C, Li M, Feng S, Jiang Q, Fan J (2013) Feature selection via maximizing global information gain for text classification. Knowl Based Syst 54:298CrossRef
24.
Zurück zum Zitat Charikar MS (2002) Similarity estimation techniques from rounding algorithms. In: Proceedings of the thiry-fourth annual ACM symposium on Theory of computing, ACM, pp 380–388, 2002 Charikar MS (2002) Similarity estimation techniques from rounding algorithms. In: Proceedings of the thiry-fourth annual ACM symposium on Theory of computing, ACM, pp 380–388, 2002
25.
Zurück zum Zitat Manku GS, Jain A, Das Sarma A (2007) Detecting near-duplicates for web crawling. In: Proceedings of the 16th international conference on World Wide Web, ACM, pp 141–150, 2007 Manku GS, Jain A, Das Sarma A (2007) Detecting near-duplicates for web crawling. In: Proceedings of the 16th international conference on World Wide Web, ACM, pp 141–150, 2007
26.
Zurück zum Zitat Dhillon IS, Mallela S, Modha DS (2003) Information-theoretic co-clustering. In: Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining, ACM, pp 89–98, 2003 Dhillon IS, Mallela S, Modha DS (2003) Information-theoretic co-clustering. In: Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining, ACM, pp 89–98, 2003
27.
Zurück zum Zitat Mladenic D, Grobelnik M (1999) Feature selection for unbalanced class distribution and naive bayes. ICML 99:258–267 Mladenic D, Grobelnik M (1999) Feature selection for unbalanced class distribution and naive bayes. ICML 99:258–267
28.
Zurück zum Zitat Lancichinetti A, Fortunato S, Radicchi F (2008) Benchmark graphs for testing community detection algorithms. Phys Rev E 78(4):046110CrossRef Lancichinetti A, Fortunato S, Radicchi F (2008) Benchmark graphs for testing community detection algorithms. Phys Rev E 78(4):046110CrossRef
29.
Zurück zum Zitat Lancichinetti A, Fortunato S (2009) Benchmarks for testing community detection algorithms on directed and weighted graphs with overlapping communities. Phys Rev E 80(1):016118CrossRef Lancichinetti A, Fortunato S (2009) Benchmarks for testing community detection algorithms on directed and weighted graphs with overlapping communities. Phys Rev E 80(1):016118CrossRef
30.
Zurück zum Zitat Traud AL, Kelsic ED, Mucha PJ, Porter MA (2011) Comparing community structure to characteristics in online collegiate social networks. SIAM Rev 53(3):526MathSciNetCrossRef Traud AL, Kelsic ED, Mucha PJ, Porter MA (2011) Comparing community structure to characteristics in online collegiate social networks. SIAM Rev 53(3):526MathSciNetCrossRef
31.
Zurück zum Zitat Traud AL, Mucha PJ, Porter MA (2012) Social structure of Facebook networks. Phys A Stat Mech Appl 391(16):4165CrossRef Traud AL, Mucha PJ, Porter MA (2012) Social structure of Facebook networks. Phys A Stat Mech Appl 391(16):4165CrossRef
32.
Zurück zum Zitat Lee C, Cunningham P (2014) Community detection: effective evaluation on large social networks. J Comp Netw 2(1):19CrossRef Lee C, Cunningham P (2014) Community detection: effective evaluation on large social networks. J Comp Netw 2(1):19CrossRef
33.
Zurück zum Zitat Gargi U, Lu W, Mirrokni VS, Yoon S (2011) Large-Scale Community Detection on YouTube for Topic Discovery and Exploration. ICWSM Gargi U, Lu W, Mirrokni VS, Yoon S (2011) Large-Scale Community Detection on YouTube for Topic Discovery and Exploration. ICWSM
34.
Zurück zum Zitat Subbian K, Aggarwal CC, Srivastava J, Yu PS (2013) Community Detection with Prior Knowledge. In: Proceedings of the 2013 SIAM International Conference on data mining, SIAM, pp 405–413, 2013 Subbian K, Aggarwal CC, Srivastava J, Yu PS (2013) Community Detection with Prior Knowledge. In: Proceedings of the 2013 SIAM International Conference on data mining, SIAM, pp 405–413, 2013
35.
Zurück zum Zitat Yang T, Jin R, Chi Y, Zhu S (2009) Combining link and content for community detection: a discriminative approach. In: Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining, ACM, pp 927–936, 2009 Yang T, Jin R, Chi Y, Zhu S (2009) Combining link and content for community detection: a discriminative approach. In: Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining, ACM, pp 927–936, 2009
36.
Zurück zum Zitat Gopalan PK, Blei DM (2013) Efficient discovery of overlapping communities in massive networks. Proc Natl Acad Sci 110(36):14534MathSciNetCrossRefMATH Gopalan PK, Blei DM (2013) Efficient discovery of overlapping communities in massive networks. Proc Natl Acad Sci 110(36):14534MathSciNetCrossRefMATH
37.
Zurück zum Zitat Andersen R, Gleich DF, Mirrokni V (2012) Overlapping clusters for distributed computation. In: Proceedings of the fifth ACM international conference on Web search and data mining, ACM, pp 273–282, 2012 Andersen R, Gleich DF, Mirrokni V (2012) Overlapping clusters for distributed computation. In: Proceedings of the fifth ACM international conference on Web search and data mining, ACM, pp 273–282, 2012
38.
Zurück zum Zitat Gleich DF, Seshadhri C (2012) Vertex neighborhoods, low conductance cuts, and good seeds for local community methods. In: Proceedings of the 18th ACM SIGKDD international conference on knowledge discovery and data mining, ACM , pp 597–605, 2012 Gleich DF, Seshadhri C (2012) Vertex neighborhoods, low conductance cuts, and good seeds for local community methods. In: Proceedings of the 18th ACM SIGKDD international conference on knowledge discovery and data mining, ACM , pp 597–605, 2012
39.
Zurück zum Zitat Xie J, Kelley S, Szymanski EK (2013) Overlapping Community Detection in Networks: The State-of-the-art and Comparative Study. ACM Comput Surv 45(4):43. doi:10.1145/2501654.2501657 Xie J, Kelley S, Szymanski EK (2013) Overlapping Community Detection in Networks: The State-of-the-art and Comparative Study. ACM Comput Surv 45(4):43. doi:10.1145/2501654.2501657
40.
Zurück zum Zitat Xie J, Szymanski BK, Liu X (2011) Uncovering overlapping communities in social networks via a speaker-listener interaction dynamic process. In: Data Mining Workshops (ICDMW), 2011 IEEE 11th International Conference on, IEEE, pp 344–349, 2011 Xie J, Szymanski BK, Liu X (2011) Uncovering overlapping communities in social networks via a speaker-listener interaction dynamic process. In: Data Mining Workshops (ICDMW), 2011 IEEE 11th International Conference on, IEEE, pp 344–349, 2011
41.
Zurück zum Zitat Gregory S (2010) Finding overlapping communities in networks by label propagation. New J Phys 12(10):103018CrossRef Gregory S (2010) Finding overlapping communities in networks by label propagation. New J Phys 12(10):103018CrossRef
42.
Zurück zum Zitat Lancichinetti A, Radicchi F, Ramasco JJ, Fortunato S (2011) Finding statistically significant communities in networks. PloS One 6(4):e18961CrossRef Lancichinetti A, Radicchi F, Ramasco JJ, Fortunato S (2011) Finding statistically significant communities in networks. PloS One 6(4):e18961CrossRef
43.
Zurück zum Zitat Palla G, Derényi I, Farkas I, Vicsek T (2005) Uncovering the overlapping community structure of complex networks in nature and society. Nature 435(7043):814CrossRef Palla G, Derényi I, Farkas I, Vicsek T (2005) Uncovering the overlapping community structure of complex networks in nature and society. Nature 435(7043):814CrossRef
44.
Zurück zum Zitat Ball B, Karrer B, Newman MEJ (2011) Efficient and principled method for detecting communities in networks. Phys Rev E 84(3):036103CrossRef Ball B, Karrer B, Newman MEJ (2011) Efficient and principled method for detecting communities in networks. Phys Rev E 84(3):036103CrossRef
45.
Zurück zum Zitat Chapelle O, Schölkopf B, Zien A (2006) Risks of semi-supervisedl earning: how unlabeled data can degrade performance of generative classifiers, in semi-supervised learning. MIT Press, Massachusetts , pp 57–72 Chapelle O, Schölkopf B, Zien A (2006) Risks of semi-supervisedl earning: how unlabeled data can degrade performance of generative classifiers, in semi-supervised learning. MIT Press, Massachusetts , pp 57–72
46.
Zurück zum Zitat Ahn YY, Bagrow JP, Lehmann S (2010) Link communities reveal multiscale complexity in networks. Nature 466(7307):761CrossRef Ahn YY, Bagrow JP, Lehmann S (2010) Link communities reveal multiscale complexity in networks. Nature 466(7307):761CrossRef
47.
Zurück zum Zitat Ding C, He X (2002) Cluster merging and splitting in hierarchical clustering algorithms. Data mining, 2002, ICDM 2003. Proceedings. 2002 IEEE International Conference on, IEEE, pp 139–146, 2002 Ding C, He X (2002) Cluster merging and splitting in hierarchical clustering algorithms. Data mining, 2002, ICDM 2003. Proceedings. 2002 IEEE International Conference on, IEEE, pp 139–146, 2002
49.
Zurück zum Zitat Stoffel K, Belkoniene A (1999) Parallel K/h-Means Clustering for Large Data Sets. In: Proceedings of the 5th International Euro-Par Conference on parallel processing. Springer, New York, pp 1451–1454, (Euro-Par ’99) Stoffel K, Belkoniene A (1999) Parallel K/h-Means Clustering for Large Data Sets. In: Proceedings of the 5th International Euro-Par Conference on parallel processing. Springer, New York, pp 1451–1454, (Euro-Par ’99)
Metadaten
Titel
Efficiently detecting overlapping communities using seeding and semi-supervised learning
verfasst von
Changxing Shang
Shengzhong Feng
Zhongying Zhao
Jianping Fan
Publikationsdatum
22.02.2015
Verlag
Springer Berlin Heidelberg
Erschienen in
International Journal of Machine Learning and Cybernetics / Ausgabe 2/2017
Print ISSN: 1868-8071
Elektronische ISSN: 1868-808X
DOI
https://doi.org/10.1007/s13042-015-0338-5

Weitere Artikel der Ausgabe 2/2017

International Journal of Machine Learning and Cybernetics 2/2017 Zur Ausgabe

Neuer Inhalt