Skip to main content
Erschienen in: The Journal of Supercomputing 10/2018

26.07.2018

Improving NMF-based community discovery using distributed robust nonnegative matrix factorization with SimRank similarity measure

verfasst von: Chaobo He, Xiang Fei, Hanchao Li, Yong Tang, Hai Liu, Shuangyin Liu

Erschienen in: The Journal of Supercomputing | Ausgabe 10/2018

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

Nonnegative matrix factorization (NMF) has become a powerful model for community discovery in complex networks. Existing NMF-based methods for community discovery often factorize the corresponding adjacent matrix of complex networks to obtain its community indicator matrix. However, the adjacent matrix cannot represent the global structure feature of complex networks very well, and this leads to the performance degradation of community discovery. Besides, most of existing methods are not robust and scalable enough, so they are not effective to deal with complex networks with noises and large scales. Aiming at these problems above, in this paper we propose a method for community discovery using distributed robust NMF with SimRank similarity measure. This method selects SimRank measure to construct the feature matrix, which can more accurately represent the global structure feature of complex networks. To improve the robustness, we select \(\ell _{2,1}\) norm instead of the widely used Frobenius norm to construct its NMF-based community discovery model. In addition, to improve the scalability, we implement its key components by using MapReduce distributed computing framework, including computing SimRank feature matrix and iteratively solving the NMF-based model for community discovery. We conduct extensive experiments on several typical complex networks. The results show that our method has better performance and robustness than other representative NMF-based methods for community discovery. Moreover, our method presents good scalability and hence can be used to discover communities in the large-scale complex networks.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Literatur
1.
Zurück zum Zitat Newman ME, Girvan M (2004) Finding and evaluating community structure in networks. Phys Rev E 69:026113CrossRef Newman ME, Girvan M (2004) Finding and evaluating community structure in networks. Phys Rev E 69:026113CrossRef
2.
Zurück zum Zitat Hao F, Park DS (2018) cSketch: a novel framework for capturing cliques from big graph. J Supercomput 74(3):1202–1214CrossRef Hao F, Park DS (2018) cSketch: a novel framework for capturing cliques from big graph. J Supercomput 74(3):1202–1214CrossRef
3.
Zurück zum Zitat Newman ME (2013) Spectral methods for community detection and graph partitioning. Phys Rev E 88:042822CrossRef Newman ME (2013) Spectral methods for community detection and graph partitioning. Phys Rev E 88:042822CrossRef
4.
Zurück zum Zitat Ianni MD, Gambosi G, Rossi G et al (2016) Min-max communities in graphs: complexity and computational properties. Theor Comput Sci 613:94–114MathSciNetCrossRef Ianni MD, Gambosi G, Rossi G et al (2016) Min-max communities in graphs: complexity and computational properties. Theor Comput Sci 613:94–114MathSciNetCrossRef
5.
Zurück zum Zitat Peng DL, Lei X, Huang T (2015) DICH: a framework for discovering implicit communities hidden in tweets. World Wide Web 18(4):795–818CrossRef Peng DL, Lei X, Huang T (2015) DICH: a framework for discovering implicit communities hidden in tweets. World Wide Web 18(4):795–818CrossRef
6.
Zurück zum Zitat Xiang J, Tao H, Zhang Y, Hu K et al (2016) Local modularity for community detection in complex networks. Physica A 443:451–459CrossRef Xiang J, Tao H, Zhang Y, Hu K et al (2016) Local modularity for community detection in complex networks. Physica A 443:451–459CrossRef
7.
Zurück zum Zitat Shakarian P, Roos P, Callahan D, et al. (2013) Mining for geographically disperse communities in social networks by leveraging distance modularity. In: Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, pp 1402–1409 Shakarian P, Roos P, Callahan D, et al. (2013) Mining for geographically disperse communities in social networks by leveraging distance modularity. In: Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, pp 1402–1409
8.
Zurück zum Zitat Clauset A, Newman ME, Moore C (2004) Finding community structure in very large networks. Phys Rev E 70:066111CrossRef Clauset A, Newman ME, Moore C (2004) Finding community structure in very large networks. Phys Rev E 70:066111CrossRef
9.
Zurück zum Zitat Zhu WY, Peng WC, Hung CC et al (2014) Exploring sequential probability tree for movement-based community discovery. IEEE Trans Knowl Data Eng 6(11):2717–2730CrossRef Zhu WY, Peng WC, Hung CC et al (2014) Exploring sequential probability tree for movement-based community discovery. IEEE Trans Knowl Data Eng 6(11):2717–2730CrossRef
10.
Zurück zum Zitat Costa G, Ortale R (2013) Probabilistic analysis of communities and inner roles in networks: bayesian generative models and approximate inference. Soc Netw Anal Min 3(4):1015–1038CrossRef Costa G, Ortale R (2013) Probabilistic analysis of communities and inner roles in networks: bayesian generative models and approximate inference. Soc Netw Anal Min 3(4):1015–1038CrossRef
11.
Zurück zum Zitat Zhang Y, Yeung DY (2012) Overlapping community detection via bounded nonnegative matrix tri-factorization. In: Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, pp 606–614 Zhang Y, Yeung DY (2012) Overlapping community detection via bounded nonnegative matrix tri-factorization. In: Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, pp 606–614
12.
Zurück zum Zitat Ma Y, Hu X, He T et al (2017) Clustering and integrating of heterogeneous microbiome data by joint symmetric nonnegative matrix factorization with laplacian regularization. IEEE/ACM Trans Comput Biol Bioinf 99:1 Ma Y, Hu X, He T et al (2017) Clustering and integrating of heterogeneous microbiome data by joint symmetric nonnegative matrix factorization with laplacian regularization. IEEE/ACM Trans Comput Biol Bioinf 99:1
13.
Zurück zum Zitat He CB, Li HC, Fei X et al (2017) A topic community-based method for friend recommendation in large-scale online social networks. Concurr Comput Pract Exp 29(6):e3924CrossRef He CB, Li HC, Fei X et al (2017) A topic community-based method for friend recommendation in large-scale online social networks. Concurr Comput Pract Exp 29(6):e3924CrossRef
14.
Zurück zum Zitat Liu SY, Wang SH (2017) Trajectory community discovery and recommendation by multi-source diffusion modeling. IEEE Trans Knowl Data Eng 29(4):898–911CrossRef Liu SY, Wang SH (2017) Trajectory community discovery and recommendation by multi-source diffusion modeling. IEEE Trans Knowl Data Eng 29(4):898–911CrossRef
15.
Zurück zum Zitat Yang L, Cao XC, Jin D et al (2015) A unified semi-supervised community detection framework using latent space graph regularization. IEEE Trans Cybern 45(11):2585–2598CrossRef Yang L, Cao XC, Jin D et al (2015) A unified semi-supervised community detection framework using latent space graph regularization. IEEE Trans Cybern 45(11):2585–2598CrossRef
16.
Zurück zum Zitat Li WM, Xie J, Xin MJ et al (2018) An overlapping network community partition algorithm based on semi-supervised matrix factorization and random walk. Expert Syst Appl 91:277–285CrossRef Li WM, Xie J, Xin MJ et al (2018) An overlapping network community partition algorithm based on semi-supervised matrix factorization and random walk. Expert Syst Appl 91:277–285CrossRef
17.
Zurück zum Zitat He CB, Fei X, Li HC, et al. (2017) Community discovery in large-scale complex networks using distributed SimRank nonnegative matrix factorization. In: Proceedings of the 5th International Conference on Advanced Cloud and Big Data. IEEE, pp 226–231 He CB, Fei X, Li HC, et al. (2017) Community discovery in large-scale complex networks using distributed SimRank nonnegative matrix factorization. In: Proceedings of the 5th International Conference on Advanced Cloud and Big Data. IEEE, pp 226–231
18.
Zurück zum Zitat Leicht EA, Holme P, Newman ME (2006) Vertex similarity in networks. Phys Rev E 73:026120CrossRef Leicht EA, Holme P, Newman ME (2006) Vertex similarity in networks. Phys Rev E 73:026120CrossRef
19.
Zurück zum Zitat Kondor RI, Lafferty J (2002) Diffusion kernels on graphs and other discrete input space. In: Proceedings of the 19th International Conference on Machine Learning. IEEE, pp 315–322 Kondor RI, Lafferty J (2002) Diffusion kernels on graphs and other discrete input space. In: Proceedings of the 19th International Conference on Machine Learning. IEEE, pp 315–322
20.
Zurück zum Zitat Liu WP, Lü LY (2010) Link prediction based on local random walk. Europhys Lett 89(5):58007–58012CrossRef Liu WP, Lü LY (2010) Link prediction based on local random walk. Europhys Lett 89(5):58007–58012CrossRef
21.
Zurück zum Zitat Jeh G, Widom J (2002) SimRank: a measure of structural-context similarity. In: Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, pp 538–543 Jeh G, Widom J (2002) SimRank: a measure of structural-context similarity. In: Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, pp 538–543
22.
Zurück zum Zitat Lee DD, Seung HS (1999) Learning the parts of objects by non-negative matrix factorization. Nature 401:788–791CrossRef Lee DD, Seung HS (1999) Learning the parts of objects by non-negative matrix factorization. Nature 401:788–791CrossRef
23.
Zurück zum Zitat Lee DD, Seung HS (2000) Algorithms for non-negative matrix factorization. In: Proceedings of the 24th Annual Conference on Neural Information Processing Systems. Curran Associates, pp 556–562 Lee DD, Seung HS (2000) Algorithms for non-negative matrix factorization. In: Proceedings of the 24th Annual Conference on Neural Information Processing Systems. Curran Associates, pp 556–562
24.
Zurück zum Zitat Ling Y, Pan XL, Li GR et al (2015) Clinical documents clustering based on medication/symptom names using multi-view nonnegative matrix factorization. IEEE Trans Nanobiosci 14(5):500–504CrossRef Ling Y, Pan XL, Li GR et al (2015) Clinical documents clustering based on medication/symptom names using multi-view nonnegative matrix factorization. IEEE Trans Nanobiosci 14(5):500–504CrossRef
25.
Zurück zum Zitat Trung ND, Schmidt-Thieme L, Trung ND, et al. (2017) On discovering the number of document topics via conceptual latent space. In: Proceedings of the 26th ACM on Conference on Information and Knowledge Management. ACM, pp 2051–2054 Trung ND, Schmidt-Thieme L, Trung ND, et al. (2017) On discovering the number of document topics via conceptual latent space. In: Proceedings of the 26th ACM on Conference on Information and Knowledge Management. ACM, pp 2051–2054
26.
Zurück zum Zitat Mohammadiha N, Smaragdis P, Leijon A (2013) Supervised and unsupervised speech enhancement using nonnegative matrix factorization. IEEE Trans Audio Speech Lang Process 21(10):2140–2151CrossRef Mohammadiha N, Smaragdis P, Leijon A (2013) Supervised and unsupervised speech enhancement using nonnegative matrix factorization. IEEE Trans Audio Speech Lang Process 21(10):2140–2151CrossRef
27.
Zurück zum Zitat Yang ZG, Li Q, Liu WY et al (2017) Dual graph regularized NMF model for social event detection from Flickr data. World Wide Web 20(5):995–1015CrossRef Yang ZG, Li Q, Liu WY et al (2017) Dual graph regularized NMF model for social event detection from Flickr data. World Wide Web 20(5):995–1015CrossRef
28.
Zurück zum Zitat Lai HJ, Yan P, Shu XB et al (2016) Instance-aware hashing for multi-label image retrieval. IEEE Trans Image Process 25(6):2469–2479MathSciNetCrossRef Lai HJ, Yan P, Shu XB et al (2016) Instance-aware hashing for multi-label image retrieval. IEEE Trans Image Process 25(6):2469–2479MathSciNetCrossRef
29.
Zurück zum Zitat Kannan R, Ishteva M, Park H (2014) Bounded matrix factorization for recommender system. Knowl Inf Syst 39(3):491–511CrossRef Kannan R, Ishteva M, Park H (2014) Bounded matrix factorization for recommender system. Knowl Inf Syst 39(3):491–511CrossRef
30.
Zurück zum Zitat Ding CHQ, He X (2015) On the equivalence of nonnegative matrix factorization and spectral clustering. In: Proceedings of 5th SIAM International Conference on Data Mining. ACM, pp 606–610 Ding CHQ, He X (2015) On the equivalence of nonnegative matrix factorization and spectral clustering. In: Proceedings of 5th SIAM International Conference on Data Mining. ACM, pp 606–610
31.
Zurück zum Zitat Wang F, Li T, Wang X et al (2011) Community discovery using nonnegative matrix factorization. Data Min Knowl Discov 22(3):493–521MathSciNetCrossRef Wang F, Li T, Wang X et al (2011) Community discovery using nonnegative matrix factorization. Data Min Knowl Discov 22(3):493–521MathSciNetCrossRef
32.
Zurück zum Zitat Nguyen NP, Thai MT (2012) Finding overlapped communities in online social networks with nonnegative matrix factorization. In: Proceedings of 33rd IEEE Military Communications Conference. IEEE, pp 1–6 Nguyen NP, Thai MT (2012) Finding overlapped communities in online social networks with nonnegative matrix factorization. In: Proceedings of 33rd IEEE Military Communications Conference. IEEE, pp 1–6
33.
Zurück zum Zitat Zhang ZY, Wang Y, Ahn YY (2013) An overlapping community detection in complex networks using symmetric binary matrix factorization. Phys Rev E 87:062803CrossRef Zhang ZY, Wang Y, Ahn YY (2013) An overlapping community detection in complex networks using symmetric binary matrix factorization. Phys Rev E 87:062803CrossRef
34.
Zurück zum Zitat Wu WH, Kwong S, Zhou Y et al (2018) Nonnegative matrix factorization with mixed hypergraph regularization for community detection. Inf Sci 435:263–281MathSciNetCrossRef Wu WH, Kwong S, Zhou Y et al (2018) Nonnegative matrix factorization with mixed hypergraph regularization for community detection. Inf Sci 435:263–281MathSciNetCrossRef
35.
Zurück zum Zitat Ma XK, Dong D (2017) Evolutionary nonnegative matrix factorization algorithms for community detection in dynamic networks. IEEE Trans Knowl Data Eng 29(5):1045–1058CrossRef Ma XK, Dong D (2017) Evolutionary nonnegative matrix factorization algorithms for community detection in dynamic networks. IEEE Trans Knowl Data Eng 29(5):1045–1058CrossRef
36.
Zurück zum Zitat Liu X, Wang WJ, He DX et al (2017) Semi-supervised community detection based on non-negative matrix factorization with node popularity. Inf Sci 381:304–321CrossRef Liu X, Wang WJ, He DX et al (2017) Semi-supervised community detection based on non-negative matrix factorization with node popularity. Inf Sci 381:304–321CrossRef
37.
Zurück zum Zitat Shi XH, Lu HT (2016) Community inference with Bayesian non-negative matrix factorization. In: Proceedings of the 18th Asia-Pacific Web Conference on Web Technologies and Applications. Springer, pp 208–219 Shi XH, Lu HT (2016) Community inference with Bayesian non-negative matrix factorization. In: Proceedings of the 18th Asia-Pacific Web Conference on Web Technologies and Applications. Springer, pp 208–219
38.
Zurück zum Zitat Kuang D, Yun S, Park H (2015) SymNMF: nonnegative low-rank approximation of a similarity matrix for graph clustering. J Global Optim 62(3):545–574MathSciNetCrossRef Kuang D, Yun S, Park H (2015) SymNMF: nonnegative low-rank approximation of a similarity matrix for graph clustering. J Global Optim 62(3):545–574MathSciNetCrossRef
39.
Zurück zum Zitat Yu WR, Lin XM, Zhang WJ (2013) Towards efficient SimRank computation on large networks. In: Proceedings of the 29th IEEE International Conference on Data Engineering. IEEE, pp 601–612 Yu WR, Lin XM, Zhang WJ (2013) Towards efficient SimRank computation on large networks. In: Proceedings of the 29th IEEE International Conference on Data Engineering. IEEE, pp 601–612
40.
Zurück zum Zitat Du YT, Guan XH, Cai ZM (2010) Enhancing web page classification via local co-training. In: Proceedings of the 20th International Conference on Pattern Recognition. IEEE, pp 23–26 Du YT, Guan XH, Cai ZM (2010) Enhancing web page classification via local co-training. In: Proceedings of the 20th International Conference on Pattern Recognition. IEEE, pp 23–26
41.
Zurück zum Zitat Yang J, Leskovec J (2012) Defining and evaluating network communities based on ground-truth. In: Proceedings of 12th IEEE International Conference on Data Mining. IEEE, pp 745–754 Yang J, Leskovec J (2012) Defining and evaluating network communities based on ground-truth. In: Proceedings of 12th IEEE International Conference on Data Mining. IEEE, pp 745–754
42.
Zurück zum Zitat Lai DR, Nardini C (2016) A corrected normalized mutual information for performance evaluation of community detection. J Stat Mech Theory Exp 9:093403CrossRef Lai DR, Nardini C (2016) A corrected normalized mutual information for performance evaluation of community detection. J Stat Mech Theory Exp 9:093403CrossRef
43.
Zurück zum Zitat Ekanayake J, Li H, Zhang BJ, et al. (2010) Twister: a runtime for iterative MapReduce. In: Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing. ACM, pp 810–818 Ekanayake J, Li H, Zhang BJ, et al. (2010) Twister: a runtime for iterative MapReduce. In: Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing. ACM, pp 810–818
Metadaten
Titel
Improving NMF-based community discovery using distributed robust nonnegative matrix factorization with SimRank similarity measure
verfasst von
Chaobo He
Xiang Fei
Hanchao Li
Yong Tang
Hai Liu
Shuangyin Liu
Publikationsdatum
26.07.2018
Verlag
Springer US
Erschienen in
The Journal of Supercomputing / Ausgabe 10/2018
Print ISSN: 0920-8542
Elektronische ISSN: 1573-0484
DOI
https://doi.org/10.1007/s11227-018-2500-9

Weitere Artikel der Ausgabe 10/2018

The Journal of Supercomputing 10/2018 Zur Ausgabe