Skip to main content
Erschienen in: Data Mining and Knowledge Discovery 5-6/2014

01.09.2014

Overlapping community detection in labeled graphs

verfasst von: Esther Galbrun, Aristides Gionis, Nikolaj Tatti

Erschienen in: Data Mining and Knowledge Discovery | Ausgabe 5-6/2014

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

We present a new approach for the problem of finding overlapping communities in graphs and social networks. Our approach consists of a novel problem definition and three accompanying algorithms. We are particularly interested in graphs that have labels on their vertices, although our methods are also applicable to graphs with no labels. Our goal is to find k communities so that the total edge density over all k communities is maximized. In the case of labeled graphs, we require that each community is succinctly described by a set of labels. This requirement provides a better understanding for the discovered communities. The proposed problem formulation leads to the discovery of vertex-overlapping and dense communities that cover as many graph edges as possible. We capture these properties with a simple objective function, which we solve by adapting efficient approximation algorithms for the generalized maximum-coverage problem and the densest-subgraph problem. Our proposed algorithm is a generic greedy scheme. We experiment with three variants of the scheme, obtained by varying the greedy step of finding a dense subgraph. We validate our algorithms by comparing with other state-of-the-art community-detection methods on a variety of performance measures. Our experiments confirm that our algorithms achieve results of high quality in terms of the reported measures, and are practical in terms of performance.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Anhänge
Nur mit Berechtigung zugänglich
Fußnoten
1
Cohen and Katzir express their approximation factor as \((\frac{2e-1}{e-1}+\epsilon )\), for every \(\epsilon >0\), but we follow the convention that maximization problems have approximation factors less than 1.
 
3
Namely, S. Abiteboul, E. Demaine, M. Ester, C. Faloutsos, J. Han, G. Karypis, J. Kleinberg, H. Mannila, K. Mehlhorn, C. Papadimitriou, B. Shneiderman, G. Weikum and P. Yu.
 
Literatur
Zurück zum Zitat Ahn YY, Bagrow JP, Lehmann S (2010) Link communities reveal multiscale complexity in networks. Nature 466:761–764CrossRef Ahn YY, Bagrow JP, Lehmann S (2010) Link communities reveal multiscale complexity in networks. Nature 466:761–764CrossRef
Zurück zum Zitat Atkins JE, Boman EG, Hendrickson B (1998) A spectral algorithm for seriation and the consecutive ones problem. SIAM J Comput 28:297–310CrossRefMATHMathSciNet Atkins JE, Boman EG, Hendrickson B (1998) A spectral algorithm for seriation and the consecutive ones problem. SIAM J Comput 28:297–310CrossRefMATHMathSciNet
Zurück zum Zitat Balasubramanyan R, Cohen WW (2011) Block-LDA: Jointly modeling entity-annotated text and entity-entity links. In: SIAM international conference on data mining (SDM’11), SIAM/Omnipress, pp 450–461 Balasubramanyan R, Cohen WW (2011) Block-LDA: Jointly modeling entity-annotated text and entity-entity links. In: SIAM international conference on data mining (SDM’11), SIAM/Omnipress, pp 450–461
Zurück zum Zitat Charikar M (2000) Greedy approximation algorithms for finding dense components in a graph. In: International workshop on approximation, randomization, and combinatorial optimization (APPROX’00) pp 84–95 Charikar M (2000) Greedy approximation algorithms for finding dense components in a graph. In: International workshop on approximation, randomization, and combinatorial optimization (APPROX’00) pp 84–95
Zurück zum Zitat Chen W, Liu Z, Sun X, Wang Y (2010) A game-theoretic framework to identify overlapping communities in social networks. Data Min Knowl Discov 21(2):224–240CrossRefMathSciNet Chen W, Liu Z, Sun X, Wang Y (2010) A game-theoretic framework to identify overlapping communities in social networks. Data Min Knowl Discov 21(2):224–240CrossRefMathSciNet
Zurück zum Zitat Clauset A, Newman MEJ, Moore C (2004) Finding community structure in very large networks. Phys Rev E 70:066111CrossRef Clauset A, Newman MEJ, Moore C (2004) Finding community structure in very large networks. Phys Rev E 70:066111CrossRef
Zurück zum Zitat Coscia M, Rossetti G, Giannotti F, Pedreschi D (2012) DEMON: a local-first discovery method for overlapping communities. In: Yang Q, Agarwal D, Pei J (eds) ACM SIGKDD international conference on knowledge discovery and data mining (KDD’12), pp 615–623 Coscia M, Rossetti G, Giannotti F, Pedreschi D (2012) DEMON: a local-first discovery method for overlapping communities. In: Yang Q, Agarwal D, Pei J (eds) ACM SIGKDD international conference on knowledge discovery and data mining (KDD’12), pp 615–623
Zurück zum Zitat Flake GW, Lawrence S, Giles CL (2000) Efficient identification of web communities. In: Ramakrishnan R, Stolfo SJ, Bayardo RJ, Parsa I (eds) ACM SIGKDD international conference on knowledge discovery and data mining (KDD’00), ACM, pp 150–160 Flake GW, Lawrence S, Giles CL (2000) Efficient identification of web communities. In: Ramakrishnan R, Stolfo SJ, Bayardo RJ, Parsa I (eds) ACM SIGKDD international conference on knowledge discovery and data mining (KDD’00), ACM, pp 150–160
Zurück zum Zitat Fortunato S (2010) Community detection in graphs. Physics Reports 486 Fortunato S (2010) Community detection in graphs. Physics Reports 486
Zurück zum Zitat Gregory S (2007) An algorithm to find overlapping community structure in networks. In: Kok JN, Koronacki J, de Mántaras RL, Matwin S, Mladenic D, Skowron A (eds) European conference on principles and practice of knowledge discovery in databases (ECML/PKDD’07). Lecture Notes in Computer Science, vol 4702, pp 91–102. Springer, Berlin Gregory S (2007) An algorithm to find overlapping community structure in networks. In: Kok JN, Koronacki J, de Mántaras RL, Matwin S, Mladenic D, Skowron A (eds) European conference on principles and practice of knowledge discovery in databases (ECML/PKDD’07). Lecture Notes in Computer Science, vol 4702, pp 91–102. Springer, Berlin
Zurück zum Zitat Gupta R, Roughgarden T, Seshadhri C (2014) Decompositions of triangle-dense graphs. In: Naor M (ed) Innovations in theoretical computer science. (ITCS’14), ACM, pp 471–482 Gupta R, Roughgarden T, Seshadhri C (2014) Decompositions of triangle-dense graphs. In: Naor M (ed) Innovations in theoretical computer science. (ITCS’14), ACM, pp 471–482
Zurück zum Zitat Karypis G, Kumar V (1998) Multilevel algorithms for multi-constraint graph partitioning. In: ACM/IEEE conference on supercomputing (SC ’98), IEEE Computer Society, pp 1–13 Karypis G, Kumar V (1998) Multilevel algorithms for multi-constraint graph partitioning. In: ACM/IEEE conference on supercomputing (SC ’98), IEEE Computer Society, pp 1–13
Zurück zum Zitat McAuley J, Leskovec J (2012) Learning to discover social circles in ego networks. In: Bartlett PL, Pereira FCN, Burges CJC, Bottou L, Weinberger KQ (eds) Advances in neural information processing systems (NIPS’12), pp 548–556 McAuley J, Leskovec J (2012) Learning to discover social circles in ego networks. In: Bartlett PL, Pereira FCN, Burges CJC, Bottou L, Weinberger KQ (eds) Advances in neural information processing systems (NIPS’12), pp 548–556
Zurück zum Zitat Miettinen P, Mielikäinen T, Gionis A, Das G, Mannila H (2008) The discrete basis problem. IEEE Trans Knowl Data Eng 20(10):1348–1362 Miettinen P, Mielikäinen T, Gionis A, Das G, Mannila H (2008) The discrete basis problem. IEEE Trans Knowl Data Eng 20(10):1348–1362
Zurück zum Zitat Ng AY, Jordan MI, Weiss Y (2001) On spectral clustering: Analysis and an algorithm. In: Shawe-Taylor J, Zemel RS, Bartlett PL, Pereira FCN, Weinberger KQ (eds) Advances in neural information processing systems (NIPS’01), pp 849–856 Ng AY, Jordan MI, Weiss Y (2001) On spectral clustering: Analysis and an algorithm. In: Shawe-Taylor J, Zemel RS, Bartlett PL, Pereira FCN, Weinberger KQ (eds) Advances in neural information processing systems (NIPS’01), pp 849–856
Zurück zum Zitat Palla G, Derényi I, Farkas I, Vicsek T (2005) Uncovering the overlapping community structure of complex networks in nature and society. Nature 435:814–818CrossRef Palla G, Derényi I, Farkas I, Vicsek T (2005) Uncovering the overlapping community structure of complex networks in nature and society. Nature 435:814–818CrossRef
Zurück zum Zitat Pinney J, Westhead D (2006) Betweenness-based decomposition methods for social and biological networks. In: Interdisciplinary statistics and bioinformatics, pp 87–90 Pinney J, Westhead D (2006) Betweenness-based decomposition methods for social and biological networks. In: Interdisciplinary statistics and bioinformatics, pp 87–90
Zurück zum Zitat Pons P, Latapy M (2006) Computing communities in large networks using random walks. J Graph Algorithms Appl 10(2):284–293CrossRefMathSciNet Pons P, Latapy M (2006) Computing communities in large networks using random walks. J Graph Algorithms Appl 10(2):284–293CrossRefMathSciNet
Zurück zum Zitat Pool S, Bonchi F, van Leeuwen M (2014) Description-driven community detection. ACM Trans Intell Syst Technol 5(2):1–28CrossRef Pool S, Bonchi F, van Leeuwen M (2014) Description-driven community detection. ACM Trans Intell Syst Technol 5(2):1–28CrossRef
Zurück zum Zitat van Dongen S (2000) Graph clustering by flow simulation. Ph.D. Thesis, University of Utrecht van Dongen S (2000) Graph clustering by flow simulation. Ph.D. Thesis, University of Utrecht
Zurück zum Zitat White S, Smyth P (2005) A spectral clustering approach to finding communities in graph. In: SIAM international conference on data mining (SDM’05), SIAM/Omnipress, pp 76–84 White S, Smyth P (2005) A spectral clustering approach to finding communities in graph. In: SIAM international conference on data mining (SDM’05), SIAM/Omnipress, pp 76–84
Zurück zum Zitat Xie J, Kelley S, Szymanski BK (2011) Overlapping community detection in networks: the state of the art and comparative study. arxivorg/abs/11105813 Xie J, Kelley S, Szymanski BK (2011) Overlapping community detection in networks: the state of the art and comparative study. arxivorg/abs/11105813
Zurück zum Zitat Yan B, Gregory S (2009) Detecting communities in networks by merging cliques. In: IEEE international conference on intelligent computing and intelligent systems (ICIS’09), pp 832–836 Yan B, Gregory S (2009) Detecting communities in networks by merging cliques. In: IEEE international conference on intelligent computing and intelligent systems (ICIS’09), pp 832–836
Zurück zum Zitat Yang J, Leskovec J (2013) Overlapping community detection at scale: a nonnegative matrix factorization approach. In: Leonardi S, Panconesi A, Ferragina P, Gionis A (eds) ACM international conference on web search and data mining (WSDM’13), ACM, pp 587–596 Yang J, Leskovec J (2013) Overlapping community detection at scale: a nonnegative matrix factorization approach. In: Leonardi S, Panconesi A, Ferragina P, Gionis A (eds) ACM international conference on web search and data mining (WSDM’13), ACM, pp 587–596
Zurück zum Zitat Zhou H, Lipowsky R (2004) Network Brownian motion: A new method to measure vertex–vertex proximity and to identify communities and subcommunities. In: Bubak M, Albada G, Sloot P, Dongarra J (eds) Computational science (ICCS’04). Lecture Notes in Computer Science, vol 3038, pp 1062–1069 Zhou H, Lipowsky R (2004) Network Brownian motion: A new method to measure vertex–vertex proximity and to identify communities and subcommunities. In: Bubak M, Albada G, Sloot P, Dongarra J (eds) Computational science (ICCS’04). Lecture Notes in Computer Science, vol 3038, pp 1062–1069
Metadaten
Titel
Overlapping community detection in labeled graphs
verfasst von
Esther Galbrun
Aristides Gionis
Nikolaj Tatti
Publikationsdatum
01.09.2014
Verlag
Springer US
Erschienen in
Data Mining and Knowledge Discovery / Ausgabe 5-6/2014
Print ISSN: 1384-5810
Elektronische ISSN: 1573-756X
DOI
https://doi.org/10.1007/s10618-014-0373-y

Weitere Artikel der Ausgabe 5-6/2014

Data Mining and Knowledge Discovery 5-6/2014 Zur Ausgabe

Premium Partner