Skip to main content
Top

2018 | OriginalPaper | Chapter

6. Community Discovery and Identification in Empirical Graphs

Authors : Sławomir T. Wierzchoń, Mieczysław A. Kłopotek

Published in: Modern Algorithms of Cluster Analysis

Publisher: Springer International Publishing

Activate our intelligent search to find suitable subject content or patents.

search-config
loading …

Abstract

This chapter discusses a specific, yet quite important application area of cluster analysis that is of community detection in large empirical graphs. The chapter introduces a number of alternative understandings of the term community, which lead to a diversity of community detection algorithms. Then it presents specific object similarity measures as well as community quality measures (modularity and its derivatives) which require special adaptation or creation of new clustering algorithms to address community detection. In particular, we give some insights into Newman, Louvain, FastGreedy algorithms developed specially for community detection as well as other, like kernel k-means or spectral methods, which were adopted for this purpose. Finally issues of overlapping community and multi-layered community detection are discussed.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Footnotes
1
A primary distinction between a graph and a network is that the former is a method of representation of the latter, not necessarily reflecting all the aspects. We say that a network consists of nodes, that may be labelled with various attributes, and of relations between nodes, which may be of various types, and may possess attributes reflecting the presence, absence, degree or missing knowledge about a relation. A graph representing a network would consist of network nodes and would contain edges representing all or some of the types of relations between the nodes.
 
2
like patent networks, journal citation networks.
 
3
co-authorship networks, actor co-staging networks.
 
4
road network, rail-way network, airport network, power supply network, telecommunication network.
 
5
genes, proteins, enzymes etc. viewed as networks of atoms, with chemical bonds as links.
 
6
Compare H. Jeong, B.Tombor, R. Albert, Z.N. Oltvai i A.-L. Barabasi, The large-scale organization of metabolic networks, Nature, 407:651–654, 2000.
 
7
Consult M. Strong, Th.G. Graeber, M. Beeby, M. Pellegrini, M.J. Thompson, T.O. Yeates, D. Eisenberg. Visualization and interpretation of protein networks in Mycobacterium tuberculosis based on hierarchical clustering of genome-wide functional linkage maps. Nucl. Acids Res. 31(24):7099–7109, 2003.
 
8
On June \(16^{th}\) 1998 \(\mathfrak {The\ New\, York\ Times}\) reported that “Mathematicians Prove That It’s a Small World”. The article is available at the address http://​www.​nytimes.​com/​library/​national/​science/​061698sci-smallworld.​html.
 
9
We recommend the Web page http://​www.​weizmann.​ac.​il/​mcb/​UriAlon/​networkMotifsMor​e.​html, dealing with this topic. Notice also that there is an entire research area of frequent pattern mining, analogous to basket analysis, but taking into account links and their types in a graph, e.g. in pharmacology.
 
10
Cf. S. Shen-Orr, R. Milo, S. Mangan & U. Alon, Network motifs in the transcriptional regulation network of Escherichia coli. Nature Genetics, 31:64–68 (2002).
 
11
A community in a network is usually understood as a cluster in a graph representing some aspect of the network. Therefore, community discovery is usually understood as graph clustering and algorithms of graph clustering are applied also for community detection.
 
12
Cf. e.g. A. Hernando, D. Villuendas, C. Vesperinas, M. Abad, A. Plastino. Unravelling the size distribution of social groups with information theory on complex networks. arXiv:0905.3704v3 [physics.soc-ph].
 
13
see e.g. L. Lü, T. Zhou: Link Prediction in Complex Networks: A Survey. CoRR 1010.0725
 
14
It is easily seen that isolated nodes do not impact the modularity value. It is known that a clustering with maximum modularity does not contain any cluster that consists of a single node with degree 1. Furthermore, there exists always a clustering with maximum modularity, in which each cluster is a connected subgraph. Clustering according to maximal modularity exhibits non-local behaviour on adding new nodes, and in particular it is sensitive to satellite nodes (attached leaf nodes). Generally, the size and structure of clusters in the optimal clustering are influenced by the total number of links in the graph. So structures observed locally in a subgraph may disappear upon embedding into a larger graph. See [88].
 
15
We will discuss the case of directed graphs in point 6.5.
 
16
Whether the resolution limit is an advantage or disadvantage, depends on the application. We may be not interested in having too many groups to consider and in this case resolution limit may prove to be a useful vehicle.
 
17
Goodman, Leo A.; Kruskal, William H. (1954).”Measures of Association for Cross Classifications”. J. of the American Statistical Association 49 (268): 732–764.
 
18
Two paths linking two nodes \(v_i\) and \(v_j\) are node-independent if their only common nodes are the starting and ending nodes of each path that is \(v_i\) and \(v_j\). Similarly edge-independent paths can be defined.
 
19
The graph must be connected, not split into disjoint components.
 
20
Note the complementarity to the Ratio cut criterion from the criterion (5.​67a).
 
21
the ratio of the number of links within a cluster to the number of links leaving that cluster.
 
22
One example is intra-cluster distance measure being the ratio between the number of edges in a cluster to the number of possible edges within that cluster. Another one is ratio association assigned to the set of communities, as the sum of the average link weights within each of communities.
 
23
As an analogy to density based clusters, we can think of cut edge count based clustering as seeking dense probability regions in the n-dimensional space, while Newman’s modularity based clustering as a clustering of dependent dimensions, whereas the info-map type clustering as co-clustering of both; hence differences will emerge.
 
24
S. Paul, Y. Chen: Consistency of community detection in multi-layer networks using spectral and matrix factorization methods. https://​arxiv.​org/​abs/​1704.​07353.
 
25
W. Tang, Z. Lu, and I. S. Dhillon. Clustering with multiple graphs. In Proc. 9th Intern. Conf. on Data Mining, pages 1016–1021, Mianmi, Florida, Dec.2009; as well as Dong, P. Frossard, P. Vandergheynst, and N. Nefedov. Clustering with multi-layer graphs: A spectral perspective. IEEE Trans. on Signal Processing, 60(11):5820–5831, Dec. 2011.
 
26
X. Liu, W. Liu, T. Murata, K. Wakita, A Framework for Community Detection in Heterogeneous Multi - Relational Networks, Advances in Complex Systems 17(6) (2014).
 
27
Beiró et al. [57] proposed a further development of this package.
 
Metadata
Title
Community Discovery and Identification in Empirical Graphs
Authors
Sławomir T. Wierzchoń
Mieczysław A. Kłopotek
Copyright Year
2018
DOI
https://doi.org/10.1007/978-3-319-69308-8_6

Premium Partner