nach oben

Social Network Analysis and Mining

Erschienen in:

Open Access 01.12.2016 | Original Article

Close communities in social networks: boroughs and 2-clubs

verfasst von: S. Laan, M. Marx, R. J. Mokken

Erschienen in: Social Network Analysis and Mining | Ausgabe 1/2016

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config

KI-gestützte Suche

Patentsuche

Aus

Abstract

The structure of close communication, contacts and association in social networks is studied in the form of maximal subgraphs of diameter 2 (2-clubs), corresponding to three types of close communities: hamlets, social circles and coteries. The concept of borough of a graph is defined and introduced. Each borough is a chained union of 2-clubs of the network and any 2-club of the network belongs to one borough. Thus the set of boroughs of a network, together with the 2-clubs held by them, are shown to contain the structure of close communication in a network. Applications are given with examples from real world network data.

This research was supported by The Netherlands Organization for Scientific Research (NWO) under project number (380-52-005 (PoliticalMashup)).

1 Introduction

The last decade has produced an increasing volume of methods and algorithms to analyze community structure in social and other networks, as witnessed by an abundance of recent reviews, e.g. Girvan and Newman (2002), Newman (2004), Balasundaram et al. (2005), Palla et al. (2005), Reichhardt and Bornholdt (2006), Blondel et al. (2008), Leskovec et al. (2008), Porter et al. (2009), Fortunato (2010) and Xie et al. (2013).

In this paper, we study the structure of close communication, contacts and association in networks, as represented by simple graphs. Close communication is defined here as contact between nodes at distances of at most 2, that is by direct contact or by at least one common neighboring node. Such communication is associated with closely knit groups like cliques, coteries, peer groups, primary groups and face-to-face communities, such as small villages and artist colonies. Considered as dense social networks they can form powerful sources of social capital and support for their members and serve both quick internal diffusion of social innovation as well as speedy epidemiological contamination from outside sources.

The parts of a network where close communication can take place are marked by overlapping subsets of nodes, which all are neighbors of each other or have a common neighbor in the same subset. These correspond to graphs with a diameter of at most two.

In the following sections, we shall characterize this structure and indicate ways to detect these in social networks.

Mokken (1979, 2008) introduced the concept of k-clubs of a graph as maximal induced subgraphs of diameter at most k of a simple connected graph G: ’maximal’ in the sense that there is no larger induced subgraph of diameter k which includes them. He also showed that close community networks, in the form of simple graphs of diameter at most two (2-clubs), come in three distinct types: coteries, social circles and hamlets, respectively (Mokken 1980–2011).

Accordingly, the 2-clubs of a simple graph or network G cover the areas of close communication in that network consisting of non-inclusive, possibly mutually overlapping coteries, social circles and hamlets.

In the following sections this system of close communication is studied further and we show that it consists of a set of disjoint containers of nonseparable 2-clubs, i.e. subgraphs that we call boroughs, each of which is formed by a set of edge-chained 2-clubs (hamlets, social circles and coteries) of the network G. Each (nonseparable) 2-club of G is included in exactly one borough of G and each borough consists of a nonseparable union of overlapping 2-clubs of G. Consequently this system of close communication of a network can be analyzed by studying its boroughs and the 2-clubs within each or selected boroughs. The final sections show applications with some real networks and conclude with a discussion.

2 Concepts and notation

As the representation and analysis of networks will be in terms of simple graphs, we will summarize the necessary concepts and notation here (for standard graph-theoretic background see, e.g. Harary 1969, 1994; Wasserman and Faust 1994; Diestel 2005).

A social network will be represented by a simple graph, i.e. an undirected graph $G=G\left( V,L\right)$, without loops or multiple edges, where $V=V\left( G\right)$ is its set of nodes and $L=L\left( G\right)$ is its set of edges $\left( u,v\right) ;u,v\in V\left( G\right) ,$ joining nodes u and v in G. Two nodes u and v are adjacent if the edge $\left( u,v\right) \in L\left( G\right) ;$ notation uv. An edge $\left( u,v\right)$ is incident with its endnodes u and v. Let $\left| V\right|$ denote the size of G, i.e. the number of its nodes and $\left| L\right|$ its number of edges. Unless specified otherwise we shall assume $\left| V\right| =n$ and $\left| L\right| =m$.

A subgraph $H=G(V',L')$ of a graph G is a graph such that all its nodes and its edges are in G:

$$\begin{aligned} V' \subseteq V\left( G\right) \quad{\text { and }}\quad L' \subseteq L\left( G\right) \text {.} \end{aligned}$$

If H is a subgraph of G then G is called the supergraph of H.

If a subgraph G(V’) of G, with $V'\subseteq V$, has all edges (u, v) with (u, v) $\epsilon$ L , then G(V’) is an induced subgraph of G. Unless stated otherwise, we shall use the term subgraph to denote an induced subgraph and consider only subgraphs G(V’) with at least three nodes and three edges.

A path $P_{l}$ is a sequence of distinct adjacent nodes of G, $\left\{ u, x_{1}, \dots , x_{l-1}, v\right\}$, and consecutive incident edges $\left\{ (u, x_{1}), \dots , (x_{l-1}, v)\right\}$ joining two nodes u and v in G.

Its length is the number l of its edges. A chordless path $P_{l}$ is a path such that no two non-successive nodes ($\left| i-j\right| \ne 1$) are adjacent. Two nodes are connected in G if there is a path joining them. The distance $d_{G}\left( u, v\right) = d\left( u, v\right)$ between two nodes u and v of G is the length of a shortest path joining u and v in G . If the nodes are not connected then $d\left( u, v\right)$ is defined as $\infty$.

The diameter $dm\left( G\right)$ of G is the largest distance between nodes in G.

A k-club in G is an induced subgraph of G of diameter at most k (Mokken 1979, 2008). It is a maximal k-club of G if there is no larger k-club in G which includes it. A maximum k-club is one with the largest size in G.

Unless stated otherwise in this paper, k-club, respectively, 2-club, of G will denote a maximal k-club, or 2-club, because k-clubs in G, which are included in larger k-clubs, are not of primary interest here. We shall refer to graphs of diameter at most 2 as 2-clubs.

A cycle of G is a closed path in G where each node is both a starting and an endnode in that path and no node occurs more than once. Its length $\left( l\right)$ is the number of edges (or nodes) of it. The smallest cycle $\left( C_{3}\right)$, a triangle, has length three. A graph with cycles is cyclic. A cycle which is an induced subgraph of G is called a chordless cycle or (for $l > 3$) a hole of G (Nikolopoulos and Palios 2007).

Unless stated otherwise ’cycle’ will denote a $C_{3}$ or a hole of G .

Any edge $\left( u,v\right)$ of a cyclic graph can be a part of multiple cycles, to be denoted as its cycles. Its removal from G can increase some distances between nodes in G . For instance, the distance d(u, v) then increases from 1 to $l-1$ if the length of its shortest cycle is a $C_{l}$.¹

The degree $d_{G}(u) =d\left( u\right)$ of a node u is the number of edges incident with u, which in a simple graph is equal to its number of neighbors. An isolated node has degree 0. A pendant is a node with a single neighbor and has degree 1. The average degree of a graph is $\bar{d}_{G}=\frac{2m}{n}$.

For a connected graph G the degree $d_{G}\left( u\right)$ of a node u takes values in the interval

$$\begin{aligned} 1\le \delta \le d_{G}(u)\le \Delta \le |V\left( G\right) | -1= n-1\text {.} \end{aligned}$$

where $\delta$ and $\Delta$ are the minimum and maximum degrees of G.

A component of a graph is a maximal connected subgraph. A cutpoint is a node, the removal of which increases the number of components, and a bridge is an edge with the same property. A graph with cutpoints is called separable. Connected graphs without cutpoints are called nonseparable (n-s) or, alternatively, 2-connected or bi-connected, and have minimum degree $\delta$ $\ge 2.$ Hence it has no pendants. A bicomponent of a graph is a maximal biconnected subgraph and is part of a component of that graph. Such a (sub)graph is also called a block. Unless specified otherwise we shall assume the simple graph and network to be connected, thus consisting of a single component.

A connected graph with no cycles (acyclic) is called a tree. Each connected graph has at least one spanning tree, i.e. an acyclic subgraph on all nodes of the graph.

A shortest spanning tree (s.s.t.) of G is a spanning tree with the smallest diameter.

In a complete graph $K_{l}$ all l nodes are mutually adjacent and its diameter is $dm\left( K_{l}\right) = 1$. A clique of a graph G is an induced subgraph of G which is a complete graph. It is a maximal clique of G if there is no larger clique in G containing it. A maximum clique of G is one with the largest size.

For a node u $\in V\left( G\right)$ we distinguish:

the k-neighbors of u: $V_{k}(u)$ is the set of all nodes $v\in V(G)$ with $d(u,v) = k, k = 1, ..., dm(G)$. Note that $u\notin V_{k}(u)$.
the k-neighborhood of u: $N_{k}(u)=\bigcup \nolimits _{i=1}^{k} V_{i}(u)$, the set of all nodes $v\in V(G)$ with $d(u,v)=1,2,...,k\le$ dm(G). Note that $u\notin N_{k}(u)$. The closed k-neighborhood of u is defined as $\bar{N}_{k}(u) = N_{k}(u)\cup \left\{ u\right\}$.

The k-degree $d_{G}^{\left( k\right) }\left( u\right) = d_{k}\left( u\right) =\left| N_{k}(u)\right|$ is the size of the k-neighborhood of u. We shall in particular consider the 2-neighborhoods of nodes u of G: $N_{2}(u)$ and $\bar{N}_{2}(u)$. The 2-degree of nodes u of G, $( d_{2}\left( u\right) : u \,\in \,V)$ are bounded by minimum and maximum 2-degrees given by $\delta _{2}\left( G\right) =$ $\delta _{2}$ and $\Delta _{2}\left( G\right) = \Delta _{2}$ in the interval

$$\begin{aligned} 2 \le \delta _{2} \le \Delta _{2} \le |V\left( G\right) | -1 = n - 1 \end{aligned}$$

The k-ego-network of u in G is the subgraph $G(\bar{N}_{k}(u))$ induced by the nodes of its closed k-neighborhood, to be denoted as $\mathcal {E}_{k}^{G}(u)$ (or $\mathcal {E}_{k}(u)$ if the context is clear). Note that u thus is part of its ego-network and is its central ego. We shall in particular consider the ego-networks $\mathcal {E}(u)=\mathcal {E}_{1}(u)$ and $\mathcal {E}_{2}(u)$ of nodes, with sizes $\left| \bar{N}_{1}(u)\right|$ and $\left| \bar{N}_{2} (u)\right|$.

Twinned ego-networks can occur when the ego-networks of two or more nodes $u_{0}, u_{1}, \dots , u_{n}$ coincide:

$$\begin{aligned} \mathcal {E}(u_{0})=\mathcal {E}(u_{1})= \cdots = \mathcal {E}(u_{n}) \end{aligned}$$

thus forming a single ego-network with multiple egos $u_{0}, u_{1}, \dots , u_{n}$. Its ego nodes are called twinned nodes or just twins. The set of central egos $\left\{ u_{0},u_{1}, \ldots , u_{n}\right\}$ of a twinned ego-network forms a clique and is called its center. Each center can be represented by one of its ego-nodes, e.g. $u_{0}$. We can accordingly define a reduced node set $V^{\left( c\right) }\left( G\right)$ as the set of ego-nodes of G, including just a single ego node $u_{0}$ from each twinned ego-network.

Observe that if $\mathcal {E}(u_{0})=\mathcal {E}(u_{1})= \cdots = \mathcal {E}(u_{n})$ then $\mathcal {E}_{k}(u_{0})=\mathcal {E}_{k}(u_{1})=\cdots =\mathcal {E}(u_{n})$ for $k \ge 2$, so that for twinned ego nodes all their k-ego-networks are twinned ego-networks.

Moreover, as nodes can belong to various (sub)graphs, it should be stressed that the relevant ego-network $\mathcal {E}_{k}^{H}(u)$ of a node $u, u\in$ $V(H)\subseteq \, V(G),$ is determined by the particular (sub)graphs H of G for which they are induced.

In this paper close communities, such as acquaintance networks, are studied in the form of simple (sub)graphs of diameter at most 2.² In a close community of that type the closed 2-neighborhood of each of its members covers its complete population: the 2-ego-network ($\mathcal {E}_{2}(u)$) of each node u coincides with the network of that community.

3 Close communities as 2-clubs of a network

Close communities are closely-knit in the sense that every pair of its members are neighbors or has at least one common neighbor, where the neighboring relationship represents a durable or stable acquaintance, contact or association relation. They are modeled by 2-clubs: graphs of diameter at most 2. Mokken (1980–2011) characterized such graphs in terms of the diameter 2 , 3 , or 4 of a shortest spanning tree (s.s.t.), i.e. a spanning tree with smallest possible diameter (assuming $|V(G)|\ge 3$), as a measure of their compactness.

3.1 Close communities: hamlets, social circles, coteries

2-Clubs can only have s.s.t.’s with diameter 2, 3, or 4 corresponding to the following three types:

Coteries A coterie is a 2-club with a shortest spanning tree of diameter 2, corresponding to a spanning star, formed by one central node $u_{0}$, which is adjacent to all other nodes. Hence a coterie is the ego-network $\mathcal {E}\left( u_{0}\right)$ of its central ego $u_{0}$. When a coterie has several s.s.t’s, each with central nodes $u_{0},u_{1},...,$ it is a twinned ego-network with twinned ego nodes $u_{0},u_{1},...$, with the extreme case of a clique (diameter 1) were each node is the center of a spanning star. Thus a clique is a special case of a coterie. The smallest separable coterie is a tree of three points. The smallest nonseparable coterie is $C_{3}$, a triangle (diameter 1).

Social circles A social circle is a 2-club with an s.s.t. of diameter 3. Because every spanning tree with odd diameter has a center consisting of two adjacent nodes (Harary 1969, 1994), a social circle has at least one central pair of neighbours (adjacent nodes) $u_{0}v_{0}$, which together are adjacent to all the other nodes (a coupled star; See Fig. 4 in Mokken 1980–2011). Hence a social circle is a 2-club, such that there is at least one (central) edge $\left( u,v\right)$ with $V_1(u)\cup V_1(v)=V(G)$. The smallest social circle is $C_{4}$, a rectangle (diameter 2).

Hamlets A hamlet is a 2-club with an s.s.t. of diameter 4. Such an s.s.t. (a double, 2-step, star; Fig. 5 in Mokken 1980–2011), can be obtained in two steps from any node of the graph as its center. Hence a hamlet has no central node or a spanning star, nor a central adjacent pair of nodes on a coupled star. Each node can be used as the starting node and center of an s.s.t. The smallest hamlet is $C_{5}$, a pentagon (diameter 2).

We summarize the above observations in the following theorem.

Theorem 1

Three types of 2-clubs.

(i)

A 2-club is either a coterie, a social circle or a hamlet.

(ii)

Social circles and hamlets are nonseparable.

(iii)

Coteries can be separable or nonseparable.

Examples of these types are given in Fig. 1. If a 2-club is separable, then it must have a single spanning star and a corresponding single central node, which is its cutpoint (Mokken 1980–2011, p. 6). Hence it is a separable ego-network and coterie. Thus only coteries can be separable, and then have a single central node, which is its cutpoint (see Fig. 1d), while twinned coteries ares always nonseparable (cf. Fig. 1c)

The smallest nonseparable examples of each type are the cycles $C_{3}$ (coterie), $C_{4}$ (social circle), $C_{5}$ (hamlet).³

The next theorem shows how the types of nonseparable 2-clubs are formed by these cycles.

Theorem 2

Let G be a nonseparable 2-club, then for each edge of G its shortest cycle is $C_{3}$, $C_{4}$, or $C_{5}$, and

(i)

if G is a coterie: for each edge this is a triangle $C_{3}$;

(ii)

if G is a social circle: for each edge this is a triangle $C_{3}$ or a rectangle $C_{4}$;

(iii)

if G is a hamlet: for each edge this is a triangle $C_{3}$, a rectangle $C_{4}$, or a pentagon $C_{5}$.

Proof

No edge of G can be on a shortest cycle $C_{k}$ for $k > 5$ because then its diameter would be larger than 2.

(i)

coterie: G has a shortest spanning tree (s.s.t.) consisting of a central node adjacent to all other nodes of G . Hence all other edges joining nodes of G are on at least one triangle $C_{3}$ of G;

(ii)

social circle: G has an s.s.t. consisting of a central edge $\left( u,v\right)$ with $V_1(u)\cup V_1(v)=V(G)$. Hence any other edge joining nodes of G forms either a triangle with node u, v or edge (u, v), or a rectangle on the edge $\left( u, v\right)$;

(iii)

hamlet: G has an s.s.t. such that all other edges joining nodes of G are on triangles, rectangle, or pentagons.

$\square$

We will call these cycles basic cycles and will denote the set of cycles $\lbrace C_{3}, C_{4}, C_{5} \rbrace$ of a 2-club as its set of basic cycles $\mathcal {C}_{\left[ 3,5\right] }$, or just its basic set.

More general: we shall call the set of cycles of type $\lbrace C_{3}, C_{4}, C_{5} \rbrace$ of a graph G its set of basic cycles, or basic set $\mathcal {C}_{\left[ 3,5\right] }$, which form its nonseparable 2-clubs, as we shall see below. Moreover, we shall denote by basic edge of G any edge (u, v) of G which is on at least one basic cycle of G.

3.2 The 2-clubs of a graph or network

A k-club of a simple graph G is a maximal induced subgraph of diameter at most k (Mokken 1979, 2008). It was introduced as a generalized clique concept to distinguish it from k-cliques of a graph, which were defined as maximal clusters of nodes of a graph within distance k in the distance matrix of that graph (Luce 1950). However, considered as subgraphs k-cliques need not be connected, whereas k-clubs, due to the diameter restriction, are warranted to be connected subgraphs. The k-clubs of a network correspond to locally autonomous subnetworks in the sense that their nodes can communicate or reach each other within distance k along paths involving only nodes of the k-club, and not outside nodes in the larger supernetwork, as would be the case for k-cliques. Occasionally a k-club happens to be a k-clique as well, in which special case it is called a k-clan.⁴

Recently k-clubs have found interest and applications in many network oriented disciplines, such as telecommunication (Balasundaram and Butenko 2006), biology (Balasundaram et al. 2005), genetics (Pasupuleti 2008), forensic data mining (Memon and Larsen 2006), web search (Terveen et al. 1999), graph mining (Cavique et al. 2009), and language processing (Miao and Berleant 2004; Gutiérrez et al. 2011).

Above we defined close communication, contact or association in connected networks and graphs as connectedness along paths of at most length 2 and, accordingly, close communities as 2-clubs of a graph or network.

As such the concept of 2-clubs of a network is of central importance for the analysis of close communities and close communication structures in networks. In that analysis, however, the first type of 2-club (ego-network or coterie) is of subordinate interest compared to the other two, the social circles and hamlets.

Three types, three levels of close communication: These three types of 2-club imply different perspectives or levels of local communication:

Level 1 and most local (ego-network or coterie). Coteries in a graph are rather restricted forms of close communities, as they correspond to just all the ego-networks of a graph, which define and span that graph. There is a central ego node and all close communication is possible via that ego within its ego-network: tightly meshed, involving triangles only. Thus every ego-network $\mathcal {E}_{1}(u)$ of G is a rather trivial coterie in G with its ego node(s) u as the center of a spanning star joining all its neighbors. However, only when it is not included in a larger 2-club of G, and therefore maximal, it is a coterie of G.
Level 2: intermediate (social circle). There is no central node but instead at least one central pair of nodes: two adjacent neighbors, forming a central edge, which together are adjacent to the other nodes of the social circle. All close communication is possible via two central neighbours within (parts of) their ego-networks: more loosely meshed, along triangles and rectangles.
Level 3 and widest (hamlet). In hamlets there are no central nodes or central pairs of nodes and all close communication is between (parts of) ego-networks, widely meshed along triangles, rectangles, pentagons.

Thus coteries are limited forms of close communities, as they correspond to (maximal) ego-networks of G , varying from stars to cliques. Moreover, any ego-network which, as an induced subgraph of G, is separable, i.e. has a cutpoint, is a coterie of G, because any subgraph of G containing that ego-network will have diameter larger than 2, as can be verified easily.

In particular. every pendant of G promotes the ego-network of its single neighbor to a coterie of G (cf. Fig. 1d). Again, long isolated paths $P_{l}$ in G are formed by overlapping path segments $P_{3}$, which are overlapping separable coteries of G , consisting of one central node and two pendant neighbors.

As a consequence any graph or network will also show a multitude of rather trivial (separable) coteries. Hence, from a perspective of close communication, the coteries of a network G are relatively elementary, if not trivial, 2-clubs as such, when compared with the social circles and hamlets of G. They are confined to the level of local communication within their ego-network, whereas the hamlets and social circles involve the wider levels of close communication between and across (parts of) different ego-networks of G.

Our main focus will be on the more proper types of 2-clubs: social circles and hamlets. Moreover, we will consider only 2-clubs with at least three nodes and three edges.

3.3 The boroughs of a graph or network

The nonseparable 2-clubs of a network or graph G are contained in the bicomponents of G. We shall now introduce a new type of maximal subgraph of G, always contained in a single bicomponent of G, which we call a borough of G.

The main result of this section is that each borough contains nonseparable 2-clubs of G, that each nonseparable 2-club of G is a nonseparable 2-club of exactly one borough of G, and that both nonseparable 2-clubs and boroughs consist of edge chained basic cycles: $C_{3}$ (triangles), $C_{4}$ (squares) and $C_{5}$ (pentagons).

Two cycles of a graph G are said to be edge connected when they share at least one common edge.

More generally: a pair of basic cycles $C^{\alpha }, C^{\beta } \in \mathcal {C}_{\left[ 3, 5\right] }$ of G is edge chained in G if they are edge connected, or there is a sequence of edge connected basic cycles $C^{1}, \dots , C^{i}, \dots , C^{c}\in \mathcal {C}_{\left[ 3, 5\right] }$ of G, such that $C^{\alpha }$ is edge connected with $C^{1}$, and $C^{c}$ with $C^{\beta }$, and each intermediate consecutive pair of basic cycles $C^{i}$, $C^{i+1}\in \mathcal {C}^{\left[ 3, 5\right] }$ is edge connected as well.

Using these concepts we can now define a borough in a graph.⁵

Definition 1

A borough in a graph G is an induced subgraph of G such that each of its edges is on a shortest cycle $C_{s}\,\in \,\mathcal {C}_{[3,5]}$, the basic set of G, and all pairs of its basic cycles are edge chained in G.

A borough of G is maximal, denoted as a borough (B) of G, if it is not contained in a larger borough of G. Unless specified otherwise we shall consider only maximal boroughs of G .

A nonseparable 2-club is a special case of a borough as stated in the next proposition:

Proposition 1

A nonseparable 2-club is a borough of diameter at most 2.

Proof

Let G be a 2-club. By Theorem 2 every edge of G lies on a basic cycle. Let $C^{i}$ and $C^{j}$ be two basic cycles of G which are not edge connected in G and let $e_{i}$ and $e_{j}$ be two edges of $C^{i}$ and $C^{j}$ respectively. As G has diameter 2 and is nonseparable, the endnodes of both edges must have common neighbours in G, on intermediate edge connected basic cycles which are edge connected with $C^{i}$ and $C^{j}$, thus establishing the edge chained connection between them. Hence G is a borough. $\square$

Analogously, two nonseparable 2-clubs are said to be edge connected when they have at least one common edge. Two nonseparable 2-clubs $H_{0}$ and $H_{k}$ are edge chained if they are edge connected or there is a sequence of edge connected 2-clubs $H_{0},H_{1},..,,H_{i},..H_{k}$, such that each consecutive pair $H_{i}$, $H_{i+1}$ is edge connected.

Thus we can state the following proposition without proof.

Proposition 2

A set of pairwise edge chained nonseparable 2-clubs is a borough.

An example of a borough, formed by the three edge chained 2-clubs of Fig. 1 is given in Fig. 2.

Thus boroughs and nonseparable 2-clubs of G are formed by cycles in its basic set $C_{\left[ 3, 5\right] } =$ $\left\{ C_{3}, C_{4,}, C_{5}\right\}$.

Properties of boroughs of G Taking into account the maximality of boroughs of G, a number of properties follow from these definitions and previous results. These are listed below as corollaries. They show that, from a perspective of close communication, boroughs can be seen as larger supercommunities packing or hosting close communities in a social network.

Corollary 1

Given the maximality and nonseparability of boroughs of G:

(i)

Every borough of a network G is contained in exactly one bicomponent of G;

(ii)

Two boroughs of G can not share a basic cycle of G, and each basic cycle of G is part of only one borough of G.

(iii)

Each edge of a borough of G is on a basic cycle of G and all its basic cycles are part of that borough of G only. Hence a basic edge of G is a basic edge of one and only one borough of G.

(iv)

Any edge of G, which is not part of any borough of G, is not a basic edge of G but either a bridge or on a shortest cycle $C_{l}$; $l \ge \ 6$ and part of the outback of G.

(v)

Thus the boroughs of G are edge induced subgraphs of G: they are induced by the basic edges of G. The outback of G is induced by the non-basic edges of G.

The maximality and non-separability of boroughs also imply:

Corollary 2

Every nonseparable 2-club of a network G is a 2-club of exactly one borough of G.

Note that a nonseparable 2-club of a graph G is either itself a borough of G, or part of just one borough of G. If it would have been part of two boroughs of G these would share a common edge and could be merged into a larger borough, which contradicts their required maximality.

However, the reverse of Corollary 2 is true only for social circles and hamlets and not for nonseparable coteries, conform the following theorem:

Theorem 3

Let B denote a borough of G.

(i)

A subgraph of G is a hamlet or social circle of G if and only if it is a hamlet or social circle of the corresponding borough B of G.

(ii)

A coterie of a borough B of G is either itself a coterie of G or included in a larger coterie of G.

Proof

(i)

If: let B be a borough of G and assume that a hamlet (or social circle) of B is not a 2-club of G . Then, as it is a 2-club in G , it must be contained in a larger 2-club of G.

(a)

that 2-club cannot be a nonseparable 2-club of G , because then B would not be maximal in G;

(b)

if that 2-club is separable, then it must be a coterie of G with a unique central node, adjacent to all the other nodes of the 2-club (see Sect. 3.1). But then that node must also be a node of B and the hamlet or social circle would be a coterie of B instead, contrary to assumption.

(i)

Only if: follows directly from Corollary 2.

(ii)

If a coterie of a borough B of G is not also a coterie of G, then it must be included in a larger 2-club of G. That must be a coterie of G, as by (i) it cannot be a hamlet or social circle of G, because then it would be included in the same social circle or hamlet of B as well, contradicting its maximality in B. Thus a non-separable coterie of B can be included in a larger, separable coterie of G and therefore, though maximal in B and sharing the central node, is not a coterie of G.

$\square$

A fictitious and elementary illustration is given with Fig. 3a which gives an example of a simple graph G with 29 nodes.

Considering only 2-clubs of at least three points and three lines, a straightforward count of the 2-clubs of the simple graph G of Fig. 3a results in one hamlet, 5 social circles and 13 coteries of G. The hamlet is the pentagon (C5) formed by nodes {13, 14, 16, 17, 18}. The social circles are identified by:

central pairs: (w, 1), (w, 6); size: 6;

central pairs: (6, 7), (6, v), (7, u); size: 5;

central pairs: (20, 19), (20, 17), (19, 18); size: 5;

central pairs: (12, 15), (15, 14), (12, 13); size 5;

central pairs: (12, 11), (11, 10), (12, v); size: 5

Graph G has 13 coteries, of which one coterie is the nonseparable ego-network of node 3, which as a 2-club is maximal.

The other 12 coteries of G are the separable ego-networks of the nodes: 1, 7, 6, u, v, 17, 18, 13, 14, 12, 21 and 24.

Note that the ego-network of node w is nonseparable but not a 2-club of G, because it is included in the social circle 1 of G.

As shown in Fig. 3b graph G has two boroughs: one indicated with red-bold edges and the other with blue-solid edges. Its outback is given with black-dashed edges.

If we consider only the close community area covered by these two boroughs of G, we note that all non-separable 2-clubs of G are contained by the two boroughs:

the top red-bold borough has two social circles, which are the social circles 1 and 2 of G, and its nonseparable coterie of node 3;
the bottom blue-solid borough has three social circles: the social circles 3, 4 and 5 of G, as well as its hamlet.

Boroughs of a graph G can have one or more common points, to be called touch points, as illustrated in Fig. 3b by the neighbour nodes (u,v). The extra bold red edge (u,v) belongs to the top red-bold borough only, as it is part of its basic cycle formed by edges u-v-6-7-u. Its other cycle u-v-12-13-18-19,u is a hole of G of 6 nodes and edges, and not a basic cycle of the lower blue-solid borough. Would that hole have been a basic cycle instead, then the two boroughs would have been edge chained and, due to the required maximality for boroughs of G (see Definition 1), form a single borough of G. Thus edge (u,v) is a basic edge for the top red-bold borough only.

More general: if touch points of a graph G are on common basic cycles of G, then, due to maximality of boroughs of G, all these basic cycles belong to the same borough of G.

Referring to the particular nature of ego-networks as coteries (see conclusion Sect. 3.2), we see in Fig. 3 that the separable coteries of G are not fully covered by its boroughs. For instance, the ego-network of touch points between boroughs, or between boroughs and outback of a graph, are separable and therefore coteries of that graph. This is illustrated above for the ego-networks of nodes u and v. Though both are coteries of G, they are dissolved as such, because their edges are distributed over the two boroughs and, for node v, the outback of G. Again, the ego-networks of nodes 1 and 14, reduced by missing outback edges, are also (separable) coteries of their boroughs of G, but not of graph G itself, because they are included in the corresponding unreduced (separable) coterie of G.

Moreover, the separable coteries in the outback of a graph, mainly stars, such as the ego-networks of nodes 21 and 24 in Fig. 3b, will be ignored by a focus on the boroughs only.

However: all of the coteries of the boroughs are either also coteries of G, or included in a separable coterie of G sharing its ego node:

for the red-bold borough: its three separable coteries are either also coteries of G (see nodes 6 and 7) or included in a coterie of G, e.g.. that of node 1;
for the blue-solid borough: its 5 separable coteries are also coteries of G: those of nodes 17, 18, 13 and 12, or included in a coterie of G with the same ego node, e.g. node 14, and its only nonseparable coterie of node 3 coincides with its nonseparable coterie of G.

Consequently a graph or network G can be partitioned into its boroughs and its outback, where its basic edges and their basic cycles induce the borough structure of G, which contains its areas of close communication and its 2-clubs as close communities, while its non-basic edges determine its outback of more remote communication. So, when the main focus of the analysis of a graph or network is on its close community structure, one might as well ignore its outback part and focus on its boroughs and the 2-clubs contained by them.

Lastly, it is well known that removal of an edge from a nonseparable graph can increase its diameter. The next corollary shows that for boroughs this increase is limited to at most three.

Corollary 3

Let B be a borough of G and let $B-(u,v)$, denote the subgraph obtained by removing an edge (u, v) from B, then

$$\begin{aligned} dm(B)\le dm(B-(u,v)) \le dm(B)+3. \end{aligned}$$

Proof

Consider the set of all shortest paths containing (u, v) defining distances between pairs of nodes of B. Note that such pairs can also be joined by alternative shortest paths in B not containing (u,v).

As (u,v) is on a basic cycle, its removal extends the distance between the nodes u and v by 1, 2, or 3 along the remaining part of the basic cycle(s) on (u,v). Thus, all distances between pairs of nodes of the set, and the diameter of $B-(u,v)$, increase by at most 3. $\square$

An increase of the diameter by exactly 3 implies that the removed edge is on a basic $C_{5}$ of a hamlet of B, as illustrated by Fig. 4 for the removal of the bold-lined edge.

Summary We conclude that the set $\mathcal {B}\left( G\right)$ of boroughs of G contains the proper 2-clubs of G, as distributed across and within its boroughs. Thus, where we defined 2-clubs as the basic type of close community in a network, we can see the boroughs to which they belong as a supercommunity in that network, enveloping chained sets of such close communities.

Moreover, conform Footnote 2 these concepts and results can be extended to the general case of diameter k. Corresponding k-clubs (diameter at most k) and k-boroughs are both formed from the basic set $C_{\left[ 3, 2k+1\right] } = \left\{ C_{3}, \dots ,C_{2k+1}\right\}$ of basic cycles with diameter 3 to k.

For instance, since the early days of social network analysis (SNA) triangles and triad censuses have been of central importance in the analysis of social networks (e.g. Holland and Leinhardt 1970, 1971; Davis and Leinhardt 1972; Johnsen 1985; Frank 1988; Watts and Strogatz 1998). Such ‘very local structures’, Faust (2007) of direct communication between neighbours in triads, correspond to 1-clubs and 1-boroughs, as formed by triangles only (e.g. 3-cliques $\left( K_{3}\right)$ as in Palla et al. 2005). It is not difficult to see that the 1-clubs and 1-boroughs of a network are nested in the 2-clubs and (2)-boroughs which are the subject of this paper.

In other, e.g. topological, contexts 3-clubs (at most diameter 3) and corresponding 3-boroughs will require for their formation the smallest 3-clubs hexagon ($C_{6}$) and heptagon ($C_{7}$) in the corresponding basic set $\left\{ C_{3},..,C_{7}\right\}$. A rather special case is that of a ‘football’ type of graph, a single borough, consisting of pentagons and hexagons only, so that all its 3-clubs are formed by just two types from the basic set: the pentagon $\left( C_{5}\right)$ and hexagon $\left( C_{6}\right)$.

4 Some applications

With the introduction of k-clubs (Mokken 1979) it was pointed out that in practice their search, detection, and identification in other than small networks would be a hard, if not prohibitive, computing task, at the time beyond available hardware and algorithmic capabilities, such as the clique algorithm of Bron and Kerbosch (Bron and Kerbosch 1973).

Later results in computational complexity theory demonstrated that for $k\ge 2$ several variants of k-club detection were NP-hard (e.g. Bourjolly et al. 2002; Balasundaram et al. 2005), such as, for instance, finding a k-club of size larger than $\Delta (G)+1$ (Butenko and Prokopyev 2007), or more generally, for a given k-club, finding a larger k-club containing it (Pajouh and Balasundaram 2012).

4.1 Finding boroughs and 2-clubs

Despite these theoretical limits, in the last decade resources and algorithm theory have made significant progress toward workaround, heuristic and practical detection algorithms to detect k-clubs (Bourjolly et al. 2000, 2002; Pasupuleti 2008; Asahiro et al. 2010; Yang et al. 2010; Carvalho and Almeida 2011; Schäfer et al. 2012; Chang et al. 2012; Veremyev and Boginski 2012; Hartung et al. 2012, 2013; Pajouh and Balasundaram 2012; Shahinpour and Butenko 2012). Most of these algorithms find either k-clubs of at least a given minimum size in a given graph G, or the maximum (i.e. largest in number of nodes) k-club of G.

These sources inspired us to develop some specific software modules enabling us to detect both boroughs and 2-clubs in a simple graph. From a perspective of community detection and network analysis it is more interesting to find all (inclusion-wise maximal) 2-clubs than just the largest 2-clubs (i.e. maximum cardinality). Hence our approach of detecting 2-clubs in a graph was designed to achieve or approximate that purpose within the limits of available computational capabilities.

In doing so we made use of the crucial intermediate position of the boroughs, as separate components of a graph or network, hosting its edge-chained 2-clubs: its hamlets, social circles and (part of) its coteries (see Theorem 3). This suggested a two step approach to finding all 2-clubs: first find all boroughs, then find all 2-clubs inside boroughs, taking into account the proviso at the end of Theorem 3 concerning the special nature of coteries of a network and of its boroughs.

We thus developed algorithms, conform to Definition 1, to detect all boroughs of a graph by joining and chaining cycles from its set of basic cycles (triangles, rectangles and pentagons), using available methods of finding all cycles (e.g. Tiernan 1970; Weinblatt 1972; Fosdick et al. 1973; Johnson 1975 or, more specifically, of finding only cycles of given length (Alon et al. 1994; Yuster 2011).

Another set of algorithms was developed for finding all 2-clubs of a graph, sorted by type (coterie, social circle or hamlet).

Thus we could also detect the 2-clubs in separate selected (e.g. the largest) boroughs of a network.

The (usually numerous and overlapping) 2-clubs that are found are stored in a database, per borough classified according to the three possible types/level of close communication (coteries, social circles, hamlets) and per type sorted according to size. They can then be inspected and analyzed by a Viewer interface. Details are given in Laan (2012) and in the available open source licensed package by Laan (2014).

4.2 Some real network results

In this section, we illustrate the concepts introduced above with some datasets, chosen to cover different data domains as well as to provide some analytic perspectives. The different data domains and associated network data are:

the well-known small network of Zachary’s karate club;
corporate board networks as given by the interlocking directorate networks for the top 300 European firms for the year 2010;
co-authorship data taken from the large DBLP dataset.

The Zachary data will illustrate the perspectives of 2-club analysis at the micro scale of small face-to-face networks.

The European corporate network concerns a much larger network and the ensuing multitude of 2-clubs thus changes the analytic perspectives.

Finally we investigated the distribution of boroughs and their sizes in much larger datasets, as illustrated for the DBLP co-authorship data set.

4.2.1 Zachary’s karate club

Zachary (1977), a well known dataset, concerned a voluntary association, a university student karate club, with a total membership of about 60 persons/nodes. Zachary analyzed a valued network, where edges denoted the observed number of 8 types of mutual interaction outside karate lessons. He restricted his analysis to the main component of 34 interacting members, thus disregarding 26 other non-connected or isolated members. After conflicting views two factions polarized around two main opponents—node 1 (labeled ‘Mr. Hi’, the karate teacher) and node 34 (‘John A.’, president and main officer)—and subsequently split accordingly. Zachary predicted the composition of the splits using a max-flow–min-cut algorithm.

For this paper we reanalyzed his data in the form of a simple undirected graph with an edge indicating at least one of the 8 types of interaction. First we used a standard SNA package (Borgatti et al. 2002) and then applied our borough and 2-club detection algorithms to the relevant components. This simple connected network of 34 nodes has diameter 5 and consists of two bicomponents (size 27 and 7), separated by a common cut point (node 1: Mr. Hi), and a pendant (node 12) attached to node 1 (Mr. Hi) as well. Both bicomponents prove to be boroughs and node 1 (Mr. Hi) is a member, and touch point, of each of these. Thus the ego-network of cutpoint Mr. Hi is a coterie in the larger graph and distributed over the two boroughs and the pendant node 12. Given the face-to-face nature of this (subset of) a student association, it is not surprising that all its edges, except the pendant (1,12), are part of at least one 2-club in just one of the two boroughs.

The smallest borough of size 6 contains, in addition to touch node 1 (Mr. Hi), nodes 5, 6, 7, 11, and 17, which were not further considered by Zachary. It has diameter 2 and thus is a 2-club (a social circle) and a trivial borough as such.

The second, largest, borough of size 27 corresponds to the network analyzed in Zachary’s paper. This borough has diameter 4 and contains 13 2-clubs including 4 coteries (all separable), 8 social circles, and one hamlet. They are listed in Table 1.

The two opposing leaders, Mr. Hi (node 1) and John A. (node 34), are both part of just three 2-clubs:

the 7-node ego-network (coterie) of node 32;
a social circle of size 14 (with central pair 34-14); and
the hamlet of size 8.

The latter two 2-clubs are depicted in Fig. 5.

Moreover, each of the two opposing nodes (1 or 34) are part of five 2-clubs excluding the other opponent. Hence, all 2-clubs contain at least one of the two opponents: node 1 or node 34. In particular the 14 node social circle and 8 node hamlet look like negotiation forums of the two opposing sides. For instance, the hamlet of 8 nodes connects the central egos (nodes 1, 34, 3, and 32) of the 4 coteries.

Table 1

List of all 2-clubs in the large 27-node borough of Zachary’s Karate Club

Type	Size	List of members
Coterie	7	[1, 25, 26, 29, 32, 33, 34]
Social circle	8	[24, 26, 28, 29, 30, 32, 33, 34]
Social circle	8	[3, 24, 25, 28, 29, 32, 33, 34]
Social circle	8	[24, 25, 26, 28, 29, 32, 33, 34]
Hamlet	8	[1, 3, 25, 28, 29, 32, 33, 34]
Social circle	10	[1, 2, 3, 4, 8, 9, 14, 29, 32, 33]
Social circle	10	[1, 2, 3, 4, 8, 9, 14, 31, 32, 33]
Coterie	11	[1, 2, 3, 4, 8, 9, 10, 14, 28, 29, 33]
Social circle	11	[1, 2, 3, 4, 8, 9, 14, 18, 20, 22, 31]
Coterie	12	[1, 2, 3, 4, 8, 9, 13, 14, 18, 20, 22, 32]
Social circle	14	[1, 2, 3, 4, 9, 10, 14, 20, 28, 29, 31, 32, 33, 34]
Social circle	17	[3, 9, 10, 14, 15, 16, 19, 21, 23, 24, 28, 29, 30, 31, 32, 33, 34]
Coterie	18	[9, 10, 14, 15, 16, 19, 20, 21, 23, 24, 27, 28, 29, 30, 31, 32, 33, 34]

For each two club, the type, size and the members are given. The leaders of the two parts after the split (1 and 34) are indicated in boldface

These four coteries represent the ego-networks of the two opposing nodes 1 (Mr Hi: size 12) and 34 (John A.: size 18), and the nodes 32 (size 7) and 3 (size 11), where node 3 appears to be a supporting ‘lieutenant’ node for Mr. Hi (node 1) and node 32 for his opponent John A. In terms of their 2-club memberships both nodes show extensive liaison connections with the opposing side.

Membership of particular 2-clubs appears to be a good predictor for faction membership after the split. To keep within the bounds of this paper, we can illustrate this with the problematic mysterious node 9, the only node mentioned explicitly by Zachary in his paper, apart from Mr. Hi (1) and John A. (34). Node 9 was problematic in the sense that he was classed as a (mild) supporter of the side of 34 (John A.), but in the end showed up as a member of the opposing faction of Mr. Hi after the split. However, this move can be understood by an analysis of his 2-club memberships.

Node 9 was member of eight 2-clubs, each of which included at least one of the two opponents 1 or 34, and distributed as follows:

five 2-clubs with only node 1 (Mr. Hi);
two 2-clubs with only node 34 (John A.); and
one 2-club with node 1 and node 34.

Moreover, in 7 of its eight 2-clubs node 9 is accompanied by node 3, its neighbor and firm Mr. Hi supporter, as we noted above.

So, on the basis of its 2-club affiliations alone, one would have predicted node 9 to move (or stay) with the faction of Mr. Hi after the split, as in fact he did. The 2-club analysis also revealed liaison roles of two nodes, not mentioned as such in the Zachary paper: node 3 for Mr. Hi (node 1) and node 32 for John A. (node 34).

4.2.2 European corporate network 2010

This network was constructed from the interlocking directorates of the largest 286 stock listed companies, as studied by Heemskerk (2013).⁶ Its nodes designate the boards of individual companies and its edges indicate that the companies they connect share at least one common director in their boards. Hence the network provides channels of interpersonal contact and communication between companies at the level of their boards.

The source network for these 286 companies had one giant component of 259, which we chose for further analysis. Apart from three trivial ’boroughs’ of sizes 4, 3 and 3, we found one single giant borough of 225 companies.

This single borough, covering 87 % of the dominant component and 79 % of all firms, formed a compact sub-network in the corporate European network of 2010, consisting of 2128 overlapping or edge-chained 2-clubs of corporations, with a median 2-club size of 10 corporations in a size range of 4–27. This result confirms Heemskerk’s (2013) original conclusion that by 2010 this European network appeared to be well integrated. However, with a diameter of 7 the borough was rather stretched.

Table 2

European corporate borough 2010: 2-clubs of borough and two of its major companies compared

European borough 2010	Cot.	Soc. circ.	Hamlet	Total
All 2-clubs borough	138	717	1273	2128
% of total 2-Clubs Borough	6.5 %	33.7 %	59.8 %	100 %
Size range	4–27	5–25	5–24	4–27
Median size	10	14	16	15
Coverage nodes borough*	99.6 %	89.4 %	92.5 %	100 %
Compagnie Nationale à Portfeuille SA	15	284	691	990
% of type 2-club borough	10.9 %	39.6 %	54.3 %	46.5 %
BNP Paribas SA	12	105	350	467
% of type 2-club borough	8.7 %	14.6 %	27.5 %	21.9 %
Cie Nat. à Portefeuille or BNP Paribas	25	332	752	1109
% of total 2-clubs borough	18.1 %	46.3 %	59.1 %	52.1 %
Common 2-clubs	2	57	289	348
% Cie Nat. à Portefeuille of BNP Paribas	16.7 %	54.3 %	82.6 %	74.5 %

* Coverage: % of nodes of borough (100 % = 225)

The multitude of 2-clubs, to be expected for large networks, can be analyzed by means of views and selections from their database. Table 2 shows some results.

The first upper part of Table 2 gives the distribution of these 2128 2-clubs of the European borough over the three types and levels of close communication.

The first, most local level of communication, the coteries, formed 6.5 % of the 2-clubs of the borough. They were the ego-networks of 138 central companies: 62 % of the 225 companies in the borough. Together the nodes of these coteries covered practically all (coverage: 99.9 %) of the companies (nodes) of the borough.

The ego-networks of the other 85 companies were not coteries of the borough but were included in or split over larger 2-clubs. That was, for example, the case for the German automotive company Volkswagen AG and two large banks: the German Deutsche Bank AG and the Spanish Banco Santander SA.

Thus this set of 138 coteries formed the local backbone of the borough, as the two next levels of more extended close communication, social circles and hamlets, are formed from parts of the ego-networks of their central companies. Their composition strongly suggests a predominance of French 2-clubs in the borough: the 20 largest coteries consist of the ego-networks of 12 French, 3 German, 2 British firms, and a Swedish, a Belgian and a Swiss company.

The second intermediate level of social circles was formed by one-third (717: 33.7 %) of the 2-clubs of the borough, with a median size of 14 companies in a size range of 5–25. Together the social circles cover 89.4 % of all nodes of the borough.

The composition of the largest social circles again confirms the predominance of the largest French companies in the network. They were formed around one or more central pairs of major French companies (i.e. pairs of central neighbors adjacent to all others in the 2-club).

Figure 6 gives a detail of a large social circle of 25 companies, mainly French, with 19 French firms, 1 Franco-Belgian, 2 British, 2 Dutch, and 1 Luxembourg. It shows its densest part around two central pairs, one formed by two French companies, Total SA and GDF Suez SA, the other by Total SA and the Belgian company Compagnie Nationale à Portefeuille SA.

The third and widest level of close communication, the hamlets, occupied a major part (1273: 59.8 %) of the 2-clubs of the European borough, with a median size of 16 in a size range of 5–24. Altogether the hamlets of the borough cover nearly all, i.e. 92.5 % of its 225 nodes. An example is given by Fig. 7 to which we will return later.

Heemskerk (2013, p. 91) cites Compagnie Nationale à Portefeuille SA, a Belgian investment holding of the Frère family, as most involved in European interlocks, with 17 European and 2 national (Belgian) interlocks. We investigate this conclusion further in terms of its participation in major 2-clubs of the European borough of corporate interlocks.

The second, lower part of Table 2 summarizes this analysis. It was a member of almost half (990: 46.5 %) of the 2-clubs of the borough. At the most local level it was a member of 15 (10.9 %) of the coteries of the borough. Of these 15 ego-networks, identified by their central (ego) company and ordered by size, it was the center of the sixth coterie (size 22). Of the other 14 coteries ten were large French companies, two were Luxembourg based, followed by another Belgian company and a German company.

This predominant francophone orientation suggests that it is more part of the French regional network than of a cross-European one. Widening the level of close communication to its (Compagnie Nationale à Portefeuille) membership of social circles and hamlets of the borough confirmed this impression: it was part of 284 social circles (39.6 %) and 691 (54.3 %) hamlets. Among the largest social circles it formed part of one or more central pairs with the five largest French companies, as illustrated by Fig. 6, where it forms one of the two central pairs with the French company Total SA.

We therefore studied its 2-club memberships together with those of the largest French bank: BNP Parisbas SA, as summarized in the lower part of Table 2. BNP Paribas SA itself was included in 467 (21.9 %) of the 2-clubs of the borough. The combined membership of Compagnie Nationale à Portefeuille or BNP Parisbas accounted for 1109 (52.1 %), or more than half of the 2128 2-clubs in the European borough. In 348 of those they participated together. Consequently Compagnie Nationale à Portefeuille participated in almost three quarter (74.5 %) of the 2-clubs to which BNP Parisbas belonged.

Hence in terms of 2-club memberships the Belgian Compagnie Nationale à Portefeuille was clearly a part of the center of the French corporate sub-network in 2010.

Subsequent developments appear to support this conclusion. The controlling Belgian holding ERBE, for 53 % owned by the Belgian Frère family and for 47 % by BNP Parisbas, removed Compagnie Nationale à Portefeuille from the Belgian stock exchange on 2 May 2011, after a successful bid for outstanding stock. This appeared to be part of a familial succession strategy and an agreement allowing BNP Paribas to withdraw from ERBE. In a press release of December 10, 2013 BNP Paribas announced its completion of this arrangement through the purchase by the Frère Group of the entire BNP Paribas shareholding in ERBE.

As the second firm, most involved in European interlocks, with a reported 14 European and 2 national interlocks, Heemskerk (l.c) cites ABB Ltd, a Swiss based multinational corporation operating mainly in power and automation technology, such as robotics. In this case his conclusion appears to be fully supported by investigating its 2-club memberships in the corporate European borough for 2010. Not surprisingly it was central ego of a coterie (size 17) of the borough, consisting of companies from six European nations: 3 German, 2 French, 3 Swiss, 5 Swedish, 3 Dutch and 1 Finnish.

ABB Ltd participated in 64 social circles (size 7–21): the first 5 largest social circles solidly German with central pairs from the largest German companies. After those follow a number of mainly Swedish social circles and a number of social circles of mixed nationality.

At the widest level of close communication ABB Ltd participated in 75 hamlets (size 5–19) of different nationalities. An example is given with the hamlet of Fig. 7, containing thirteen firms: three French, four British, one Swedish and one Swiss, ABB Ltd itself.

For a more elaborate analysis, beyond the scope of this paper, we refer to Mokken and Laan (2015).

4.2.3 Boroughs in DBLP co-authorship networks

We use the DBLP database of Computer Science publications to obtain some insights on the availability of boroughs in large real live networks.⁷ DBLP can be seen as a bipartite network consisting of authors and publications as nodes connected by the ‘is author of’ relation. For a given integer threshold t, we induce an undirected co-authorship network between authors from the DBLP network by relating two author nodes when they have coauthored at least t publications. For t between 5 and 10, Table 3 contains basic statistics about the number of boroughs, and their distribution according to their size. Thus, for a threshold t, the number of nodes in the t-co-authorship network is the number of authors who have at least t joint publications with one other author. The density of the resulting networks is fairly stable and slowly increases from $2\cdot 10^{-5}$ for $t=5$ to $4\cdot 10^{-5}$ for $t=10$.

We can draw several conclusions from this experiment:

Table 3

Basic statistics about the Borough distribution in the co-authorship networks based on DBLP, for thresholds on the number of coauthored publications ranging from $t= 5{-}10$

t	#nodes	#edges	#boroughs	5 largest sizes
5	152,018	239,309	15,550	[23,414, 211, 91, 78, 74]
6	118,423	169,328	12,177	[8054, 153, 81, 51, 50]
7	94,784	125,308	9713	[6294, 144, 51, 50, 39]
8	77,288	95,270	7701	[4741, 137, 90, 56, 46]
9	63,916	74,185	6115	[3743, 121, 45, 42, 40]
10	53,596	59,071	4913	[2907, 118, 43, 36, 34]

Large sparse networks contain relatively many boroughs, roughly an order of magnitude smaller than the number of nodes in the network.
Large networks contain one ‘giant’ borough, whose number of nodes is roughly an order of magnitude smaller than the number of nodes in the network.
The number of nodes of all boroughs except the ‘giant’ one is small.
The sizes of the boroughs of the DBLP co-authorship networks are distributed according to a power-law.

Given the complexity of finding basic cycles for such large networks and available capacities, we only computed the boroughs for t from 5 to 10.

5 Discussion

Our reanalysis of the Zachary karate club network demonstrated the usefulness of 2-club analysis for relatively small ’very local’ networks (Faust 2007). The next example, concerning the networks of corporate interlocking directorates for Europe in 2010, illustrated the huge numbers of distinct, but overlapping 2-clubs of the three types to be expected for larger, possibly dense networks. However, once the boroughs and their 2-clubs are detected, identified and stored, the challenge of their analysis can be met with the plethora of currently available statistical methods of search, data mining and matching techniques of massive databases.

Our exercise with the large DBLP data set shows that a much larger challenge will be how to combine the micro, i.e very local, in-depth focus of close communication by boroughs and 2-clubs with the global analysis of the Big Data massive networks which currently confront community detection. Promising techniques can be based on the analysis of appropriate segments of such networks, using their hierarchical modularities with techniques such as proposed by Blondel et al. (2008) or by focusing on selected 2-neighborhoods.

Finally, some researchers (e.g. Hartung et al. 2012) have noted that the largest 2-clubs they found in real-world networks just coincided with the ego-network of a node with maximum degree ($\Delta \left( G\right) +1$). As the size of a coterie cannot be larger than that limit, any 2-club of larger size than the maximum degree plus one must be a hamlet or a social circle. In our analyses of various real world networks we also did not find a social circle or hamlet larger than the maximum degree, the largest coterie.

As it is not difficult to construct examples of networks with hamlets or social circles which exceed that limit, a question of further research is to hunt for empirical, real-world datasets where that is indeed the case. Networks with no or limited preferential attachment seem likely candidates.

Acknowledgments

We thank Johan van Doornik for several suggestions in the initial stages of our research and are grateful to some referees for suggestions for improvement.

Open AccessThis article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

Vorheriger Artikel Homophilic network decomposition: a community-centric analysis of online social services

Nächster Artikel Comparison of sentiment lexicon development techniques for event prediction

For related points see Granovetter (1973) and Everett (1982).

The theorems and corollaries in this paper can be extended and proven for the general case of diameter k, e.g. k-clubs, k-clubs, k-boroughs, etc. Given the focus of this paper on diameter 2, and to simplify presentation and analysis accordingly, we shall formulate our results mainly for this special case.

Note that $C_{3}$ has diameter 1 and is a 1-club also.

This case was called a sociometric clique by Alba (1973).

Batagelj and Zaversnik (2007) introduced k-gonal connectedness of cycles. Our edge chained connection of basic cycles corresponds to their 5-gonal connectedness.

The European data for 2010 were kindly made available to us by Eelke Heemskerk.

Downloaded from http://dblp.uni-trier.de/xml at 2012-02-21.

Alba RD (1973) A graph-theoretic definition of a sociometric clique. J Math Sociol 3:113–126MathSciNetCrossRefMATH

Alon N, Yuster R, Zwick U (1994) Finding and counting given length cycles. In: Proceedings of the 2nd European symposium on algorithms. Lecture notes in computer science, vol 855. Springer, Utrecht, pp 354–364

Asahiro Y, Miyano E, Samizo K (2010) Approximating maximum diameter-bounded subgraphs. In: López-Ortiz (ed) LATIN 2010: theoretical informatics. 9th Latin American symposium, vol LNCS 6034. Springer, Oaxaca, pp 615–626

Balasundaram B, Butenko S (2006) Graph domination, coloring and cliques in telecommunications. In: Resende, MG, Pardalos PM (eds) Handbook of optimization in telecommunications. Springer, New York, pp 865–890CrossRef

Balasundaram B, Butenko S, Trukhanovzu S (2005) Novel approaches for analyzing biological networks. J Combin Optim 10(1):23–39MathSciNetCrossRefMATH

Batagelj V, Zaversnik M (2007) Short cycle connectivity. Discrete Math 307:310–318MathSciNetCrossRefMATH

Blondel VD, Guillaume J-L, Lambiotte R, Lefebre E (2008) Fast unfolding of communities in large networks. J Stat Mech Theory Exp 2008(10):P10008CrossRef

Borgatti SP, Everett MG, Freeman LC (2002) Ucinet for windows: software for social network analysis. Analytic Technologies, Harvard

Bourjolly J-M, Laporte G, Pesant G (2000) Heuristics for finding k-clubs in an undirected graph. Comput Oper Res 27(6):559–569MathSciNetCrossRefMATH

Bourjolly J-M, Laporte G, Pesant G (2002) An exact algorithm for the maximum k-club problem in an undirected graph. Eur J Oper Res 138:21–28MathSciNetCrossRefMATH

Bron C, Kerbosch J (1973) Finding all cliques of an undirected graph (H) (Algorithm 457). Commun ACM 16:575–577CrossRefMATH

Butenko S, Prokopyev O (2007) On k-club and k-clique numbers in graphs. Tech. rep., Texas A&M University

Carvalho F, Almeida M (2011) Upper bounds and heuristics for the 2-club problem. Eur J Oper Res 210:489–494MathSciNetCrossRefMATH

Cavique L, Mendes AB, Santos JM (2009) An algorithm to discover the k-clique cover in networks. In: Lopes LS, Lau N, Mariano P, Rocha LM (eds) EPIA ’09 proceedings of the 14th Portuguese conference on artificial intelligence: progress in artificial intelligence. Springer, Berlin, pp 363–373

Chang MS, Hung LJ, Lin CR, Su PC (2012) Finding large k-clubs in undirected graphs. Computing 1–20. doi:10.1007/s00607-012-0263-3

Davis JA, Leinhardt S (1972) The structure of positive interpersonal relations in small groups. In: Berger J, Zelditch MJ, Anderson B (eds) Sociological theories in progress, vol 2. Houghton Mifflin, Boston, pp 218–251

Diestel R (2005) Graph theory. In: Graduate texts in mathematics, 3rd edn. Springer, Heidelberg

Everett MG (1982) A graph theoretic blocking procedure for social networks. Social Netw 4:147–167MathSciNetCrossRef

Faust K (2007) Very local structure in social networks. Sociol Methodol 37(1):209–256CrossRef

Fortunato S (2010) Community detection in graphs. Phys Rep 486:75–174MathSciNetCrossRef

Fosdick L, Ehrenfeucht A, Osterweil L (1973) An algorithm for finding the elementary circuits of a directed graph. Tech. rep, Colorado University

Frank O (1988) Triad count statistics. Discrete Math 72:141–149MathSciNetCrossRefMATH

Girvan M, Newman M (2002) Community structure in social and biological networks. Proc Natl Acad Sci USA (PNAS) 99(12):7821–7826MathSciNetCrossRefMATH

Granovetter MS (1973) The strength of weak ties. Am J Sociol 78(6):1360–1380CrossRef

Gutiérrez Y, Vázquez S, Montoyo A (2011) Word sense disambiguation: a graph-based approach using n-cliques partitioning technique. In: Mũnoz R, Montoyo A, Métais E (eds) NLDB’11 proceedings of the 16th international conference on Natural language processing and information systems. Springer, Berlin, pp 112–124

Harary F (1969) Graph theory. Addison-Wesley, Reading

Harary F (1994) Graph Theory. Westview Press, BoulderMATH

Hartung S, Komusiewicz C, Nichterlein A (2012) Parameterized algorithmics and computational experiments for finding 2-clubs. In: Thilikos D, Woeginger G (eds) 7th international symposium on parameterized and exact computation (IPEC 2012), vol IPEC 2012. LNCS, vol 7535. Springer, Ljubljana, pp 231–241

Hartung S, Komusiewicz C, Nichterlein A (2013) On structural parameterizations for the 2-club problem. In: van Emde Boas P, Groen FC, Italiano GF, Nawrocki J, Sack H (eds) SOFSEM 2013: theory and practice of computer science, vol LNCS 7741. Springer, Berlin, pp 233–243

Heemskerk EM (2013) The rise of the European corporate elite: evidence from the network of interlocking directorates in 2005 and 2010. Econ Soc 42(1):74–101CrossRef

Holland PW, Leinhardt S (1970) A method for detecting structure in sociometric data. Am J Sociol 76:492–513CrossRef

Holland PW, Leinhardt S (1971) Transitivity in structural models of small groups. Comparat Group Stud 2:107–124

Johnsen EC (1985) Network macrostructure models for the Davis-Leinhardt set of empirical sociomatrices. Social Netw 7(3):203–224MathSciNetCrossRef

Johnson D (1975) Finding all the elementary circuits of a directed graph. SIAM J Comput 4(1):77–84MathSciNetCrossRefMATH

Laan S (2012) Fast finding of 2-clubs. Bachelor’s thesis. Informatics Institute, University of Amsterdam

Laan S (2014) Finding 2-clubs (viewer version 5.0) [software]. https://github.com/Neojume/TwoClubs. https://github.com/Neojume/TwoClubs

Leskovec J, Lang KJ, Dasgupta A, Mahoney MW (2008) Statistical properties of community structure in large social and information networks. In: Proceedings of the 17th international conference on World Wide Web. WWW ’08. ACM, New York, pp 695–704. http://doi.acm.org/10.1145/1367497.1367591

Luce R (1950) Connectivity and generalized cliques in sociometric group structure. Psychometrika 15:169–190MathSciNetCrossRef

Memon N, Larsen HL (2006) Structural analysis and mathematical methods for destabilizing terrorist networks using investigative data mining. In: Li X, Zaiane O, Li ZE (eds) The second international conference on advanced data mining and applications (ADMA 2006), vol LNAI 4093. Springer, Berlin, pp 1037–1048

Miao J, Berleant D (2004) From paragraph networks to document networks. In: Proceedings of the international conference on information technology: coding and computing (ITCC 2004), vol 1. IEEE Conference Publications. IEEE Computer Society, New York, pp 295–302

Mokken RJ (1979) Cliques, clubs and clans. Qual Quant 13(2):161–173. doi:10.1007/BF00139635 CrossRef

Mokken RJ (1980–2011) Coteries, social circles and hamlets. close communities: a study of acquaintance networks (1980–2011). Tech. rep., University of Amsterdam: Informatics Institute. http://arxiv.org/abs/1203.5218

Mokken RJ (2008) Cliques, clubs and clans. In: Freeman L (ed) Social networks analysis. Volume II: the structure of social groups. SAGE benchmarks in social research methods series. Sage Publications, London

Mokken RJ, Laan S (2015) Close communication and 2-clubs in corporate networks: Europe 2010. Tech. rep., University of Amsterdam: Informatics Institute. http://arxiv.org/abs/1510.08662 and http://ssrn.com/abstract=2686127

Newman M (2004) Detecting community structure in networks. Eur Phys J B 38:321–330CrossRef

Nikolopoulos SD, Palios L (2007) Detecting holes and antiholes in graphs. Algorithmica 47(2):119–138. doi:10.1007/s00453-006-1225-y MathSciNetCrossRefMATH

Pajouh FM, Balasundaram B (2012) On inclusionwise maximal and maximum cardinality k-clubs in graphs. Discrete Optim 9:84–97MathSciNetCrossRefMATH

Palla G, Derényi I, Farkas I, Vicsek T (2005) Uncovering the overlapping community structure of complex networks in nature and society. Nature 435:814–818CrossRef

Pasupuleti S (2008) Detection of protein complexes in protein interaction networks using n-clubs. In: Marchiori E, Moore J (eds) EvoBIO 2008, vol LNCS 4973. Springer, Berlin, pp 153–164

Porter MA, Onnela J-P, Mucha PJ (2009) Communities in networks. Not AMS 56(9):1082–1166MathSciNetMATH

Reichhardt J, Bornholdt S (2006) Statistical mechanics of community detetection. Phys Rev E 74(1):016110MathSciNetCrossRef

Schäfer A, Komusiewicz C, Moser H, Niedermeier R (2012) Parameterized computational complexity of finding small-diameter subgraphs. Optim Lett 6(5):883–891. doi:10.1007/s11590-011-0311-5 MathSciNetCrossRefMATH

Shahinpour S, Butenko S (2012) Algorithms for the maximum k-club problem in graphs. J Combinat Optim 1–35. doi:10.1007/s10878-012-9473-z

Terveen L, Hill W, Amento B (1999) Constructing, organizing, and visualizing collections of topically related web resources. ACM Trans. Comput.-Hum. Interact. 6(1), 67–94

Tiernan JC (1970) An efficient search algorithm to find the elementary circuits of a graph. Commun ACM 13(12):722–726MathSciNetCrossRefMATH

Veremyev A, Boginski V (2012) Identifying large robust network clusters via new compact formulations of maximum k-club problems. Eur J Oper Res 218:316–326MathSciNetCrossRefMATH

Wasserman S, Faust K (1994) Social network analysis: methods and applications. Cambridge University Press, New YorkCrossRefMATH

Watts D, Strogatz S (1998) Collective dynamics of ’small-world’ networks. Nature 393:440–442CrossRef

Weinblatt H (1972) A new search algorithm for finding the simple cycles of a finite directed graph. Journal of the ACM 19(1):43–56MathSciNetCrossRefMATH

Xie J, Kelley S, Szymanski Boleslaw, K (2013) Overlapping community detection in networks: the state of the art and comparative study. ACM Comput Surv 45(4)

Yang CP, Liu CY, Wu BY (2010) Influence clubs in social networks. In: Pan JS, Chen SM, Nguyen NT (eds) Computational collective intelligence. Technologies and applications, proceedings, second international conference, ICCCI 2010. Lecture notes in computer science, vol 6422. Springer, Kaohsiung, pp 1–10

Yuster R (2011) A shortest cycle for each vertex of a graph. Inf Process Lett 111, 1057–1061

Zachary WW (1977) An information flow model for conflict and fission in small groups. J Anthropol Res 33:452–473CrossRef

Titel: Close communities in social networks: boroughs and 2-clubs
verfasst von: S. Laan
M. Marx
R. J. Mokken
Publikationsdatum: 01.12.2016
Verlag: Springer Vienna
Erschienen in: Social Network Analysis and Mining / Ausgabe 1/2016
Print ISSN: 1869-5450
Elektronische ISSN: 1869-5469
DOI: https://doi.org/10.1007/s13278-016-0326-0

Springer Professional

Abstract

1 Introduction

2 Concepts and notation

3 Close communities as 2-clubs of a network

3.1 Close communities: hamlets, social circles, coteries

3.2 The 2-clubs of a graph or network

3.3 The boroughs of a graph or network

4 Some applications

4.1 Finding boroughs and 2-clubs

4.2 Some real network results

4.2.1 Zachary’s karate club

4.2.2 European corporate network 2010

4.2.3 Boroughs in DBLP co-authorship networks

5 Discussion

Acknowledgments

Weitere Artikel der Ausgabe 1/2016

A medical image retrieval scheme with relevance feedback through a medical social network

Close communication and 2-clubs in corporate networks: Europe 2010

Large cultural networks and smart specialization: What is new in regional policy diagnostic analytics?

The social role of social media: the case of Chennai rains-2015

Optimising influence in social networks using bounded rationality models

Finding remarkably dense sequences of contacts in link streams