Introduction
In the “Structural hole centrality (shce)” section, we derive the new “structural hole centrality” measure. Finally, in the “Evaluation” section, we present an analysis and evaluation of the new centrality measure. That section concludes with a case study of how the new measure can be used in practice, in a study of social capital in a network of Norwegian board directors.
Related work
Social capital and its measurement
Strategic network formation
A bonding and bridging strategic game
Connections model
The conn model was proposed originally in [17] and introduces a payoff function representing the utility or value that a player u receives from a network G, in terms of pairwise benefits \(\delta _{uv}\), link costs \(c_{uv}\) and shortest-path distances \(d_{uv}\). In the symmetric conn, \(\delta _{uv} = \delta \) and \(c_{uv} = c\) are constant for all u, v.
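For orientation, the payoff described above can be sketched in the usual form of the connections model. This is a reconstruction under assumptions, not the paper's own equation: it assumes that benefits are discounted by the shortest-path distance \(d_{uv}\), that costs are paid only on direct links, and it borrows the payoff symbol \(\mu _u(G)\) used later in the text.

\[
\mu _u(G) \;=\; \sum _{v \ne u} \delta _{uv}^{\,d_{uv}} \;-\; \sum _{v \,:\, (u,v) \in G} c_{uv}
\]

In the symmetric case this reduces to \(\sum _{v \ne u} \delta ^{d_{uv}} - c\,k_u\), where \(k_u\) (a symbol introduced here for illustration) is the number of direct links of u.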
A key feature of the conn model is that value through the connections is accrued to the sources of those connections. The fact that u has a path to another player v allows u to reap the benefit of that connection. Intermediary nodes along the path between u and v obtain value through their own connections to v, but they do not obtain any benefit for their role as connectors between u and v. Thus the conn does not assign value for the role of being a connector in a structural hole and hence cannot be considered to model the utility of bridging social capital.
Kleinberg’s structural hole model
We refer to Kleinberg’s structural hole model as the ksh model. The key difference between ksh and conn is that, in the ksh, the value of indirect paths is assigned to the connectors along these paths, rather than the end-points. Thus, if w is a player that forms a length-two path between vertices u and v, i.e. the edges \((u,w)\in G\) and \((w,v) \in G\), then the value \(\delta _{uv}\) that u would obtain for a connection to v is allocated to w instead. In an undirected graph, w accrues both v’s value to u, \(\delta _{uv}\), and u’s value to v, \(\delta _{vu}\). More exactly, since there may be many length-two paths between u and v, the value obtained by each intermediary, w, is a monotonically decreasing function of the number of such paths. In terms of the conn model, if \(\delta \) is the value that a direct link between u and v would assign to u, then an intermediary w obtains a share of \(\delta \) that decreases with the number of length-two paths between u and v.
Value functions and allocation rules
The conn and ksh games are examples of a value function/allocation rule game. A network is formed in which individuals are connected by social links and those interconnections confer on the group as a whole some total productivity or value. Given individual utilities, such as those defined in Eq. (1), the value function of the network is given by
\[\mu (G) = \sum _{u} \mu _u(G). \qquad (2)\]
In the conn and the ksh, we have arrived at the value function by summing individual payoffs. Instead, given a value function, it is possible to define an allocation rule, that is, a function that distributes the total network value, \(\mu (G)\), to the nodes, so that each node obtains a payoff \(\mu _u(G)\) such that Eq. (2) holds. It is worth noting that, for the specialisation of the conn game in which paths longer than length two accrue no benefit, i.e. \(\delta _{uv}^{d_{uv}}=0\) when \(d_{uv}>2\), the conn and the ksh have the same total value, but it is allocated differently: all benefit goes to the source nodes in the case of conn, while the indirect benefit goes to the intermediary nodes in the case of ksh.
Limitations of the conn and ksh
We identify several limitations of the conn and ksh models. In particular,
- The ksh model only considers length-two paths for indirect benefits.
- The ksh model allocates the entire indirect benefit to intermediary nodes. This eliminates any personal motivation for a player to form indirect links.
- The conn model allocates no benefit to intermediary nodes, ignoring the important role that they play in creating value in the network.
- Neither model takes account of the structural quality of the connecting nodes.
The efficient networks of the symmetric conn are studied in [17] and, depending on the relationship between the fixed direct benefit \(\delta \) and the cost c, consist of either a fully connected network, an empty network or a complete star network, see Fig. 2. In particular, other than the fully connected network, the efficient networks do not contain any triangles, which are known as strong social structures. We argue that the advantage that a node gains from paths in the network depends on the quality of the end-points of these paths. If the end-points are gateways into strong communities, then there is significant advantage, while if the end-points are themselves dead-ends, or have limited reach into the rest of the network, then they yield relatively less value. We illustrate this point in Fig. 1. Here, we measure the ksh value of a network, as the network is modified to vary its clustering. Specifically, starting with a network with a scale-free degree distribution, we carry out pairwise swaps of edges in the network in such a way that the degree distribution remains fixed, while the clustering coefficient of the network varies. The interesting features of this plot are where the payoff remains fixed or nearly fixed, while the clustering coefficient decreases. The reduction in clustering coefficient indicates that intermediaries in structural holes are connecting ever weaker community structures. We argue that the payoff of being an intermediary in such a situation should also ideally decrease. We aim to develop a model that accounts for this anomaly and whose efficient networks contain the sort of social structures that we might expect to find in real social networks.
The structural hole connections model (shc)
The ksh assigns value to bridging, while the conn focuses more on bonding, over direct and indirect links. Our goal is to propose a new model that merges the features of the conn and the ksh, to capture both bonding and bridging social capital. We call our model the structural hole connections model (shc). In particular,
- We consider the structural value of nodes as the end-points of connections.
- We extend the ksh to longer paths, maintaining the Harmonic allocation of value to intermediaries on these paths.
- We combine this extended ksh with the conn model, so that value is allocated to both source and intermediary nodes along each path.
The total value of a network under the shc is the same as under the conn model, and hence our model can be understood as a new allocation function for the value in that model. However, rather than restrict ourselves to the symmetric conn, we consider that the benefit is dependent on the end node, v, through a nodal benefit \(b_v\).
The ksh assumes that intermediaries connecting nodes u and v that are not directly connected receive the value that would otherwise go to the end-points. The value is assigned entirely to the intermediary, while the conn assumes that nodes obtain value from other nodes to whom they have indirect, as well as direct, connections. In merging these two perspectives, we consider that a source node on a connecting path retains some fraction \(\gamma \le 1\) of the value of the end-point of the path, while the remainder of the value, \((1 - \gamma )\), goes to the intermediaries. Thus, we allocate \(100\times \gamma \%\) of the value, as the conn does, to the source of the connection and \(100\times (1 - \gamma )\%\) of the value, as the ksh does, to the intermediaries.
We also extend the ksh to longer paths. We retain the Harmonic benefit allocation used in the ksh, so that the full value of an indirect link is retained in the network, but is allocated between intermediary and source nodes. In particular, any intermediary w on a length \(\ell \) geodesic path between u and v obtains a share of the intermediary benefit of that path. (To match the ksh, use \(d_{\max }=2\).) Note that, in this definition, the intermediary benefit is allocated equally to the \(\ell -1\) intermediaries along each path, over all \(m_{\ell uv}\) paths. Also, we have retained the path distance discounting (\(\delta ^{\ell -1} b_v\)) of the conn model, which was not applied in the original ksh model.
To summarise, in the shc model, a node obtains value from the network
- by direct connections to other nodes;
- by being the source of a length \(\ell \) geodesic path to another node;
- by being an intermediary on a length \(\ell \) geodesic path to another node.
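The split of a single pair's value between source and intermediaries can be sketched as follows. This is a reconstruction from the description in this section rather than the paper's own equation: it assumes that a pair (u, v) at geodesic distance \(\ell \ge 2\) carries the discounted value \(\delta ^{\ell -1} b_v\), and the symbols \(s_{uv}\) and \(t_{uwv}\) are introduced here purely for illustration.

\[
s_{uv} = \gamma \,\delta ^{\ell -1} b_v ,
\qquad
t_{uwv} = \frac{(1-\gamma )\,\delta ^{\ell -1} b_v}{(\ell -1)\, m_{\ell uv}} \quad \text{for each geodesic from } u \text{ to } v \text{ that passes through } w .
\]

Each of the \(m_{\ell uv}\) geodesics has \(\ell -1\) intermediaries, so the intermediary shares sum to \((1-\gamma )\,\delta ^{\ell -1} b_v\) and, together with the source share, recover the full value \(\delta ^{\ell -1} b_v\). Under these assumptions, \(\gamma \) changes only who receives the value, not how much value the network holds.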
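The payoff of a node in the shc is the sum of these contributions, less the cost of its direct links. The case \(\forall v, b_v = 1\) corresponds to the symmetric conn model.
Discussion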
To illustrate the shc, in Fig. 2, we show some example efficient networks for \(n=6\), for the case of a constant benefit \(b_v = b_v^{\text{equal}}=1\), corresponding to a symmetric conn model, and when \(b_v = b_v^{\text{tri}}\). It may be observed that, in the second case, efficient networks containing triangles are found, as only nodes connected to triangles have a non-zero nodal benefit. This shows that the shc, with \(b_v = b_v^{\text{tri}}\), yields a richer set of efficient networks than the symmetric conn and that they contain structures that are commonly observed in real social networks.
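The allocation described in this section can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `bfs_counts` and `shc_payoffs` are hypothetical, a pair at geodesic distance \(\ell \) is assumed to carry the value \(\delta ^{\ell -1} b_v\), each node is assumed to pay a fixed cost for every incident link, and the nodal benefit defaults to 1 for every node.

```python
from collections import deque


def bfs_counts(adj, source):
    """Return (dist, sigma): hop distances from source and shortest-path counts."""
    dist = {source: 0}
    sigma = {source: 1}
    queue = deque([source])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                sigma[y] = 0
                queue.append(y)
            if dist[y] == dist[x] + 1:
                sigma[y] += sigma[x]
    return dist, sigma


def shc_payoffs(adj, gamma, delta, d_max, benefit=None, cost=0.0):
    """Sketch of shc-style payoffs on an undirected graph given as {node: [neighbours]}.

    Assumptions (not taken from the paper's equations): a pair at geodesic
    distance ell carries value delta**(ell - 1) * b_v, and every node pays
    `cost` per incident link.
    """
    nodes = list(adj)
    if benefit is None:
        benefit = {v: 1.0 for v in nodes}
    info = {s: bfs_counts(adj, s) for s in nodes}
    payoff = {u: -cost * len(adj[u]) for u in nodes}
    for u in nodes:
        dist_u, sigma_u = info[u]
        for v, ell in dist_u.items():
            if v == u or ell > d_max:
                continue
            value = delta ** (ell - 1) * benefit[v]
            if ell == 1:
                payoff[u] += value          # direct connection: all value to u
                continue
            payoff[u] += gamma * value      # source share
            # Intermediary share, split equally over the (ell - 1) intermediaries
            # of each of the sigma_u[v] geodesics between u and v.
            per_path = (1.0 - gamma) * value / ((ell - 1) * sigma_u[v])
            dist_v, sigma_v = info[v]
            for w in nodes:
                if w == u or w == v or w not in dist_u:
                    continue
                if dist_u[w] + dist_v[w] == ell:
                    # number of u-v geodesics passing through w
                    payoff[w] += per_path * sigma_u[w] * sigma_v[w]
    return payoff


if __name__ == "__main__":
    path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
    for g in (0.0, 0.5, 1.0):
        print(g, shc_payoffs(path, gamma=g, delta=0.5, d_max=3))
```

In this sketch the total payoff over all nodes does not depend on `gamma`; only its distribution between the ends and the middle of each path changes.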
Structural hole centrality (shce)
We now use the shc game as a means of defining a structural hole centrality measure that can identify nodes in a social network with high social capital.
In the shc, the \(\gamma \) parameter controls the allocation of value to nodes in the network. Different values of \(\gamma \) may be considered as different allocation functions that distribute the total network value, which is determined by \(\delta \), \(d_{\text{max}}\), \(b_v\) and the cost c. This total network value is obtained as a sum over all the paths in the network of the path-length discounted benefits obtained from end-points of those paths. The question of a fair allocation of such network value has been addressed in works such as [24]. One approach is to identify desirable properties of the allocation function and determine an allocation that satisfies those properties. Two desirable properties of a fair allocation are that it be component additive, that is, the value generated by any connected component in a network should be allocated among the nodes in that component; and that it satisfy equal bargaining power, that is, if two nodes u and v are connected, then the change in the value allocated to node u when the edge (u, v) is removed should equal the change in the value allocated to node v. Equal bargaining power says that the pair of nodes each benefit or suffer equally from the addition of a link between them. These two properties hold if and only if nodes are allocated their so-called “Myerson value”.
The shc game allows for the exploration of a range of different allocations, by modifying the value of \(\gamma \) and, when \(0<\gamma < 1\), all nodes along a path get allocated some proportion of the value that is generated by that path. In fact, it is generally the case that the Myerson value correlates strongly (in rank order) with the node utilities of the shc for some value of \(\gamma \), typically when \(\gamma \approx 0\). On the other hand, modifying \(\gamma \) allows an analyst to explore how different nodes benefit from different allocation strategies and this can give some insight into their position of influence in the network: when \(\gamma \approx 1\), nodes that are connected along short paths to many other nodes can expect to benefit from a high payoff, while when \(\gamma \approx 0\), nodes that are intermediaries on many short paths can expect a high payoff. Hence we define the structural hole centrality measure, shce, as the payoff of the shc game. To parameterise the cost, we stick with a fixed cost c for every link, and note that the total value in the network is zero when the cost reaches a maximum value. We express the cost relative to this maximum through the scaled cost \(\eta \). The parameters of the shce are summarised in Table 1.
shce centrality parameters
Parameter | Description |
---|---|
\(b_v\) | Benefit associated with connecting to a node v, where that benefit captures the structural quality of v in the network |
\(\delta \) | Indirect path benefit discount, such that a path of length \(\ell \ge 2\) to a node v, accrues a benefit of \(\delta ^\ell b_v\) |
\(d_{\text{max}}\) | The maximum path length, such that there is no benefit to being connected along a path of greater length |
\(\gamma \) | Proportion of the indirect path benefit that is associated to the source of a path |
\(\eta \) | The scaled cost of a connection to a node |
Relationship to other centrality measures
In this section, we compare the shce with a set of other commonly used centrality measures in network analysis. From the above presentation, it is clear that the shce is similar to closeness centrality (a measure of the average closeness of a node to other nodes in a network) when \(\gamma =1\) and is similar to betweenness centrality (a measure of the extent to which a node is found on shortest paths in the network) when \(\gamma =0\). Nevertheless, the shce is not identical to either measure. In fact, the \(d_{\text{max}}\) and \(\delta \) parameters allow for a restriction in the horizon over which a node’s distance to other nodes influences its shce value, while closeness and betweenness consider the relationship to all nodes in the network. The \(\gamma \) parameter then allows for a mixture of the betweenness and closeness perspectives. The difference in the measures is illustrated for the Minnesota road network, shown in Fig. 3, which has a diameter of 98. The settings of the shce focus value strongly on intermediate nodes, by taking \(\gamma =0\), along with a maximum cost of \(\eta =1.0\) for edges. The plot shows the tied rank of the measures, where nodes with the largest centrality value have rank n and nodes with the smallest have rank 1. The Spearman rank correlation of shce with closeness and betweenness is not particularly strong for these settings. The shce also has similarities to the Katz centrality measure, which computes a node’s centrality in relation to its discounted distance to other nodes in the network. However, the Katz measure allocates its value solely to the source nodes on such paths and so cannot be used as a measure of bridging capital. We will show in our case study in the “Evaluation” section that computing a profile of shce centrality scores as \(\gamma \) is varied allows for some insight into the mix of values that actors get from their position in a social network and provides a single framework with which social capital can be assessed.
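The tied ranks and Spearman rank correlation used in this comparison can be illustrated with a short, dependency-free sketch. The helper names `tied_ranks` and `spearman` are hypothetical and the inputs are made-up centrality scores; this is not the code used to produce the figures.

```python
def tied_ranks(values):
    """Average ("tied") ranks: the smallest value gets rank 1, the largest rank n."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of values equal to values[order[i]]
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the tied ranks.

    Undefined (division by zero) if either input is constant.
    """
    rx, ry = tied_ranks(x), tied_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


if __name__ == "__main__":
    scores_a = [0.10, 0.40, 0.40, 0.90]   # illustrative scores for four nodes
    scores_b = [3.0, 5.0, 4.0, 9.0]
    print(tied_ranks(scores_a))
    print(spearman(scores_a, scores_b))
```

Because the statistic depends only on ranks, any monotonic rescaling of either centrality measure leaves it unchanged.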
Comparison of shce with Myerson value
We compare the shc allocation of value to that of the Myerson value in a simple network, with a constant benefit function. In Fig. 4a, we show a network consisting of a single 4-node undirected path. By counting all shortest paths in this network, we can find the total network value. The shce allocation of this value depends on the value of \(\gamma \) and is shown in Fig. 4c. Both methods give higher weight to node 3 than nodes 2 and 5, since the value remains in the network if either one of these is removed. However, the value that shce gives to the end-points of paths depends on the \(\gamma \) parameter.
We also compare the shce and the Myerson value on a random network of \(n=13\) nodes, using both the triangle nodal benefit (\(b_v^{\text{tri}}\)) and constant nodal benefit (\(b_v^{\text{equal}}\)) functions. Again, in Fig. 5, the colour indicates the rank of the node. In the case of the triangle benefit, value is concentrated on the nodes that form the single triangle in the network (nodes 1, 4 and 13), for both measures. The Myerson value gives higher values to peripheral nodes 7 and 8, since these nodes add to the value of the network by linking to nodes with non-zero benefit. With \(\gamma =0.0\), the shce focuses more value on intermediary nodes such as 3 and 11 that straddle paths to the non-zero benefit nodes. The overall rank correlation of the shce and the Myerson value is 0.34 in this case. When all nodes have the same benefit, high Myerson values attach to nodes 3 and 5, which add value to the network by forming the path that connects the nodes in the lower left corner to the rest. But, again, the Myerson value credits the peripheral nodes 7, 8 and 9 because they too add to the overall value in the network. The highest correlation (0.99) between these Myerson scores and the shce score occurs when we choose a value of \(\gamma =0.5\), which allocates the value equally between source and intermediary nodes on connecting paths. From these examples, it is clear that there is no best value of the shce parameters, in the fairness sense from which the Myerson value is derived. But it is also generally the case that some settings of the shce parameters can achieve centrality scores that correlate strongly with the Myerson value. While adjusting \(\gamma \) cannot lead to a fair allocation in the sense of the Myerson value, it can allow insight to be derived into which nodes benefit when the allocation of value favours bridging capital over bonding or vice versa. The shce relies on the analyst to determine an insightful allocation of the value in the network by adjusting its parameters, while the Myerson value provides a single best allocation in some well-defined sense. We note, however, that work such as [24] argues that the fairness criteria of the Myerson value may not be appropriate, depending on the context in which the strategic game is analysed.
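For small networks, the Myerson value discussed in this section can be computed by brute force as the Shapley value of the game in which each coalition of nodes is worth the value of the subnetwork it induces. The sketch below is an illustration under assumptions, not the authors' code: `network_value` and `myerson_values` are hypothetical names, the value function assumes unit nodal benefits, zero link cost and a discount of \(\delta ^{\ell -1}\) per pair at distance \(\ell \), and the enumeration over all node orderings is only feasible for a handful of nodes.

```python
from collections import deque
from itertools import permutations


def network_value(adj, members, delta=0.5, d_max=3):
    """Total value of the subnetwork induced by `members`.

    Assumes unit nodal benefits and zero cost: every ordered pair at
    distance d (1 <= d <= d_max) inside the induced subnetwork adds
    delta**(d - 1).
    """
    members = set(members)
    total = 0.0
    for s in members:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            x = queue.popleft()
            for y in adj[x]:
                if y in members and y not in dist:
                    dist[y] = dist[x] + 1
                    queue.append(y)
        total += sum(delta ** (d - 1) for d in dist.values() if 1 <= d <= d_max)
    return total


def myerson_values(adj, value=network_value):
    """Average marginal contribution of each node over all orderings of the nodes."""
    nodes = list(adj)
    totals = {u: 0.0 for u in nodes}
    count = 0
    for order in permutations(nodes):
        count += 1
        coalition = []
        previous = 0.0
        for u in order:
            coalition.append(u)
            current = value(adj, coalition)
            totals[u] += current - previous
            previous = current
    return {u: t / count for u, t in totals.items()}


if __name__ == "__main__":
    # A 4-node undirected path, with nodes labelled 0..3 for illustration.
    path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
    print(myerson_values(path))
```

By construction the values sum to the total network value, and on the path the two interior nodes receive more than the two end nodes, since removing an interior node disconnects the network.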
Evaluation
shce correlation with other centralities
To understand how the shce fits in relation to other metrics, it is worthwhile applying the correlation analysis of [11] to the shce.
We evaluate the shce centrality measure using a subset of the CommunityFitNet corpus of networks [25] which, in total, contains 572 real-world networks drawn from the Index of Complex Networks (ICON) [26]. The CommunityFitNet corpus includes a variety of network sizes and structures. Our analysis assumes unweighted, simple, undirected networks. We only consider networks with a single connected component and also reject any other networks for which any of the analysed centrality measures fails to compute. There remain 299 networks, on which our analysis is performed, which come from 6 different domains (see Table 2), with between 8 and 3155 nodes (average 464) and a mean sparsity of 4.72%.
Domain | Number | Percent |
---|---|---|
Biological | 117 | 39 |
Economic | 12 | 4 |
Informational | 17 | 6 |
Social | 84 | 28 |
Technological | 53 | 18 |
Transportation | 16 | 5 |
We compute the centrality measure correlation (CMC), a Spearman rank correlation, between the shce measure and various other node centrality measures. This statistic is chosen in [11] on the basis that relationships between measures can be nonlinear, though they are generally monotonic. The centrality measures that we compare against are listed and defined in Table 3. From their definitions, the connections to the shce are apparent. In particular, shce relies on values measured along shortest paths, similarly to the cc, hc, bc and kc. Like the \(\texttt {cc}\) and \(\texttt {hc}\), path contributions are inversely proportional to their lengths. Like the kc, the contribution of a path decays according to a benefit factor \(\delta <1\), such that a path’s contribution is proportional to \(\delta ^\ell \), where \(\ell \) is the path length. Nevertheless, the \(\gamma \), \(\delta \) and \(\eta \) parameters allow control over the shce, so that preference can be given to a node’s bonding or bridging capabilities.
Measure | Symbol | Formula |
---|---|---|
Shortest-path betweenness centrality | bc | \(c_w = \sum _{u\ne v \ne w}\sum _{\ell } {\mathbb {1}}(m_{\ell uv}>0)f_{\ell uwv}\) |
Shortest-path closeness centrality | cc | \(c_w = \frac{n-1}{\sum _v \ell _{wv}}\) |
Eigenvector centrality | ec | \(c_w = \frac{1}{\lambda _1} \sum _v a_{wv}e_v\) |
Katz centrality | kc | \(c_w = \sum _v \left( ( \text {I} - \delta \text {A}^T)^{-1}-\text {I}\right) _{wv}\) |
Degree centrality | dc | \(c_w = \sum _v a_{wv}\) |
Harmonic centrality | hc | \(c_w = \frac{1}{n-1}\sum _v \frac{1}{\ell _{wv}}\) |
Constraint centrality | conc | \(c_w = \sum _v a_{wv}(p_{wv} + \sum _{u} p_{wu} p_{uv})^2\) |
We note the particular relationship between the shce and the Katz measure, \(\texttt {kc}\). Unlike many other centrality measures (such as \(\texttt {cc}\) and \(\texttt {hc}\), where only the length of the path is important), both \(\texttt {kc}\) and shce accumulate a contribution along all shortest paths between pairs of nodes, in proportion to \(\delta ^\ell \). However, for the Katz measure, this contribution is associated with the source of the path, while in the shce, we can use \(\gamma \) to control whether the contribution is assigned to the source, or among the intermediary nodes on the path.
Regarding the parameters of the shce, we note that \(\delta \), \(d_{\text{max}}\), \(\eta \) and \(b_v\) relate to how the network is valued: the extent to which value is placed on indirect paths, and how the end-points of these paths are relatively valued. The parameter \(\gamma \) relates to how that value is allocated to the nodes in the network. Generally, actors bring value to the network through the paths that they occupy and that value is allocated to them proportionately, as determined by \(\gamma \). The parameter \(\eta \) controls the value of direct connections: the more costly they are, the more value needs to be attained through the indirect connections that they help form. \(d_{\max }\) and \(\delta \) together determine the distance horizon over which an actor can attain some value from others in the network. In the following analysis, we fix \(d_{\text{max}}=10\), which for most of the involved networks exceeds or is close to their diameter, and set \(\delta =0.9\).
Figures 6 and 7 show boxplots of the correlation of the shce with the centralities defined in Table 3 when \(\eta =0.0\) and \(\eta =0.5\), respectively, and a constant nodal benefit function is used. Figures 8 and 9 contain the analogous boxplots for the case of the triangle benefit function. We can observe the effect of varying the \(\gamma \) parameter to distribute the network value in different ways. When \(\gamma =0.0\), so that the value from indirect links is placed fully on the intermediaries, the shce correlates most strongly with the betweenness centrality bc, and this correlation weakens as \(\gamma \) is raised to 1.0. At the same time, we see a strengthening of the correlation to the cc, bc and hc that value short connections from source nodes to other nodes in the network. Generally, when \(\eta =0.0\) and there is no cost associated with direct link formation, so that high degree nodes are not penalised, we see that the shce is consistently negatively correlated with the conc, which values dense neighbourhoods. On the other hand, when a cost for link formation is introduced (Figs. 7 and 9), the shce exhibits increasing positive correlation with conc as value is focused away from intermediaries. We can see that the shce becomes less well-correlated with standard centrality measures as a mixture of benefits (Figs. 7b and 9b) is valued. We also see weaker correlations with the standard centrality measures when the triangle benefit function is used. It should be noted that, particularly for some of the smaller networks in the dataset, there can be a high fraction of nodes that are not incident on any triangles, reducing the benefit of connecting to them to zero.
We also examined the correlation of the shce with other centrality measures. Different combinations of \(\gamma \), \(\eta \), and \(\delta \) were used to measure shce values using both constant and triangle-based nodal benefits. The Spearman’s \(\rho \) correlation plots show that most of the pairs of centrality measures have medium-to-high positive correlation with each other (with the exception of conc) when compared using mean between-network CMC values (the mean CMC for each pair of centrality measures across the 299 networks). Similar to the boxplots, in these plots, for both constant and triangle-based benefits, the conc, which values dense neighbourhoods, is negatively correlated with other measures for zero values of \(\gamma \) and \(\eta \) at \(\delta =0.9\).
In addition to the correlation between the shce and other centrality measures, we also examined the association between network properties and the CMC for different networks. We used the following six out of the eight global network properties used for the similar analysis in [11]: assortativity, connection density, clustering, global efficiency, majorization gap, and spectral gap. In particular, the objective of this analysis is to examine how the shce relates to the network topology as well as how it compares to other centrality measures. Before the results of this analysis are discussed, we briefly recall the definitions of the network topological properties that were used in the analysis. Assortativity measures a node’s preference to connect with other nodes of similar degree. Clustering is based on the number of closed triangles in the network. The efficiency measure defined by [27] is the inverse of the length of the shortest path connecting two nodes in the network, and global efficiency is the average efficiency over all pairs of nodes in the network [28]. The majorization gap is the difference between the empirical network and an idealised threshold network [29]. It is calculated as the difference between the network degree sequence and its corrected conjugate sequence. Networks with a high majorization gap will be distant from a threshold network and have lower CMCs [11]. Finally, the spectral gap is the difference between the moduli of the two largest eigenvalues of the adjacency matrix. It quantifies the extent to which a network is sparse and well connected at the same time [11].
We report this association for the CMC including the shce with \(\gamma =0.5\), \(\delta =0.5\), \(\eta =0.5\), calculated for both the triangle nodal benefit (\(b_v^{\text{tri}}\)) and the constant nodal benefit (\(b_v^{\text{equal}}\)) functions, shown in Fig. 11a and 11b, respectively. The lower triangle in each subplot indicates the Spearman correlation between CMC and the network property. The upper triangle indicates if this correlation was significant (grey) or not (white). Through our analysis with various combinations of parameters, we observed that the shce is consistently significantly correlated with path-based network measures and negatively correlated with assortativity across various values of \(\gamma \).
Overall, the shce behaves in an expected manner and aligns with other centrality measures to a greater or lesser extent, depending on the setting of its parameters. However, a single measure that allows control over a node’s bonding and bridging capabilities can be useful. For instance, an analysis of a node’s rank vs \(\gamma \) can allow an analyst to better understand how the actor’s social status is composed. A low rank will indicate low status, in any case, but a rank which diminishes with \(\gamma \) suggests that status is being maintained mainly through bonding relationships, suggesting that a route to increasing social capital would focus on enhancing its role as a bridge.
The Norwegian boards social network
We now demonstrate the use of the shce in the analysis of the social network of Norwegian boards of directors introduced in [30]. This set of networks was originally used to analyse the social capital of women directors in Norway. We take the May 2011 one-mode dataset in which actors correspond to board members and a link between a pair of actors exists in the network if they are members of a common board. We extract the largest connected component of this network, which consists of 784 nodes and 2522 edges. For each actor in the network, we compute the shce value with \(d_{\max }=10\), \(\delta =0.9\), \(\eta =0.5\), \(b_v = b^{\text{equal}}_v\) and a range of \(\gamma \) values from 0 to 1. Thus, we allow long paths of up to 10 connections to impact on the shce and discount according to path length relatively slowly. We examine the different shce profiles that result, where the profile of an actor is a graph of the actor’s shce centrality vs \(\gamma \). We focus on how the profile can allow broad categories of actor to be identified. In particular, we examine at what value of \(\gamma \) an actor achieves their highest shce value. A large majority of actors (83%) achieve their maximum shce value at \(\gamma =1\), indicating that it is primarily through their bonding (over direct and indirect paths) to other actors that their social status is achieved. Only two actors, who are both female, achieve their maximum shce score at \(\gamma =0\), indicating that it is primarily through their bridging capabilities that their social status in the network is achieved. Just 6 out of 784 actors have a balanced profile, in which their greatest shce value is achieved at \(\gamma =0.5\). Four out of six of these ‘balanced’ profiles belong to female actors. Examples of the three different profile types are illustrated in Fig. 12, where ego-networks, extending to depth two from the ego, are displayed alongside the shce profile. We can observe in these examples how actor 646, whose shce profile increases with \(\gamma \), is bound in a tightly knit community, while actor 273 bridges along many paths between friends-of-a-friend; the balanced profile actor 751 is also a good bridge, while having many direct connections to well-connected neighbours. As another indication that female actors are somewhat more inclined to act as bridges in the social network, the actors are ordered according to the value of \(\gamma \) at which their shce profile peaks, so that actors whose shce profile peaks at \(\gamma =0\) are ordered first and those whose profile peaks at \(\gamma =1\) are last in the ordering. Focusing only on the 17% of actors whose peak is before \(\gamma =1\), in Fig. 13, we plot the cumulative proportion of females and males in that ordering. We see that females are over-represented among the low values of \(\gamma \), indicating a greater tendency for female actors to get more value from the network when that value is allocated to bridges.
This case study illustrates the potential of the shce to shed light on issues of social capital in social networks. We do not offer definitive conclusions and refer readers to [30] for a deep sociological analysis of these networks. However, we do contend that the shce can yield deeper insights, in comparison to the betweenness centrality measure that was exploited in the original study.
Conclusion
In this paper, we have introduced a strategic network formation game that combines the conn and ksh network formation games. While we have shown some examples of efficient networks that emerge from this game, the main focus of this paper has been on a new centrality measure, defined as a fixed point of the linear system that spreads the benefit associated with each node in the network among those nodes that connect to it along geodesic paths. The new centrality measure shares the advantage of the Katz measure in that it depends on the connecting paths, rather than simply on path lengths. But, more particularly, it is parameterised in a way that allows the analyst to control the way nodes are valued according to their bonding and bridging capabilities. We have benchmarked the new measure against a number of other common centrality measures and shown its application on some example networks. In future work, we will provide a more detailed analysis of the bonding and bridging game and identify the structures that emerge as stable networks from this game.