Among the ten nodes in Fig.
1: nodes 1, 2, 3, 5, 6, 7 and 8 qualify for the role of a bridge node; however, the extent to which they play the role of a bridge node varies with their topological positions. Node 5 effectively plays the roles of both a
bridge:
hub node and a
bridge:
border node. Without node 5, two of the three neighbors (i.e., nodes 0 and 9 among nodes {0, 6, 9}) in its home cluster would be disconnected from the rest of the network (in this context: node 5 strongly plays the role of a
bridge:
hub node) and the other three neighbors (nodes 1, 6 and 7) would have to communicate on a longer path (in this context: node 5 strongly plays the role of a
bridge:
border node). On the other hand, node 1 (that has the same degree as node 5) is weak in its role as a
bridge:
hub node and moderate in its role as a
bridge:
border node. The nodes in the home cluster of node 1 are not entirely dependent on node 1 to stay connected to each other as well with nodes in the other clusters. Hence, we argue that any centrality metric proposed in the literature to quantify and rank the nodes in Fig.
1 for the role of bridge nodes should rank node 5 higher than node 1. Likewise, nodes 7 and 8 that are part of their own cluster, but connected to the other two clusters, are ideal candidates to be considered as
bridge:
between-clusters nodes as well as
bridge:
border nodes and be ranked high as bridge nodes. Among nodes 1, 2 and 3, we argue that node 3 be ranked relatively higher because of its topological position to be the only node to directly connect its home cluster {1, 2, 3, 4} with an alien cluster {7, 8}; whereas, neither node 1 nor node 2 are the only nodes to directly connect their home cluster {1, 2, 3, 4} with an alien cluster. The extent to which each of nodes 1, 2 and 3 play the role of a
bridge:
hub node or
bridge:
border node appear to be about the same; however, the extent to which node 3 plays the role of a
bridge:
between-clusters node is relatively more stronger than that of nodes 1 and 2. In “
Community-aware approach and
Community-unaware approach” sections, we highlight the lapse in the existing work in the literature to differentiate nodes on the basis of the extent to which they play the role of a bridge node in the three topological positions and illustrate how they end up overestimating or underestimating the role of a node as a bridge node.
The community-aware approach requires a community detection algorithm to be run a priori and use the identified communities as the basis to quantify the role of a node as a bridge node. Most of the works on the community-aware approach assume the communities to be non-overlapping in nature (i.e., a node can be part of only one community).
Ghalmane et al. (
2019a) propose the notion of a community hub-bridge (CHB) measure that appears to take into consideration the role of a bridge node both as a
bridge:
hub node and a
bridge:
border node, but not as a
bridge:
between-clusters node. Per (Ghalmane et al.
2019a), the CHB measure for a node
i in community
Ck is the sum of the terms: |
Ck| * DC
intra(
i,
Ck), quantifying the role of the node as a
bridge:
hub node and
βc(
i) * DC
inter(
i,
Ck), quantifying the role of the node as a
bridge:
border node; where |
Ck|, DC
intra(
i,
Ck),
βc(
i) and DC
inter(
i,
Ck) respectively represent the number of nodes in community
Ck, the number of nodes connected to node
i in its home cluster
Ck, the number of neighboring communities for node
i and the number of nodes connected to node
i from other clusters. Table
1 lists the CHB scores for the nodes in the example graph of Fig.
1 with the Louvain clusters considered as the communities of the nodes. We observe
bridge:
between-clusters nodes such as nodes 7 and 8 incur CHB scores of 3 each, which is even less than the score for nodes 0, 4 and 9 that are all purely internal nodes within a community and cannot be considered as bridge nodes.
Table 1
Computation of the community hub-bridge (CHB) scores (Ghalmane et al.
2019a) for the Nodes in Fig.
1
0 | 0 | 4 | 1 | 0 | 0 | 4 | 0 | 4 |
1 | 2 | 4 | 3 | 1 | 2 | 12 | 2 | 14 |
2 | 2 | 4 | 3 | 1 | 1 | 12 | 1 | 13 |
3 | 2 | 4 | 3 | 1 | 1 | 12 | 1 | 13 |
4 | 2 | 4 | 3 | 0 | 0 | 12 | 0 | 12 |
5 | 0 | 4 | 3 | 2 | 2 | 12 | 4 | 16 |
6 | 0 | 4 | 1 | 1 | 2 | 4 | 2 | 6 |
7 | 1 | 2 | 1 | 1 | 1 | 2 | 1 | 3 |
8 | 1 | 2 | 1 | 1 | 1 | 2 | 1 | 3 |
9 | 0 | 4 | 1 | 0 | 0 | 4 | 0 | 4 |
Ghalmane et al. (
2019a) also propose that the actual number of disjoint communities (referred with an acronym: NDC in “
Analysis of real-world networks” section for comparison purposes) to which a node has incident edges as the basis to rank nodes as bridge nodes: nodes with incident edges to several disjoint communities are considered to more suit the role of a bridge node. However, such a formulation will fail to consider the
bridge:
between-clusters nodes (like nodes 7 and 8 in Fig.
1) that are connected to only two or few disjoint communities as well as the
bridge:
hub nodes (like node 3 in Fig.
1) that tend to have a larger fraction of its links to nodes within the community, but connected to relatively fewer number of disjoint communities (however, the links to the outside communities are important from betweenness and connectivity point of view).
Gupta et al. (
2016) propose a community centrality measure based on simply the intra-degree (number of edges to nodes within the community) and inter-degree (number of edges to nodes in other communities) of the nodes. For two nodes
i and
j, node
i gets ranked higher than node
j with respect to the community centrality measure, if node
i has both a larger intra-degree as well as a larger inter-degree than node
j (or) node
i has an intra-degree that is not significantly smaller than that of node
j, but has a larger inter-degree. For two nodes
i and
j with equal intra-degree (or equal inter-degree), node
i is ranked higher than node
j with respect to the community centrality measure if node
i has a larger inter-degree (or larger intra-degree). We claim that the community centrality measure cannot be a good measure to rank nodes with respect to the extent of "bridgeness" as the inter-degree of a node need not always correspond to the number of distinct communities a node is connected to, a very important characteristic to consider for a bridge node (with regards to the position as a
bridge:
border node). For instance, in Fig.
1, the (intra-degree, inter-degree) of both nodes 1 and 5 are (3, 2). Per the community centrality measure, both nodes 1 and 5 will be equally ranked as bridge nodes; whereas, we argue that node 5 should be ranked higher than node 1 (see the earlier discussion).
Magelinski et al. (
2021) recently proposed the notion of "modularity vitality" (referred to as MVIT in “
Analysis of real-world networks” section for comparison purposes) to identify nodes whose majority of incident links are inter-community links and exist in the periphery of a community (such nodes are referred to as bridge nodes in Magelinski et al. (
2021)) and nodes whose majority of incident links are intra-community links and exist well-inside a community (such nodes are referred to as hub nodes in Magelinski et al. (
2021)). In this pursuit, a node whose removal results in an increase in the overall modularity score of the network is considered to play the role of a bridge node and a node whose removal results in a decrease in the overall modularity score of the network is considered to play the role of a hub node. However, we claim that the decrease in the overall modularity score due to the removal of a node cannot be alone used to conclude that a node is "not a bridge node". For example, the removal of node 5 in Fig.
1 would result in the largest decrease in the overall modularity score for the network and per the modularity vitality measure, node 5 could be concluded as a hub node and not a bridge node. However, we claim that node 5 is a bridge node, serving as both a
bridge:
hub node as well as a
bridge:
border node. Thus, the modularity vitality approach has the weakness of not able to identify bridge nodes that could potentially serve as hub nodes of larger degree within their home cluster.
Ghalmane et al. (
2019b) proposed to quantify the centrality of a node with respect to a classical centrality metric (DC, BC, EC or CC) as a vector of two entries that quantify the contribution of a node within its local network as well as in the global network with respect to the particular classical centrality metric. For a given network and the communities identified using a community detection algorithm, the local network is constructed by removing all the inter-community links in the original network and the global network is constructed by removing all the intra-community links in the original network. The intention is that nodes with larger centrality metric values in both the local and global networks could be identified as the bridge nodes. Per this scheme, considering degree centrality (DC) as the classical centrality metric, the vectors for both nodes 1 and 5 in Fig.
1 will be (3, 2). Ghalmane et al. (
2019b) also proposed to use the fraction of intra-community links and inter-community links as weights for the centrality metric values computed for a node in respectively the local and global networks to arrive at a comprehensive centrality metric (referred to as
modular centrality in Ghalmane et al.
2019b) value for a node. With nodes 1 and 5 in Fig.
1 having 3 intra-community links and 2 inter-community links, they will incur identical (scalar) values for the modular centrality with respect to DC. When betweenness centrality (BC) is considered as the classical centrality metric, the modular centrality vectors for nodes 2, 3, 7 and 8 would be (0, 0) each (however, the BC values of these vertices in the original network are each greater than zero), leading to a wrong conclusion that they are not bridge nodes. Note that in addition to being a scalar value, the modular centrality metric value is directly dependent on the underlying classical centrality metric used and the ranking of the nodes as bridge nodes would vary depending on the classical centrality metric considered. An extension of the above work for overlapping communities is available in Ghalmane et al. (
2019c), wherein the local network for an overlapping node (that is part of more than one community) comprises of links to nodes in each of its participating communities and the global network for an overlapping node comprises of links to the communities that it is not part of.
The community-unaware approach does not require a community detection algorithm to be run a priori. Instead, a community-unaware approach typically quantifies the extent of "bridgeness" of a node by considering the extents of "hubness'' and "betweenness" of the node. Two notable centrality metrics in the literature that fall in the category of community-unaware approach and also include the word "bridge" as part of their name are: bridging centrality (BRC) (Hwang et al.
2008) and bridge node centrality (BNC) (Liu et al.
2019), both of which consider the betweenness centrality (BC) of the node on the (shortest) paths between any two nodes in the network as well as the degree of the node. Both BRC (see Eq.
1) and BNC (see Eq.
2) are computationally heavy, global and synchronous metrics (i.e., require knowledge about the entire network topology in order to be computed for a node as well as cannot be computed for a particular node without being computed for every node in the network). On the other hand, our proposed NBNC tuple is computationally light, locally computable and asynchronous metric (i.e., can be computed for just the node in question by using only the two-hop neighborhood information).
The BRC value (Hwang et al.
2008) for a node is the product of its BC and bridging coefficient (BCO: computed as the ratio of the resistance of the node and the sum of the resistances of its neighbors; see Eq. 1 for the formulations of BCO and BRC). The resistance of a node (
i) is the inverse of the degree (
ki) of the node. Table
2 presents the computation of the BRC values of the vertices for the example graph of Fig.
1. Due to the incorporation of betweenness centrality and resistance in its formulation, BRC is likely to highly rank nodes with a lower degree, but connected to larger degree neighbors, and a larger betweenness. In this context, we observe the BRC metric to take a note of the "
bridge:
between-clusters" nodes 7 and 8 in the example graph and rank them highly as bridge nodes. Nevertheless, we still notice that BRC ranks node 1 relatively higher than node 5 (which exhibits a larger bridgeness compared to node 1), even though node 5 incurs a BC value that is twice of node 1: the reason is that node 1 has relatively more high-degree neighbors than node 5 and hence incurs a larger BCO. The product of BC and BCO results in a scalar value that is an overestimate for node 1 and an underestimate for node 5 with regards to their role as bridge nodes, a typical weakness of the approaches that aim at formulating a scalar centrality metric to quantify the role of a node as bridge node. Instead, we need a tuple that would have two or more terms to capture the extents of hubness and betweenness of the nodes.
$$BCO(i) = \frac{{1/k_{i} }}{{\sum\limits_{{j \in Neighbors(i)}} {1/k_{j} } }}\quad \quad BRC(i) = BC(i)*BCO(i)$$
(1)
Table 2
Computation of the bridging centrality (BRC) values for vertices in the graph of Fig.
1
0 | 1 | 1 | 5 | 0.2000 | 5.0000 | 0.0000 | 0.0000 |
1 | 5 | 0.2 | 2, 3, 4, 5, 6 | 1.3667 | 0.1463 | 9.6667 | 1.4151 |
2 | 4 | 0.25 | 1, 3, 4, 6 | 1.1167 | 0.2239 | 1.3333 | 0.2978 |
3 | 4 | 0.25 | 1, 2, 4, 8 | 1.1167 | 0.2239 | 4.5000 | 1.0075 |
4 | 3 | 0.33 | 1, 2, 3 | 0.7000 | 0.4762 | 0.0000 | 0.0000 |
5 | 5 | 0.2 | 0, 1, 6, 7, 9 | 3.0333 | 0.0659 | 18.500 | 1.2198 |
6 | 3 | 0.33 | 1, 2, 5 | 0.6500 | 0.5128 | 1.8333 | 0.9385 |
7 | 2 | 0.5 | 5, 8 | 0.7000 | 0.7143 | 3.3333 | 2.3786 |
8 | 2 | 0.5 | 3, 7 | 0.7500 | 0.6667 | 1.8333 | 1.2200 |
9 | 1 | 1 | 5 | 0.2000 | 5.0000 | 0.0000 | 0.0000 |
Liu et al. (
2019), the authors proposed the notion of Bridge Node Centrality (BNC) to diminish the importance of high-degree nodes and give more importance to nodes with larger route betweeness and closeness to the rest of the nodes in the network. The BNC (Liu et al.
2019) of a node (see Eq.
2) is computed as the product of its route betweenness (RBW) and bridgeness coefficient (BCE) in a normalized scale; the RBW for a node is the ratio of the sum of the weights of the paths (between any two nodes in the network) that go through the node and the degree of the node; the BCE of a node (a measure of closeness of the node) is the inverse of the sum of the lengths (of length > 2) of the shortest paths from the node to the rest of the nodes in the network. The notations used in Eq. (
2) represent the following:
R is the total number of shortest path routes in the network between any two nodes;
\(\omega _{j}\) =
Pj/
Lj, where
Pj and
Lj are the probability of information flow through route
j and
Lj is the length of route
j;
\(\delta _{j}\) = 1 if node
i is in route
j and is 0 otherwise;
ki is the degree of node
i; dist(
i,
j) is the number of hops in the shortest path route between nodes
i and
j (note that only dist(
i,
j) > 2 are considered in the computation of the BCE values of a node);
\(\overline{{RBW}} (i)\) and
\(\overline{{BCE}} (i)\) respectively represent the normalized RBW and BCE values of node
i.
$$RBW(i) = \sum\nolimits_{{j = 1}}^{R} {\frac{{\omega _{j} \delta _{j} }}{{k_{i} }}} \quad BCE(i) = 1/\sum\nolimits_{{j = 1}}^{n} {dist(i,j)} \quad BNC(i) = \overline{{RBW}} (i)*\overline{{BCE}} (i)$$
(2)
With the BNC metric, nodes with lower degree but a larger RBW (like the
bridge:
between-clusters nodes 7 and 8 in Fig.
1) are likely to have a larger BNC. However, nodes with larger degree are likely to suffer from a lower BNC unless they have a very high information flow. For example, a predominantly
bridge:
hub node like node 3 in Fig.
1 is likely to incur a lower BNC due to its larger degree and a moderate RBW.
Kleinberg et al. (
2008), the authors refer to a bridge node as a structural hole filling the gap between two or more nodes that are otherwise not directly connected and evaluate the various benefits the structural holes bring to a network from a game-theoretic perspective. A recent work (Yang and An
2020) quantifies the importance of a node on the basis of a metric called the
degree and structural hole count (DSHC) that is computed for a node
i per the following formula:
$$DSHC_{i} = \sum\limits_{{j \in \Gamma _{i} }} {\left( {\left( {\frac{1}{{k_{i} }} + \frac{1}{{k_{j} }}} \right)*\frac{1}{{1 + \Delta _{{ij}} }}} \right)} ^{2} ,$$
(3)
wherein Γ
i,
ki and
kj are respectively the set of neighbors of node
i, degree of node
i and degree of node
j, and Δ
ij indicates the number of structural holes that node
i fills by connecting node
j with the former's other neighbor nodes. Per the above formula, high-degree nodes with high-degree neighbor nodes and a larger value for the number of structural holes with regards to its neighbors are likely to incur a lower DSHC value (highly ranked as a structural hole). While applying the above formula for vertices 1 and 5 in the example graph of Fig.
1, we observe the DSHC scores for vertices 1 and 5 to be 0.1465 and 0.1625 respectively: implying, node 1 is highly ranked as a structural hole compared to node 5, whereas node 5 should be the top-ranked bridge node in this graph. The above formula for DSHC suffers from the same weakness that we had earlier highlighted for the BRC and BNC metrics. The formulation tries to combine node degrees (a measure of the hubness of the node as well as its neighbors) with the number of structural holes and comes up with a single quantitative measure that could overestimate or underestimate the role of a node as the bridge node.
Berahmand et al. (
2019), the authors propose a scalar centrality metric to quantify the extent to which a node can serve as an effective spreader of information. Without any formal name in Berahmand et al. (
2019), the metric is simply referred to as DCL in Berahmand et al. (
2019), probably due to its formulation (see Eq. 4 below) that involves terms based on the degree centrality, inverse of the clustering coefficient and the location information (considered in the form of the degree of the immediate neighbors) of a node. In Eq. (
4),
ki and
CCi respectively denote the degree and clustering coefficient (Meghanathan
2017b) of node
i, Γ
i denotes the set of neighbors of node
i and
\(|E(\Gamma _{i} )|\) denotes the average of the degrees of the neighbors of node
i. The clustering coefficient (Meghanathan
2017b) of a node is the ratio of the actual number of links connecting the neighbors of the node and the maximum possible number of links between the neighbors of the node. While the first and second terms of Eq. (
4) respectively incorporate the degree and inverse of the clustering coefficient of a node, the third term incorporates the location information of a node on the basis of the degrees of the neighbors of the node.
$$DCL_{i} = \left\{ {k_{i} } \right\}*\left\{ {\frac{1}{{CC_{i} + (1/k_{i} )}}} \right\} + \left\{ {\frac{{\sum\limits_{{j \in \Gamma _{i} }} {k_{j} } }}{{|E(\Gamma _{i} )| + 1}}} \right\}$$
(4)
The DCL formulation tends to give preference for nodes with larger degree, but a lower clustering coefficient and connected to high degree neighbors. Table
3 illustrates the calculation of the DCL metric values for the nodes in Fig.
1. We observe node 5 to incur the largest DCL score, justifying its roles as a
bridge:
hub node and a
bridge:
border node. However, the DCL score (the second largest score) of node 1 is greater than the DCL score of node 3 as well as greater than the DCL scores of nodes 7 and 8. Per the DCL metric, if a low degree node with lower clustering coefficient fails to have high degree neighbors (like nodes 7 and 8, one of whose neighbors have a lower degree), it is difficult to get highly ranked as a bridge node. Like the previously discussed metrics, the DCL metric also suffers from the weakness of combining multiple terms as part of its formulation in order to get computed as a scalar value.
Table 3
Computation of the DCL values for vertices in the graph of Fig.
1
0 | 1 | 1 | 1 | 5 | 5 | 5 | 1.3333 |
1 | 5 | 0.2 | 0.5 | 2, 3, 4, 5, 6 | 19 | 4.75 | 10.4472 |
2 | 4 | 0.25 | 0.67 | 1, 3, 4, 6 | 15 | 3.75 | 7.5057 |
3 | 4 | 0.25 | 0.5 | 1, 2, 4, 8 | 14 | 3.5 | 8.4444 |
4 | 3 | 0.33 | 1 | 1, 2, 3 | 13 | 4.33 | 4.6947 |
5 | 5 | 0.2 | 0.1 | 0, 1, 6, 7, 9 | 12 | 2.4 | 20.1961 |
6 | 3 | 0.33 | 0.67 | 1, 2, 5 | 14 | 4.67 | 5.4691 |
7 | 2 | 0.5 | 0 | 5, 8 | 7 | 3.5 | 5.5556 |
8 | 2 | 0.5 | 0 | 3, 7 | 6 | 3 | 5.5000 |
9 | 1 | 1 | 1 | 5 | 5 | 5 | 1.3333 |
Some of the other related approaches (primarily in the context of identifying the most influential nodes for spreading information) that exist in the literature are as follows: Ibnoulouafi et al. (
2018), the authors propose to quantify the spreading capability of a node (referred to as M-Centrality) as a weighted average of the K-shell index number of the node (computed as part of K-shell decomposition (Wang et al.
2016)) and the degree variation in the neighborhood of the node; the weight for computing the M-Centrality metric is determined by applying the entropy technique of He et al. (
2016) on the K-shell index numbers for the nodes and the extent of variation in node degree compared to the degrees of the neighbor nodes. The K-shell decomposition approach is likely to assign higher index numbers for nodes that are well-connected within a community (like node 4 in Fig.
1) but may not be even connected to nodes in other communities: not a favorable characteristic for identifying bridge nodes. On the other hand, the K-shell index numbers for lower-degree nodes as well as for higher-degree nodes connected to lower-degree nodes are likely to be lower. For example, in Fig.
1, the K-shell index number for critical bridge nodes like nodes 5, 7 and 8 is 2 each, whereas the K-shell index number for nodes 1, 2, 3 and 4 (that are part of a clique: a well-knit community) is 3 each. The variation in node degree compared to the degrees of the neighbor nodes is likely to be larger for nodes (like node 5) with higher degree, but connected to lower degree neighbors and lower for nodes (like nodes 7 and 8) with lower degree, but connected to higher-degree neighbors. Thus, from the perspectives of both the K-shell index numbers and the extent of variation in node degree compared to the neighbor nodes, the M-centrality metric may not be a suitable metric for identifying
bridge:
between-clusters nodes. For the problem of automatic keyword extraction, the authors in Vega-Oliveros et al. (
2019) first construct a ranking matrix of the vertices (keywords) in a co-occurrence graph of keywords with respect to a suite of nine different centrality metrics (local, global and intermediate-level metrics). They then subject this matrix to principal component analysis and rank the keywords based on the first principal component. Recently (Bucur
2020), a multi-centrality dataset of the nodes (i.e., containing the centrality values of the nodes with respect to two or more metrics) was used for training a (binary) classifier to predict whether or not a node with certain values for the centrality metrics will be a super spreader.
Overall, we observe the community-unaware bridgeness metrics such as BNC and BRC to give preference for nodes (such as the bridge: between-clusters nodes) with lower degree and larger information flow/betweenness or metrics such as the DCL give preference for nodes (bridge: border nodes) with larger degree, but lower clustering coefficient and connected to high-degree neighbors. On the other hand, we observe the community-aware bridgeness metrics to give preference for nodes (bridge: hub nodes) with larger degree and have one or more inter-community links. A common weakness that we noticed for several metrics under both the approaches is the intention to come up with a scalar score (typically a weighted average of two or more different measures: say, the extent of hubness and betweenness or a weighted average of the entries in a tuple that is an amalgamation of the classical centrality metrics). In this section, we have highlighted several scenarios wherein such a formulation would either underestimate or overestimate the role played by a node as bridge node and cannot comprehensively consider all the three possible topological positions of a bridge node. We need a tuple-based formulation that would take a fundamentally new approach to comprehensively capture the extent to which a node could serve as a bridge node with respect to all the three topological positions.