Classical centrality measures
An undirected graph or network is a pair 〈N,E〉, where N is a finite set of vertices or nodes and E is a set of edges e of the form {i,j} with i,j∈N, i≠j.
We define the set of neighbours of a node i in graph 〈N,E〉 as the set Ni(E)={j∈N:{i,j}∈E}, and the degree of i as the number di(E)=|Ni(E)| of neighbours of i in graph 〈N,E〉. With a slight abuse of notation, we denote by NS(E)={j∈N:∃i∈S s.t.j∈Ni(E)} the set of neighbours of nodes in S∈2N, S≠∅, and in the graph 〈N,E〉. A path between nodes i and j in a graph 〈N,E〉 is a finite sequence of nodes (i0,i1,...,ik), where i=i0 and j=ik, k≥1, such that {is,is+1}∈E for each s∈{0,⋯,k−1} and such that all these edges are distinct. Two nodes i,j∈N are connected in 〈N,E〉 if i=j or if there exists a path between i and j in E. The length of a path between i and j is the number of edges in the path and a shortest path between i and j is a path between i and j with minimum length. Let i∈N and S⊆N∖{i}.
Centrality measures assign to each node in a network a value that corresponds to some extent to the relevance of that node within the network structure. The four classical centrality measures considered in this paper are the following:
(1)
Degree centrality (
Nieminen 1974;
Shaw 1954): the degree centrality of
i∈
N is defined as |
Ni(
E)|, i.e. the number of neighbours of
i in graph 〈
N,
E〉. It is an index of the potential communication activity of a node.
(2)
Closeness centrality (
Beauchamp 1965;
Sabidussi 1966): the closeness centrality of node
i is defined as
\(\frac {|N|-1}{\sum _{j \in N} {h}(i,j)}\), where
h(
i,
j) is the distance between
i e
j, i.e. the length of the shortest path between
i and
j. It measures to what extent a node can avoid the control potential of the others nodes.
(3)
Betweeness centrality (
Bavelas 1948;
Freeman 1977): the betweenness centrality of a node
k is defined as
\(\sum _{i,j \in N} b_{ij}(k)\), where
\(b_{ij}(k)=\frac {g_{ij}(k)}{g_{ij}}\) and
gij is the number of shortest paths between nodes
i and
j, while
gij(
k) is the number of shortest paths between nodes
i and
j that contain
k. It is an index of the potential of a node for control of communication.
(4)
Eigenvector centrality (
Bonacich 1972): the eigenvector centrality of a node
i is defined as the
i−
th element of the principal eigenvector of the adjacency matrix
A=(
aij) corresponding to 〈
N,
E〉, where
aij=1 if {
i,
j}∈
E and
aij=0 otherwise. It assigns high centrality to nodes that are highly connected to nodes who themselves have high centrality.
A game-theoretic relevance index
Let 〈
N,
E〉 be a
co-expression network, that is a network where the set of nodes
N represents a set of genes and the set of edges
E describes the interaction among genes, i.e. there exists an edge between two genes if they are directly interacting in the biological condition under analysis. Moreover, let
\(k \in \mathbb {R}^{N}\) be a parameter vector that specifies the a priori importance or
weight of each gene. According to
Cesari et al. (2017), we define the
coalitional game\(\left (N,v_{E}^{k}\right)\), where
N is the set of genes under study and the
characteristic function\(v_{E}^{k}\) assigns a worth to each coalition of genes
S⊆
N representing the overall magnitude of the interaction between the genes in
S, which takes into account the weight (i.e., the a priori importance) of each gene directly connected to
S in the biological network. More precisely, the map
\(v_{E}^{k}:2^{N} \rightarrow \mathbb {R}\) assigns to each coalition
S∈2
N∖{
∅} the value
$$ v_{E}^{k}(S)=\sum_{j \in S \cup N_{S}(E)} k_{j} $$
(1)
that is the sum of the weights associated to the genes in
S and to the ones that are directly connected in 〈
N,
E〉 to some genes in
S (by convention,
\(v_{E}^{k}(\emptyset)=0\)) (
Cesari et al. 2017). The class of games (
N,
v) defined according to relation (
1), on some gene network
G≡〈
V,
E〉 and with parameter
\(k \in \mathbb {R}^{N}\), is denoted by
\(\mathcal {EK}^{N}\).
A well-known
solution for coalitional games is the Shapley value (
Shapley 1953), which was introduced in 1953 and since then applied to a wide range of fields, including biology (
Moretti and Patrone 2008). The Shapley value
ρ(
v) of a game (
N,
v) is defined as the average of marginal vectors over all |
N|! possible orders in
ΣN (|
N| is the cardinality of the set
N). In formula
$$ {\rho}_{i}(v)=\sum_{\sigma \in \Sigma_{N}} \frac{m_{i}^{\sigma}(v)}{|N|!} \makebox{ for all} i \in N, $$
(2)
where ΣN is the set of all possible permutations of the elements in N and σ(i)=j means that with respect to σ player i is in the j-th position, and where the marginal vector\(m^{\sigma }(v) \in \mathbb {R}^{N}\) is defined by \(m_{i}^{\sigma }(v)=v(\{j \in N:\sigma (j) \leq \sigma (i)\})-v(\{j \in N:\sigma (j) < \sigma (i)\})\) for each i∈N (i.e., \(m_{i}^{\sigma }(v)\) is the marginal contribution of player i to the coalition of players with lower positions in σ).
In the paper (Cesari et al.
2017), the authors have shown that the Shapley value is the unique
relevance index for genes (defined as a map
\(\rho :\mathcal {EK}^{N} \rightarrow \mathbb {R}^{N}\)) which satisfies four desired properties, namely, symmetry, the dummy player property, efficiency and star-additivity. Three out of these four properties are natural axioms borrowed from the related literature on cooperative games: the property of symmetry requires that if two genes
i and
j have the same a priori weight (
ki=
kj) and in addition, they are connected to the same set of neighbours in a network, then they should have the same relevance; the dummy player property basically implies that the relevance of a disconnected node in a network coincides with its a priori importance; the efficiency axiom determines the scale of measure, setting the sum of the relevance of all genes equal to the sum of their a priori weights. The fourth property introduced in the paper (Cesari et al.
2017) to axiomatically characterize the Shapley value on the class
\(\mathcal {EK}^{N}\), i.e., the star-additivity axiom, says that increasing the a priori weight of a node
i from 0 to a positive value should affect the relevance of gene
i and its neighbours at the same extent, for whatever graph. Consequently, reallocating the a priori importance of a node among its neighbours, the star additivity property catches the idea of measuring the ability of nodes to absorb the changes in expression of correlated genes, as previously discussed in the motivating example illustrated in “
A motivating example” section.
A practical limitation inherent to most of the applications of the Shapley value is the computational burden related to its calculus. Surprisingly, in the paper (Cesari et al.
2017) the authors have shown that the Shapley value of a coalitional game
\(\left (N,v_{E}^{k}\right)\) can be computed according to the following much simpler relation:
$$ {\rho}_{i}\left(v_{E}^{k}\right)=\sum_{j \in (N_{i}(E) \cup \{i\})} \frac{k_{j}}{d_{j}(E)+1}, $$
(3)
for each i∈N.
According to relation (
3), a gene connected to many genes who themselves have a low degree gets a high Shapley value (in other words, the relevance of a gene increases with the number of its neighbours having a low degree). Relation (
3) also suggests that genes with a high Shapley value would be able to interact directly with the maximum number of other nodes in the network and its removal would split the network in a maximum number of connected components with few genes, or eventually constituted by isolated genes. As shown in the paper (Aadithya
2010), relation (
3) can be calculated via an
\(\mathcal {O}(|N| + |E|)\) procedure, which makes possible its computation on very large networks (recall that on a network 〈
N,
E〉, calculating betweenness centrality takes
\(\mathcal {O}(|N| |E|)\) time using Brandes’ algorithm (
Brandes 2001). We conclude this section showing the results of relation (
3) applied to the motivating example in “
A motivating example” section.
Another way to keep into account the a priori importance of genes was proposed in the paper (Moretti et al.
2010) by means of the so-called
association game, where a set of key-genes
K⊂
N (e.g. a set of genes known a priori to be involved in biological pathways related to chromosome damage) is considered and the value assigned to a coalition
S is the number of key-genes interacting only with
S formally, in the paper (Moretti et al.
2010) the value assigned to a coalition
S⊆
N is the cardinality of the set {
i∈
K:
Ni(
E)⊆
S}. However, the definition proposed in relation (
1) seems more flexible to explore all possibilities of reciprocal influence among genes. It generalizes the game introduced in the papers (Suri and Narahari
2008; Aadithya et al.
2010) for determining the “top-
k nodes” in a co-authorship network, by the introduction of a parameter that specifies the a priori importance of each node. The parameter vector
k allows for an a priori ranking of the genes according to their importance, while in the previous model introduced in the paper (Moretti et al.
2010) only a two-level distinction was made between key-genes and non key-genes. Moreover, by measuring to what extent a coalition of genes is connected to the rest of the network, relation (
1) generalizes the notion of degree centrality for groups of genes, which is justified by some practical evidences showing a strong correlation between the degree centrality and genes that are essential for different biological functions (see, for instance, (Bergmann et al.
2004; Carlson et al.
2006; Jeong et al.
2001; Junker et al.
2006; Zampetaki et al.
2010). In fact, if only the weight of genes inside a coalition was to be considered (and not the one of the neighbours, as in our definition), the centrality measure obtained through relation (
3) would coincide with a “weighted” degree centrality.