In this section, we provide an experimental evaluation of IWDS on synthetic and real networks. Designing a strong evaluation scheme for our algorithm is not simple, since we have to face two main issues:
1. Existing methods for computing the top k overlapping subgraphs (Galbrun et al. 2016) are defined for unweighted graphs and cannot be used on dual networks.
2. Existing network alignment algorithms do not aim to extract the top k densest subgraphs.
Consequently, we cannot easily compare our approach with existing state-of-the-art methods, and we therefore design an ad-hoc procedure for the evaluation of our method based on the following steps. First, we consider the performance of our approach on synthetic networks, showing that in many of the cases considered IWDS correctly recovers the top k weighted densest subgraphs. Then we apply our method to four real-world dual networks.
Synthetic networks
In the first part of our experimental evaluation, we analyse the performance of IWDS in recovering planted ground-truth subgraphs on synthetic datasets.
Datasets. We generate two noiseless synthetic datasets, consisting of \(k=5\) planted dense subgraphs (cliques). Synthetic1 contains five non-overlapping ground-truth subgraphs, while Synthetic3 contains five overlapping ground-truth subgraphs.
In Synthetic1, each planted dense subgraph contains 30 nodes and has edge weights randomly generated in the interval [0.8, 1]. In Synthetic3, each planted dense subgraph contains 20 nodes not shared with other planted subgraphs. The subgraphs are arranged in a cycle: 5 nodes of each subgraph are shared with the subgraph on one side and 5 nodes with the subgraph on the other side. Edge weights are randomly generated in the interval [0.8, 1].
These cliques are then connected to a background subgraph of 100 nodes. We consider three different ways to generate the background subgraph: Erdős–Rényi with parameter \(p=0.1\), Erdős–Rényi with parameter \(p=0.2\), and Barabási–Albert with parameter equal to 10. Weights of the background graphs are randomly generated in the interval [0, 0.5]. Then 50 edges connecting the cliques and the background graph are randomly added, with weights randomly generated in the interval [0, 0.5].
Based on this approach, we generate four different sets of synthetic networks, called Synthetic1, Synthetic2, Synthetic3 and Synthetic4. Synthetic1 (for the non-overlapping case) and Synthetic3 (for the overlapping case) are generated as described above. Synthetic2 and Synthetic4 are obtained by adding noise to the networks in Synthetic1 and Synthetic3, respectively. The noise is added by altering 5%, 10% or 15% of the node relations of each network: a set of node pairs is chosen randomly; if the two nodes belong to the same clique, the weight of the edge connecting them is changed to a random value in the interval [0, 0.5]; otherwise, an edge connecting the two nodes is added (if not already in the network) and its weight is randomly assigned a value in the interval [0.8, 1].
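As an illustration, the generation scheme above can be sketched in a few lines of Python. This is a minimal sketch of the Synthetic1-style construction (disjoint planted cliques over an Erdős–Rényi background); all function and variable names are ours, not the authors' implementation, and for brevity only the Erdős–Rényi background is shown.

```python
import random

def planted_cliques_network(k=5, clique_size=30, bg_nodes=100, p=0.1,
                            n_bridge=50, seed=0):
    """Illustrative Synthetic1-style generator: k disjoint weighted
    cliques attached to an Erdős–Rényi background graph.  Returns
    (edges, truth), where edges maps ordered node pairs (u, v), u < v,
    to weights and truth lists the planted ground-truth cliques."""
    rng = random.Random(seed)
    edges = {}
    truth = []
    # Plant k disjoint cliques with heavy edge weights in [0.8, 1].
    for c in range(k):
        clique = list(range(c * clique_size, (c + 1) * clique_size))
        truth.append(set(clique))
        for i, u in enumerate(clique):
            for v in clique[i + 1:]:
                edges[(u, v)] = rng.uniform(0.8, 1.0)
    # Erdős–Rényi background with light edge weights in [0, 0.5].
    first_bg = k * clique_size
    for u in range(first_bg, first_bg + bg_nodes):
        for v in range(u + 1, first_bg + bg_nodes):
            if rng.random() < p:
                edges[(u, v)] = rng.uniform(0.0, 0.5)
    # n_bridge random light edges connecting cliques to the background.
    for _ in range(n_bridge):
        u = rng.randrange(0, first_bg)
        v = rng.randrange(first_bg, first_bg + bg_nodes)
        edges[(u, v)] = rng.uniform(0.0, 0.5)
    return edges, truth
```

The key property this construction preserves is the weight gap: every within-clique edge is heavy (weight at least 0.8), while all background and bridging edges are light (weight at most 0.5), so the planted subgraphs remain the densest weighted subgraphs.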
Outcome. We present the results of our experimental evaluation, in particular the average running time, density, distance and F1-score, varying the parameter \(\alpha\). We recall that the F1-score is the harmonic mean of precision and recall, and, as in Galbrun et al. (2016), we use this measure to evaluate the accuracy of our method in detecting the ground-truth subgraphs. Following Yang and Leskovec (2012), we consider the number of shared nodes between each ground-truth subgraph and each detected subgraph, so that we are able to define the best matching between ground-truth and detected subgraphs. Then, we compute the F1[t/d] measure as the average F1-score of the best-matching ground-truth subgraph to each detected subgraph (truth to detected) and the F1[d/t] measure as the average F1-score of the best-matching detected subgraph to each ground-truth subgraph (detected to truth). Notice that in most of the cases considered the running time of IWDS increases as \(\alpha\) increases. Also, the solutions returned by IWDS for larger values of \(\alpha\) are generally denser than those for small values, while the solutions for small values of \(\alpha\) have a higher value of distance (hence the returned subgraphs overlap less).
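The two matching scores above can be sketched as follows. This is a minimal Python sketch of the best-matching F1 computation (function names are ours); calling the same routine with the two collections swapped yields F1[d/t].

```python
def f1(truth_set, detected_set):
    """F1-score, i.e. the harmonic mean of precision and recall,
    between a ground-truth node set and a detected node set."""
    inter = len(truth_set & detected_set)
    if inter == 0:
        return 0.0
    precision = inter / len(detected_set)
    recall = inter / len(truth_set)
    return 2 * precision * recall / (precision + recall)

def f1_truth_to_detected(truth, detected):
    """F1[t/d]: average, over the detected subgraphs, of the F1-score
    of the best-matching ground-truth subgraph.  Swapping the two
    arguments gives F1[d/t]."""
    return sum(max(f1(t, d) for t in truth) for d in detected) / len(detected)
```

For perfectly recovered disjoint cliques both scores equal 1; when a method detects only some of the ground-truth subgraphs, F1[t/d] can stay high while F1[d/t] drops, which is the asymmetry exploited in the discussion of Tables 2 and 4.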
Tables 1 and 2 report the average running time (in minutes), density, distance and F1-scores for the two noiseless datasets. Table 1 shows the experimental results for the noiseless Synthetic1 dataset, where the ground-truth subgraphs are disjoint. In this case IWDS is able to detect the ground-truth subgraphs for all values of \(\alpha\), averaged over 300 examples. Table 2 shows the experimental results for the noiseless Synthetic3 dataset, where the ground-truth subgraphs overlap. In this case the best performance is achieved for \(\alpha = 0.75\), where F1[t/d] = 0.745 and F1[d/t] = 0.804. The experimental results show that F1[d/t] increases with \(\alpha\); in particular, for lower values of \(\alpha\) (\(\alpha \le 0.25\)) the performance of IWDS on this measure is poor. We observe that for values of \(\alpha \ge 0.5\), the F1[t/d] measure decreases as \(\alpha\) increases.
Table 1
Performance of IWDS on non-overlapping generated networks (Synthetic1) for \(k=5\), varying \(\alpha\) from 0.05 to 0.9; the running time (in minutes), the density and the distance are averaged over 300 examples
\(\alpha\) | 0.05 | 0.10 | 0.25 | 0.50 | 0.75 | 0.90 |
Time | 0.0188 | 0.0187 | 0.0194 | 0.0217 | 0.0259 | 0.0231 |
Density | 65.28 | 65.28 | 65.28 | 65.28 | 65.28 | 65.28 |
Distance | 20 | 20 | 20 | 20 | 20 | 20 |
F1[t/d] | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
F1[d/t] | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
Table 2
Performance of IWDS on overlapping generated networks (Synthetic3) for \(k=5\), varying \(\alpha\) from 0.05 to 0.9; the running time (in minutes), the density and the distance are averaged over 300 examples
\(\alpha\) | 0.05 | 0.10 | 0.25 | 0.50 | 0.75 | 0.90 |
Time | 0.0104 | 0.0128 | 0.0145 | 0.0170 | 0.0188 | 0.0209 |
Density | 21.00 | 23.41 | 32.29 | 46.11 | 57.95 | 64.22 |
Distance | 18.178 | 17.473 | 16.321 | 15.853 | 15.741 | 15.344 |
F1[t/d] | 0.509 | 0.415 | 0.689 | 0.768 | 0.745 | 0.456 |
F1[d/t] | 0.101 | 0.157 | 0.331 | 0.583 | 0.804 | 0.923 |
Tables 3 and 4 show the performance of IWDS on the noisy datasets Synthetic2 and Synthetic4. Recall that for these datasets we consider noise values of 0.05, 0.10 and 0.15. The results we present are averaged over 90 examples. As for the noiseless datasets, we vary the value of the parameter \(\alpha\).
Table 3
Performance of IWDS on non-overlapping generated networks with added noise varying from 0.05 to 0.15 (Synthetic2) for \(k=5\), varying \(\alpha\) from 0.05 to 0.9; the running time (in minutes), the density and the distance are averaged over 90 examples
\(\alpha\) | 0.05 | 0.10 | 0.25 | 0.50 | 0.75 | 0.90 |
Noise 0.05 |
Time | 0.0181 | 0.0181 | 0.0203 | 0.0214 | 0.0222 | 0.0236 |
Density | 65.46 | 65.46 | 65.48 | 65.53 | 65.55 | 65.55 |
Distance | 20 | 19.998 | 19.996 | 19.996 | 19.991 | 19.850 |
F1[t/d] | 0.989 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
F1[d/t] | 0.990 | 0.991 | 0.993 | 0.996 | 0.998 | 0.995 |
Noise 0.10 |
Time | 0.0187 | 0.0179 | 0.0207 | 0.0199 | 0.0233 | 0.0238 |
Density | 65.42 | 65.42 | 65.53 | 65.72 | 65.89 | 66.00 |
Distance | 19.999 | 19.999 | 19.986 | 19.976 | 19.960 | 19.847 |
F1[t/d] | 0.978 | 0.968 | 1.00 | 1.00 | 1.00 | 1.00 |
F1[d/t] | 0.960 | 0.962 | 0.970 | 0.982 | 0.992 | 0.994 |
Noise 0.15 |
Time | 0.0126 | 0.0131 | 0.0164 | 0.0194 | 0.0230 | 0.0241 |
Density | 36.63 | 39.35 | 43.03 | 51.57 | 59.67 | 64.47 |
Distance | 19.439 | 19.111 | 18.218 | 18.112 | 18.083 | 18.056 |
F1[t/d] | 0.93 | 0.95 | 0.93 | 0.98 | 0.95 | 0.94 |
F1[d/t] | 0.41 | 0.47 | 0.54 | 0.70 | 0.85 | 0.94 |
Table 4
Performance of IWDS on overlapping generated networks with added noise varying from 0.05 to 0.15 (Synthetic4) for \(k=5\), varying \(\alpha\) from 0.05 to 0.9; the running time (in minutes), the density and the distance are averaged over 90 examples
\(\alpha\) | 0.05 | 0.10 | 0.25 | 0.50 | 0.75 | 0.90 |
Noise 0.05 |
Time | 0.0090 | 0.0112 | 0.0149 | 0.0180 | 0.0203 | 0.0205 |
Density | 21.34 | 24.89 | 32.09 | 45.92 | 57.31 | 63.50 |
Distance | 18.361 | 17.506 | 15.823 | 15.550 | 15.220 | 15.024 |
F1[t/d] | 0.649 | 0.660 | 0.563 | 0.692 | 0.631 | 0.527 |
F1[d/t] | 0.131 | 0.228 | 0.332 | 0.589 | 0.806 | 0.927 |
Noise 0.10 |
Time | 0.0098 | 0.0118 | 0.0137 | 0.0178 | 0.0195 | 0.0212 |
Density | 21.95 | 25.54 | 32.72 | 46.35 | 58.43 | 65.25 |
Distance | 18.275 | 17.260 | 15.761 | 15.229 | 14.876 | 13.847 |
F1[t/d] | 0.648 | 0.568 | 0.567 | 0.595 | 0.548 | 0.463 |
F1[d/t] | 0.131 | 0.225 | 0.330 | 0.581 | 0.807 | 0.936 |
Noise 0.15 |
Time | 0.0092 | 0.0113 | 0.0149 | 0.0178 | 0.02 | 0.0218 |
Density | 22.40 | 26.06 | 32.98 | 46.68 | 58.91 | 65.75 |
Distance | 18.213 | 17.189 | 15.332 | 14.932 | 14.263 | 12.717 |
F1[t/d] | 0.624 | 0.555 | 0.501 | 0.539 | 0.419 | 0.303 |
F1[d/t] | 0.134 | 0.223 | 0.336 | 0.586 | 0.811 | 0.938 |
For Synthetic2, for noise values 0.05 and 0.10, we obtain near-optimal solutions in all the cases considered. The performance of IWDS starts to degrade with noise equal to 0.15, in particular the values of F1[d/t] for \(\alpha \le 0.25\); F1[t/d] instead remains close to 1 (at least 0.93) for all values of \(\alpha\) considered.
For Synthetic4, the added noise has a significant impact on the quality of the computed solutions, even for a noise value of 0.05. While increasing the noise has a limited effect on IWDS for small values of \(\alpha\) (\(\alpha \le 0.25\)), for higher values of \(\alpha\) it leads to degraded performance, in particular for F1[t/d].
Dual networks
We evaluate IWDS on four real-world dual network datasets.
Datasets. G-graphA. The G-graphA dataset is derived from the GoWalla social network, where users share their locations (expressed as GPS coordinates) by checking in on the web site (Cho et al. 2011). Each node represents a user and each edge links two friends in the network. We obtained the physical network from the friendship relation of the social network and the conceptual network from the distance among users. Then we ran the first step of our algorithm and obtained the alignment graph G-graphA, containing 2,241,339 interactions and 9878 nodes (we set \(\delta = 4\)). In this case a DCS represents a set of friends that share check-ins in nearby locations.
DBLP-graphA. The DBLP-graphA dataset is extracted from a computer science bibliography and represents interactions between authors. Nodes represent authors; each edge in the physical network connects two authors that have co-authored at least one paper. Edges in the conceptual network represent the similarity of the authors' research interests, computed from all their publications. After running the first step of the algorithm (using \(\delta = 4\)), we obtained an alignment graph, DBLP-graphA, containing 553,699 interactions and 18,954 nodes. In this case a DCS represents a set of co-authors that share strong common research interests, and the use of dual networks is essential: the physical network shows only co-authors, who may not have many common interests, while the conceptual network connects authors with common interests who may not be co-authors.
HS-graphA. HS-graphA is a biological dataset taken from the STRING database (Szklarczyk et al. 2016). Each node represents a protein, and each edge weight accounts for the reliability of the interaction. We use two networks to model the database: the conceptual network represents the reliability values, while the physical network stores the binary interactions. The HS-graphA dataset contains 5,879,727 interactions and 19,354 nodes (we set \(\delta = 4\)).
Protein-interaction. We extracted from the STRING database a subnetwork of proteins involved in the SARS-CoV-2 infection (Szklarczyk et al. 2016). The physical network contains interacting proteins, while the conceptual network encodes the strength of the association among them. Protein-interaction contains 192 nodes and 418 edges (Table 5).
Table 5
Properties of the alignment graphs obtained for each dataset
Dataset | Type | Nodes | Edges |
DBLP-graphA | Co-authorship | 18,954 | 553,699 |
G-graphA | Social | 9878 | 2,241,339 |
HS-graphA | Protein interactions | 19,354 | 5,879,727 |
Protein-interaction | Protein interactions | 192 | 418 |
Outcome
For these large datasets, we set the value of k to 20, following the approach of Galbrun et al. (2016). Table 6 reports the running time of IWDS, together with the density and the distance of the solutions it returns. As for the synthetic datasets, we consider six different values of \(\alpha\). As shown in Table 6, by increasing the value of \(\alpha\) from 0.05 to 0.5, IWDS returns solutions that are denser but with lower distance (except in one case, HS-graphA with \(\alpha = 0.1\)).
Table 6 also shows how the running time of IWDS is influenced by the size of the network and by the value of \(\alpha\). We put a bound of 20 h on the running time of IWDS, and the method was not able to return a solution for HS-graphA for \(\alpha \ge 0.5\) within this time. The running time is influenced in particular by the number of edges of the input network: DBLP-graphA and HS-graphA have almost the same number of nodes, but HS-graphA is much denser than DBLP-graphA, and IWDS on the former is remarkably slower (1.986 times slower for \(\alpha = 0.05\), 6.218 times slower for \(\alpha = 0.25\)). The running time of IWDS is also considerably influenced by the value of the parameter \(\alpha\), since it increases as \(\alpha\) increases. Indeed, by increasing the value of \(\alpha\), fewer nodes are removed by Case 1 and Case 2 of IWDS, hence V-Greedy is applied to larger subgraphs in the iterations of IWDS. This can be seen in particular for HS-graphA, for which IWDS failed to terminate within 20 h when \(\alpha \ge 0.5\).
Table 6
Performance of IWDS on real-world networks for \(k=20\), varying \(\alpha\) from 0.05 to 0.9; for each network, we report the running time (in minutes), the density and the distance
\(\alpha\) | 0.05 | 0.10 | 0.25 | 0.50 | 0.75 | 0.90 |
Alignment-graph |
Time | 0.055 | 0.058 | 0.062 | 0.065 | 0.068 | 0.068 |
Density | 28.14 | 30.45 | 37.14 | 46.44 | 47.73 | 52.94 |
Distance | 378.76 | 373.61 | 359.94 | 351.50 | 347.81 | 339.17 |
G-graphA |
Time | 89.84 | 98.72 | 184.87 | 336.72 | 426.56 | 486.68 |
Density | 2863.99 | 4000.73 | 6345.67 | 10989.07 | 9297.13 | 10737.01 |
Distance | 275.82 | 257.84 | 220.16 | 210.79 | 196.06 | 193.02 |
DBLP-graphA |
Time | 105.69 | 125.71 | 165.25 | 212.07 | 251.08 | 277.39 |
Density | 39.61 | 52.39 | 74.12 | 91.13 | 97.25 | 98.78 |
Distance | 307.72 | 231.25 | 213.04 | 204.37 | 198.54 | 196.96 |
HS-graphA |
Time | 209.88 | 749.06 | 1027.58 | – | – | – |
Density | 1326.07 | 1153.68 | 1799.22 | – | – | –
Distance | 226.40 | 212.34 | 205.55 | – | – | –
Biological evaluation of results
For biological data, it is possible to evaluate the relevance of the results by considering the biological knowledge they may convey.
Biological data are usually annotated with terms extracted from ontologies, e.g. the Gene Ontology (Guzzi et al. 2012). Consequently, analyses of biological data may be evaluated in terms of the biological knowledge inferred from the data and the statistical relevance of the results themselves. For instance, given a DCS extracted from two biological networks, it is interesting to determine its biological meaning and its relevance, i.e. how much biological relevance the DCS conveys with respect to a random one. Usually, subgraphs of biological networks represent groups of interacting proteins sharing common functions or playing similar biological roles. Consequently, it is possible to evaluate the biological relevance of the obtained results by considering the roles of the proteins. Such information is stored and organised in biological ontologies such as the Gene Ontology (GO) (Harris et al. 2004). GO functional enrichment has been proposed to evaluate the significant presence of common roles or functions in a solution represented as a list of genes/proteins. It has been shown that the use of semantic similarity (SS) measures (Guzzi et al. 2012) is a feasible and efficient way to quantify biological similarity among proteins. SS measures quantify the functional similarity of pairs of proteins/genes by comparing the GO terms that annotate them; therefore, proteins that share a biological role have high values of semantic similarity. As a consequence, genes/proteins that are found in the same solution should have a semantic similarity significantly higher than the random expectation. These considerations guided the design of the evaluation of our results, which we adapted from the evaluation scheme proposed in Mina and Guzzi (2014).
Given a DCS \(DCS_k\), we calculate its internal semantic similarity \(SS_{DCS_k}\) as the average semantic similarity over all pairs of nodes of the DCS:
$$\begin{aligned} SS_{DCS_k} = \frac{\sum _{n_i \in DCS_k}\sum _{n_j \in DCS_k, j \ne i} SS(n_i,n_j)}{|DCS_k| \, (|DCS_k| - 1)} \end{aligned}$$
(1)
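In words, this is the mean of the pairwise similarities over all ordered node pairs of the DCS. A minimal Python sketch follows; the pairwise similarity function `ss` is a placeholder argument standing in for, e.g., a Resnik-based GO measure, not an actual ontology lookup.

```python
from itertools import combinations

def internal_semantic_similarity(nodes, ss):
    """Average pairwise semantic similarity of a node set, as in Eq. (1).
    ss(u, v) is any symmetric pairwise similarity function supplied by
    the caller (a placeholder for a real GO-based measure)."""
    nodes = list(nodes)
    n = len(nodes)
    if n < 2:
        return 0.0
    total = sum(ss(u, v) for u, v in combinations(nodes, 2))
    # combinations() counts each unordered pair once, while the double
    # sum in Eq. (1) counts ordered pairs, hence the factor of 2.
    return 2 * total / (n * (n - 1))
```

Since each unordered pair appears twice in the double sum of Eq. (1) and \(SS\) is symmetric, summing over unordered pairs and doubling gives the same value.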
To assess their statistical significance, we compare the DCSs extracted from the biological networks against random subgraphs obtained by sampling the input networks. Given a DCS \(DCS_i\), we test the null hypothesis \(H^0_i\) that the average semantic similarity \(SS(DCS_i)\) of the proteins internal to the DCS is no higher than expected by chance, where the background distribution is estimated from the semantic similarities \(SS(RS_i)\) of random subgraphs \(RS_i\) taken from the alignment graph, using for instance 0.05 as the significance level.
Consequently, we design this test as described in the following algorithm:
- Let \(DCS_{i}\) be a given DCS;
- Let \(SS(DCS_{i})\) be its internal semantic similarity;
- Let \(V_s = \{RS_{j}\}\), \(j = 0, \ldots, 99\), be a set of 100 random subgraphs of the same size as \(DCS_i\);
- For each \(RS_{j} \in V_s\), calculate \(SS(RS_{j})\), the internal semantic similarity of the random solution;
- Compare \(SS(DCS_{i})\) with all the \(SS(RS_{j})\) using a non-parametric test;
- Accept or reject the hypothesis that \(SS(DCS_{i})\) is significantly higher than the \(SS(RS_{j})\).
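The steps above can be sketched as follows. This is a simplified Python illustration, not the paper's procedure: it samples random node sets (rather than subgraphs) from the alignment graph and replaces the non-parametric test with an empirical p-value, the fraction of random samples scoring at least as high as the DCS; all names are ours.

```python
import random

def empirical_p_value(dcs_ss, alignment_nodes, size, ss_fn,
                      n_random=100, seed=0):
    """Simplified significance test: compare the internal semantic
    similarity dcs_ss of a DCS against n_random same-size random node
    samples from the alignment graph.  ss_fn is a placeholder pairwise
    similarity function.  Returns the empirical p-value; the null
    hypothesis is rejected at level 0.05 when p < 0.05."""
    rng = random.Random(seed)

    def avg_ss(nodes):
        # Average pairwise similarity over all unordered pairs.
        nodes = list(nodes)
        pairs = [(u, v) for i, u in enumerate(nodes) for v in nodes[i + 1:]]
        return sum(ss_fn(u, v) for u, v in pairs) / len(pairs)

    random_scores = [avg_ss(rng.sample(alignment_nodes, size))
                     for _ in range(n_random)]
    # Fraction of random samples at least as similar as the DCS.
    return sum(s >= dcs_ss for s in random_scores) / n_random
```

With 100 random samples, a p-value below 0.05 means fewer than 5 random sets matched the internal similarity of the DCS, mirroring the acceptance criterion of the algorithm above.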
Accordingly, for each subgraph in the solution we generate 100 random graphs of the same size by sampling the obtained alignment graph. For each graph we calculate its internal semantic similarity using the Resnik measure (Resnik 1999). The results demonstrate that our solutions are biologically relevant and that their relevance is higher than expected by chance, as summarised in Table 7.
Table 7
Comparison of the average semantic similarity for the two biological networks considered
Solution | Average SS |
Random solutions | \(0.3 \pm 0.1\) |
DCS | \(0.6 \pm 0.1\) |