Similarity of networks
Existing research has considered the count or the percentage of nodes and edges discovered during a network's discovery. For a network G, this is done by measuring the fraction discovered at step i with the subgraph G_i, tracking |V(G_i)|/|V(G)| and |E(G_i)|/|E(G)|. While both are intuitive measures, they capture only the cardinality of the sets of discovered nodes and edges, and not the topology of the network itself (Davis et al. 2016; Chen et al. 2017; Wijegunawardana et al. 2017).
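These coverage ratios are straightforward to compute. A minimal sketch, assuming networkx and using a toy 10-node cycle as a hypothetical "true" network:

```python
import networkx as nx

def coverage(G_i, G):
    """Node and edge coverage of the discovered subgraph G_i relative to
    the true network G: |V(G_i)|/|V(G)| and |E(G_i)|/|E(G)|."""
    return (G_i.number_of_nodes() / G.number_of_nodes(),
            G_i.number_of_edges() / G.number_of_edges())

# Toy discovery: the true network is a 10-node cycle and the first five
# nodes (with the four edges among them) have been discovered so far.
G = nx.cycle_graph(10)
G_i = G.subgraph(range(5))
node_cov, edge_cov = coverage(G_i, G)   # half the nodes, 4 of 10 edges
```

As the text notes, these ratios say nothing about which part of the topology was discovered; two subgraphs with identical coverage can have very different structure.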
Common metrics for measuring general network similarity compare degree distributions, density, clustering coefficients, average path lengths, and so on. Inexact matching of two networks after a sequence of edits is commonly measured by Graph Edit Distance (GED), which measures the cost of adding, removing, or substituting nodes and edges to make one graph look like the other. GED works well for shortest paths rather than for arbitrary graphs. Its algorithms perform combinatorial searches over the space of possible edits, and are therefore computationally intensive.
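For small graphs, GED can be computed exactly. A minimal illustration using networkx's `graph_edit_distance` (an exact combinatorial search, so only practical at toy scale):

```python
import networkx as nx

# A path P4 and a cycle C4 differ by a single edge insertion, so with
# unit edit costs their graph edit distance is 1.
P4 = nx.path_graph(4)
C4 = nx.cycle_graph(4)
ged = nx.graph_edit_distance(P4, C4)   # exact search over node/edge edits
```

The exponential search space is exactly why the kernel-based relaxations below are used in practice.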
To optimize this idea, kernel functions are used that explore regions of the networks to be matched using GED, such as:
- Node/edge kernel: for labeled graphs, whenever two nodes/edges have the same label, the kernel function value is 1, and otherwise 0. If the node/edge labels take real values, a Gaussian kernel is used instead.
- Path kernel: whenever two paths (sequences of alternating nodes and edges) are of the same length, the path kernel is the product of the kernels of the nodes and edges along the paths. If the lengths of the paths differ (i.e., the algorithm did not detect a common path between the networks), the value of the path kernel function is 0.
- Graph kernel: the graph kernel compares subgraphs of each of the graphs by comparing the proportion of common paths out of all possible paths in each of the two subgraphs.
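A minimal sketch of the first two kernels above; the label values, the σ parameter, and the helper names are illustrative, not taken from the cited works:

```python
import math

def label_kernel(a, b, sigma=1.0):
    """Delta kernel for discrete labels; Gaussian kernel for real values."""
    if isinstance(a, (int, float)) and isinstance(b, (int, float)):
        return math.exp(-(a - b) ** 2 / (2 * sigma ** 2))
    return 1.0 if a == b else 0.0

def path_kernel(p1, p2, sigma=1.0):
    """Product of label kernels along two paths (alternating node and
    edge labels); 0 when the path lengths differ."""
    if len(p1) != len(p2):
        return 0.0
    k = 1.0
    for a, b in zip(p1, p2):
        k *= label_kernel(a, b, sigma)
    return k

# Two labeled node-edge-node paths: identical labels give kernel 1,
# and a length mismatch gives kernel 0.
k_same = path_kernel(["C", "bond", "O"], ["C", "bond", "O"])
k_diff = path_kernel(["C", "bond", "O"], ["C", "bond"])
```

A graph kernel then aggregates such path kernels over pairs of subgraphs, as described above.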
Kashima and Inokuchi (2002) map networks to feature vectors and then cluster the vectors based on naïve distance methods. Attributes of the data are needed in order to create the feature vectors, which they mine using search engines on the Web.
More sophisticated methods capture the networks' topology before comparing them. For example, the Gromov–Hausdorff (G-H) distance uses shape analysis. The network's shape is constructed by piecing together small subgraphs whose similar structure is easy to find; this shape is then transformed into a linkage matrix that captures how these subgraphs interconnect. The G-H distance is then the farthest distance any node of a network G is from the network H, or the farthest any node of H is from G, whichever is greater, taken over all possible embeddings (drawings) of the two networks G and H (Lee et al. 2011).
Similarly, Pržulj uses graphlets (Pržulj 2007) to capture the topology of a network. Graphlets are all possible subgraphs on a small number of nodes, and they capture local structural properties. The graphlet frequency distribution can be used to compare networks by tracking the frequencies of graphlets of each size (Rahman et al. 2014). This is infeasible for large graphs, as it requires an exact count of each graphlet. It has been used in comparing aerial images (Zhang et al. 2013), in scene classification for multimedia information (Zhang et al. 2011), and in learning on synthetic networks (Janssen et al. 2012).
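For the smallest nontrivial case, the two connected 3-node graphlets (the open path, or wedge, and the triangle) can still be counted exactly. A sketch assuming networkx, with `graphlet3_counts` a hypothetical helper:

```python
import networkx as nx

def graphlet3_counts(G):
    """Exact counts of the two connected 3-node graphlets: open paths
    (wedges) and triangles.  The need for exact counts like these is
    what makes the approach infeasible for large graphs."""
    triangles = sum(nx.triangles(G).values()) // 3
    # Every pair of neighbors of a node forms a wedge; subtract the
    # closed wedges, three per triangle.
    wedges = sum(d * (d - 1) // 2 for _, d in G.degree()) - 3 * triangles
    return {"path": wedges, "triangle": triangles}

# A triangle with one pendant edge: 1 triangle and 2 open wedges.
G = nx.Graph([(0, 1), (1, 2), (0, 2), (2, 3)])
counts = graphlet3_counts(G)
```

Larger graphlets (4 and 5 nodes) require enumerating many more shapes, which is where the exact-count cost becomes prohibitive.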
Koutra et al. proposed DeltaCon (Koutra et al. 2013), a scalable algorithm for networks of the same size, based on node influence. It compares two networks with exactly the same node set by computing each node's influence on the other nodes of its network. These values are stored as a matrix for each network, and the difference between the matrices is then measured to give an affinity score quantifying similarity.
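A naive sketch of the DeltaCon idea, assuming the exact node-affinity matrix S = (I + ε²D − εA)⁻¹ and the root Euclidean (Matusita) distance from the paper; the published algorithm uses fast approximations to reach scalability, which this O(n³) sketch does not attempt:

```python
import numpy as np
import networkx as nx

def deltacon0(G1, G2):
    """Naive DeltaCon-style similarity for two graphs on the same node
    set: compare node-affinity matrices, then map the distance to (0, 1]."""
    nodes = sorted(G1.nodes())

    def affinity(G):
        A = nx.to_numpy_array(G, nodelist=nodes)
        deg = A.sum(axis=1)
        eps = 1.0 / (1.0 + deg.max())   # small enough to keep S nonnegative
        return np.linalg.inv(np.eye(len(nodes)) + eps**2 * np.diag(deg) - eps * A)

    S1, S2 = affinity(G1), affinity(G2)
    dist = np.sqrt(((np.sqrt(S1) - np.sqrt(S2)) ** 2).sum())
    return 1.0 / (1.0 + dist)

# Identical graphs score exactly 1; deleting one edge lowers the score.
G = nx.karate_club_graph()
H = G.copy()
H.remove_edge(0, 1)
sim_same = deltacon0(G, G.copy())
sim_diff = deltacon0(G, H)
```

The requirement that both graphs share the same node set is visible in the code: one `nodelist` orders both adjacency matrices.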
Feasible inexact matching of graphs of different sizes uses spectral analysis. The spectra of classes of graphs have been studied since 1956 (Günthard and Primas 1956), and further since then (VanDam and Haemers 2003; Florkowski 2008; Gera and Stanica 2011). General graph-theoretical results have been known since the 1970s (Cvetković 1971), reviewed in Chung (1997), and later extended to complex networks (VanMieghem 2010). Spectral clustering of graphs can use the eigenvalues of several matrices (Wilson and Zhu 2008). We use the adjacency matrix A, the Laplacian L = D − A (where D is the degree matrix), and the normalized Laplacian defined as the matrix:
$$L(u,v) = \left\{ \begin{array}{ll} 1 & \quad \text{if} \ u=v \ \text{and} \ d_{v} \ \neq 0,\\ -1/\sqrt{d_{v} \cdot d_{u}} & \quad \text{if } u \neq v, \text{ and } u \text{ adjacent to } v,\\ 0 & \quad \text{otherwise,} \end{array}\right. $$
where \(d_{v}\) denotes the degree of the vertex \(v\).
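All three spectra are easy to obtain numerically. A sketch assuming networkx and numpy, with the karate club graph as a stand-in network:

```python
import numpy as np
import networkx as nx

G = nx.karate_club_graph()

A = nx.to_numpy_array(G)                 # adjacency matrix A
eig_A = np.linalg.eigvalsh(A)

L = np.diag(A.sum(axis=1)) - A           # Laplacian L = D - A
eig_L = np.linalg.eigvalsh(L)

N_mat = nx.normalized_laplacian_matrix(G).toarray()   # normalized Laplacian
eig_N = np.linalg.eigvalsh(N_mat)

# Sanity checks from spectral graph theory: Laplacian eigenvalues are
# nonnegative, normalized-Laplacian eigenvalues lie in [0, 2], and the
# adjacency eigenvalues sum to trace(A) = 0.
```

Each sorted eigenvalue array is the spectrum referred to in the next paragraph.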
The eigenvalues of each of these matrices define the spectrum of the network. While cospectral graphs (i.e., graphs that share the same adjacency matrix spectrum) have been known since 1957 (VonCollatz and Sinogowitz 1957; Harary et al. 1971), and ongoing research considers which graphs are determined by their spectra (VanDam and Haemers 2003), spectral analysis is very useful in comparing networks, as we explain next, particularly since finding these cospectral graphs is “out of reach” (Schwenk 1973; Godsil and McKay 1982).
Eigenvalue analysis is used to describe the behavior of a dynamic system (Trefethen and Bau III 1997), and in our case, the behavior of a network representing the system. To see its relevance for comparing networks, note that eigenvalues measure node cluster cohesiveness, or community structure, which has been widely studied in network science. Moreover, the algebraic connectivity of the graph, and thus the spectrum, captures the topology of the graph (Frankl and Rödl 1987). Of particular interest to us is that spectral clustering can differentiate between the structural equivalence and the regular equivalence of nodes. Under structural equivalence, nodes are placed in the same community if they have similar connection patterns to the same neighbors; under regular equivalence, nodes are placed in the same community if they have similar connection patterns to any neighbors. Of course, this can be extended to probabilistic models, where stochastic equivalences are introduced: groups are stochastically equivalent if their respective probabilities of connecting to neighbors are the same.
The distribution of the eigenvalues of the adjacency matrix can be found in Chung (1997), which focuses on the correlation between the range of the eigenvalue distribution and the type of graph. This was further studied through the behavior of the distribution of graph eigenvalues, such as the convergence results in Dumitriu and Pal (2012). Correlations between the power-law degree distribution of a graph and the distribution of its eigenvalues were presented in Mihail and Papadimitriou (2002). Analyzing several real graphs, they inferred that if the degrees \(d_{1},\ldots,d_{n}\) of the graph are power-law distributed, then with high probability the eigenvalues of the graph are power-law distributed as well (taking the values \(\sqrt{d_{1}},\ldots,\sqrt{d_{n}}\)) (Mihail and Papadimitriou 2002).
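The √d behavior can be checked on an extreme toy case: for a star, whose hub dominates the degree sequence, the largest adjacency eigenvalue equals the square root of the hub degree exactly.

```python
import numpy as np
import networkx as nx

# A star K_{1,99}: hub degree 99, all leaves of degree 1.  Its adjacency
# spectrum is {+sqrt(99), 0, ..., 0, -sqrt(99)}, so the dominant
# eigenvalue is exactly sqrt(d_1), the extreme case of the prediction.
G = nx.star_graph(99)        # node 0 is the hub, with 99 leaves
lam_max = np.linalg.eigvalsh(nx.to_numpy_array(G))[-1]   # = sqrt(99)
```

For realistic power-law graphs the correspondence holds only approximately and for the top eigenvalues, as the cited analysis states.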
The distribution of the eigenvalues of the Laplacian is more closely linked to the structure of the graph than that of the adjacency matrix alone (Chung 1997): the normalized Laplacian encodes the degree distribution as well as the adjacency information of the graph. While spectral analysis was previously used to cluster similar trees and synthetic graphs (Wilson and Zhu 2008), we use the spectra within a different methodology based on nonparametric statistics.
Nonparametric statistical tests can capture whether two samples of a network are similar without knowing the true networks. We compare the eigenvalue distributions of two samples (subgraphs) and test the assumption that they came from the same distribution; the alternative hypothesis is that there is a significant difference between the two samples. Stepping through the inference of Gera (2015), we can visually observe small differences between consecutive snapshots; no major changes happen between consecutive snapshots.
We use the nonparametric test of Ruth and Koyak (2011), where the first m of N observations \(X_{1},\ldots,X_{m},\ldots,X_{N}\) are assumed to follow a distribution \(F_{1}\) and the rest follow \(F_{2}\). This allows us to detect a “shift point” at \(X_{m+1}\) where our samples are no longer from the same distribution. In our research, each observation \(X_{i}\) is the eigenvalue distribution of a sampled/inferred graph, and the null hypothesis is that the eigenvalue distributions of the two samples are the same. Each test yields a p-value, the probability that the test statistic would be as extreme as, or more extreme than, what was observed with a particular sample, assuming the null hypothesis is true. If two samples have very different eigenvalue distributions, the null hypothesis is less plausible and the p-value is low. If the two eigenvalue distributions match, that is strong evidence in support of the null hypothesis (almost any other outcome would be more extreme than what was observed), and the p-value is close to 1.
Scaling the Erdös-Rényi networks
We will use the Erdös-Rényi random graph model for a scaling analysis of the metric. The Erdös-Rényi random graph model is characterized by two parameters: the number of nodes (or graph size) N and the connectivity probability α, defined as the fraction of the N(N−1)/2 independent off-diagonal adjacency matrix elements that are non-vanishing. All nodes are isolated when α=0, whereas we have a complete graph for α=1.
From a random matrix theory point of view, it is common practice to look for the scaling parameter(s) of a random matrix model; in this way the universal properties of the model can be revealed (Méndez-Bermúdez et al. 2017a; 2017b). Scaling studies of the Erdös-Rényi random graph model can be found in Méndez-Bermúdez et al. (2015) and Martínez-Mendoza et al. (2013). The average degree is then
$$ \xi = \alpha \times N, $$
(1)
where ξ is the mean number of nonzero elements per adjacency matrix row, called the scaling parameter of the Erdös-Rényi random graph model. In particular, it was shown that spectral, eigenfunction, and transport properties are universal (i.e. equivalent) for a fixed average degree ξ.
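Equation (1) translates directly into code. A sketch generating an Erdös-Rényi graph with a target scaling parameter ξ, assuming networkx:

```python
import networkx as nx

N, xi = 1000, 8.0              # graph size and target scaling parameter
alpha = xi / N                 # connectivity probability from Eq. (1)

G = nx.gnp_random_graph(N, alpha, seed=42)
mean_degree = 2 * G.number_of_edges() / N   # concentrates near xi for large N
```

Graphs generated this way at a fixed ξ, with different N and α, are the objects whose spectral properties are claimed to be universal.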
In fact, several papers have been devoted to analytical and numerical studies of the Erdös-Rényi random graph model as a function of ξ. Among the most relevant results of these studies we can mention that: (i) in the very sparse case (ξ→1), the density of eigenvalues was found to deviate from the Wigner semicircle law, with the appearance of singularities around and at the band center, and tails beyond the semicircle (Rodgers and Bray 1988; Rodgers and deDominicis 1990; Mirlin and Fyodorov 1991; Evangelou and Economou 1992; Evangelou 1992; Semerjian and Cugliandolo 2002; Khorunzhy and Rodgers 1997; Kühn 2008; Rogers et al. 2008; Slanina 2011); (ii) a delocalization transition of eigenstates was found at ξ≈1.4, see Mirlin and Fyodorov (1991); Evangelou and Economou (1992); Evangelou (1992) and Fyodorov and Mirlin (1991); (iii) the nearest-neighbor eigenvalue spacing distribution P(s) was found to evolve from the Poisson to the Gaussian Orthogonal Ensemble (GOE) prediction with increasing ξ, see Jackson et al. (2001); Evangelou and Economou (1992) and Evangelou (1992) (the same transition was reported for the spectral number variance in Jackson et al. (2001)); (iv) the onset of the GOE limit for the spectral properties occurs at ξ≈7, see Méndez-Bermúdez et al. (2015) and Martínez-Mendoza et al. (2013), meaning that above this value the spectral properties of the graph coincide with those of a system with maximal disorder. Also, the first eigenvalue/eigenfunction problem was addressed in Kabashima et al. (2010) and Kabashima and Takahashi (2012).
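The Poisson-to-GOE crossover can be probed numerically without spectral unfolding via the mean ratio of consecutive level spacings, a standard diagnostic from the random matrix literature (not the P(s) analysis of the cited works); GOE statistics give a mean ratio near 0.53, Poisson statistics near 0.39. A sketch assuming networkx and numpy:

```python
import numpy as np
import networkx as nx

def mean_spacing_ratio(G):
    """Mean ratio of consecutive adjacency-eigenvalue spacings,
    r_i = min(s_i, s_{i+1}) / max(s_i, s_{i+1}); this diagnostic needs
    no unfolding.  Reference values: ~0.53 for GOE, ~0.39 for Poisson."""
    eig = np.linalg.eigvalsh(nx.to_numpy_array(G))
    s = np.diff(eig)
    s = s[s > 1e-12]          # drop numerically degenerate spacings
    r = np.minimum(s[:-1], s[1:]) / np.maximum(s[:-1], s[1:])
    return float(r.mean())

# Well above the reported onset xi ~ 7, bulk statistics should be GOE-like.
N, xi = 500, 20.0
G = nx.gnp_random_graph(N, xi / N, seed=7)
r_dense = mean_spacing_ratio(G)   # close to the GOE value of ~0.53
```

In the very sparse regime the many degenerate eigenvalues of isolated components make this diagnostic noisier, which is one reason the cited studies work with P(s) and the number variance instead.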
For our paper, following Méndez-Bermúdez et al. (2015) and Martínez-Mendoza et al. (2013), we look for universal properties of the discovery algorithm.