Introduction
- We propose a new cross-graph embedding method that captures the properties of two graphs.
- We use the proposed method in a framework that jointly performs network alignment and link prediction.
- We perform extensive experiments on public datasets and evaluate our method and the peer method [12] on many aspects to show how our method surpasses the existing one.
Related work
Notations
The proposed method
Node embedding
Construct the expanded graph
Forming the similarity measure
Finite step transition matrix
Laplacian matrix
Calculating the embedding
Network alignment
Link prediction
The proposed method variations
- EG-FST: It uses the FST similarity measure.
- EG-LAP: It uses the Laplacian similarity measure.
- EG-Mini: A customized version of EG-FST that neither recalculates the embedding in each iteration nor adds the \(E_{sim}\) edge set. We include this variant to evaluate the effect of these two factors on the overall process.
Experiments
Datasets
- Facebook/Twitter: [29] collected this dataset from about.me, a platform that gathers the different online social network accounts associated with the same user. Facebook and Twitter accounts were collected; the graphs model users as nodes and friendships as edges.
- Douban online/offline: Douban is a Chinese online social network published by [30]. They took a subgraph from it (online) and constructed another graph based on real-world relations (offline). The alignment between the two networks is the match between a real-world person and the corresponding Douban user.
- DBLP/DBLP disturbed copy: A co-authorship graph collected by [31], which considers authors as nodes and links two authors with an edge if they have any common academic work. We used the same version as [12], where the researchers randomly generated another graph from the original to align with it while preserving the properties of the graph.
| Datasets | # Nodes | # Edges | Interoperability |
|---|---|---|---|
| Facebook | 1043 | 4734 | 0.89 |
| Twitter | 1043 | 4860 | |
| Douban online | 1118 | 8164 | 0.76 |
| Douban offline | 1118 | 1511 | |
| DBLP | 2151 | 6306 | 0.94 |
| DBLP disturbed copy | 2151 | 5676 | |
Evaluation protocols
Datasets evaluation
- CENALP: [12] proposed a cross-network skip-gram embedding within the same general algorithm for jointly aligning nodes and predicting links. It is the latest and currently most advanced method that performs the two tasks jointly; all other methods handle only one task or do the two tasks separately. We used the parameters discussed in the paper (\(\alpha =5, K=3, q=0.2, c=0.5\)).
- EG-FST: Our proposed method using the FST similarity measure. We set the parameters to (\(\alpha =5, K=3, t=0.5\)).
- EG-LAP: Our proposed method using the Laplacian similarity measure. We set the same parameters as for EG-FST.
- EG-Mini: A customized version of EG-FST without recalculating the embedding in each iteration and without adding the \(E_{sim}\) edge set. We set the same parameters as for EG-FST.
Comparing multiple similarity measures
- Adjacency Matrix: the default adjacency matrix, which scores a pair of nodes as 1 if they are connected and 0 otherwise.
- Laplacian Matrix: as described in "The proposed method" Section.
- Symmetric Normalized Laplacian Matrix (SNL): calculated with Eq. 11, where D is the degree matrix and A is the adjacency matrix.
$$\begin{aligned} SNL = D^{-\frac{1}{2}} \, A \, D^{-\frac{1}{2}} \end{aligned}$$ (11)
- Transition Matrix: as given in "The proposed method" Section.
- Personalized PageRank Matrix (PPR): given by Eq. 12, where I is the identity matrix, P is the transition matrix, and \(\alpha\) is a hyperparameter (we used 0.1 as its value).
$$\begin{aligned} PPR = \alpha \left( I - (1-\alpha )P \right)^{-1} \end{aligned}$$ (12)
- Finite Step Transition Matrix (FST): as described in "The proposed method" Section.
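The generic similarity matrices above (all except the method-specific FST measure) can be sketched directly from the adjacency matrix. The snippet below is a minimal illustration, assuming an undirected, unweighted graph with no isolated nodes; the function name `similarity_matrices` and the toy triangle graph are our own, not from the paper.

```python
import numpy as np

def similarity_matrices(A, alpha=0.1):
    """Build the Laplacian, SNL (Eq. 11), transition, and PPR (Eq. 12)
    matrices from an adjacency matrix A (assumes no isolated nodes)."""
    d = A.sum(axis=1)                       # node degrees
    D = np.diag(d)
    L = D - A                               # combinatorial Laplacian
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    SNL = D_inv_sqrt @ A @ D_inv_sqrt       # Eq. 11
    P = A / d[:, None]                      # row-stochastic transition matrix
    n = A.shape[0]
    PPR = alpha * np.linalg.inv(np.eye(n) - (1.0 - alpha) * P)  # Eq. 12
    return L, SNL, P, PPR

# Toy example: a triangle graph
A = np.array([[0, 1, 1],
              [1, 0, 1],
              [1, 1, 0]], dtype=float)
L, SNL, P, PPR = similarity_matrices(A)
```

Note that because P is row-stochastic, each row of the PPR matrix also sums to 1, which makes the \(\alpha = 0.1\) setting directly comparable across graphs of different sizes.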
Network size effect on time
- CENALP/5 [12]: The CENALP method as described before, but since its runtimes lie on a much larger scale (because of the skip-gram training in the embedding phase of each iteration), we display its time measurements divided by 5 to scale its line down.
- CENALP-MA [12]: A variation of CENALP that applies the skip-gram embedding only in the first iteration and then uses mean aggregation to recalculate the embedding in subsequent iterations.
- EG-GPU: The EG-FST method implemented on GPU.
- EG-CPU: The EG-FST method implemented on CPU.
- EG-Mini-GPU: EG-Mini implemented on GPU.
- EG-Mini-CPU: EG-Mini implemented on CPU.
| Datasets | # Nodes | # Edges |
|---|---|---|
| DBLP-500-1 | 500 | 1669 |
| DBLP-500-2 | 500 | 1508 |
| DBLP-900-1 | 900 | 3133 |
| DBLP-900-2 | 900 | 2808 |
| DBLP-1300-1 | 1300 | 4384 |
| DBLP-1300-2 | 1300 | 3928 |
| DBLP-1700-1 | 1700 | 5503 |
| DBLP-1700-2 | 1700 | 4944 |
| DBLP-2100-1 | 2100 | 6243 |
| DBLP-2100-2 | 2100 | 5618 |