1 Introduction
2 Predicting system
2.1 Conceptual framework
2.2 Connected-triple extraction
2.2.1 Concept of connected-triple
2.2.2 Extracting connected-triples from datasets
2.2.3 Transitivity property of connected-triple
2.3 Link analysis
2.3.1 Categorical data
2.3.2 Continuous numeric data
2.4 Fuzzy inference model
2.4.1 Link weight fuzzification
2.4.2 Fuzzy inference
2.5 Illustrative link strength prediction
3 Empirical evaluation
3.1 Datasets
Dataset collection | Type | NDC | ANIED | ANVED | ANVCTD |
---|---|---|---|---|---|
Bank | C | 7 | 6459 | 4 | 2 |
Mushroom | C | 9 | 912 | 5 | 2 |
Salary | C | 6 | 5028 | 6 | 1 |
Student-Por | C | 4 | 163 | 8 | 3 |
Student-Mat | C | 4 | 101 | 8 | 2 |
Connect-4 | C | 6 | 11034 | 7 | 2 |
Wine | N | 4 | 1256 | 3 | 1 |
Twitter | N | 7 | 82038 | 13 | 4 |
Facebook | N | 5 | 104 | 5 | 2 |
Urban | N | 13 | 203 | 12 | 3 |
News | N | 12 | 3048 | 7 | 2 |
Music | N | 11 | 211 | 11 | 4 |
Synthetic-1 | C | 22 | 15491 | 18 | 4 |
Synthetic-2 | C | 30 | 24823 | 20 | 3 |
Synthetic-3 | N | 25 | 21990 | 20 | 2 |
Synthetic-4 | N | 35 | 19926 | 18 | 4 |
3.2 Methods for comparison
3.2.1 Neighbour-based metrics
-
Weighted Jaccard Coefficient (WJC) This metric assesses and normalises the weights of the common neighbours of a given pair of variables (Dimitriadou et al. 2004). It assigns higher values for pairs of variables which share a higher sum of weights over common neighbours relative to the total weights of all the neighbours they have. For two distinct variables \(v_A\) and \(v_B\), this measure is defined bywhere \(\Gamma (v_A)\) and \(\Gamma (v_B)\) denote the adjacent neighbourhood of \(v_A\) and that of \(v_B\), respectively.$$\begin{aligned} \mathrm{WJC}(v_A, v_B) = \frac{\sum \nolimits _{ v_C \in \Gamma (v_A) \bigcap \Gamma (v_B)}{w(v_A, v_C) + w(v_B, v_C)}}{\sum \nolimits _{v_D \in \Gamma (v_A) \bigcup \Gamma (v_B)}{w(v_A, v_D) + w(v_B, v_D)}} \end{aligned}$$
-
Weighted Resource Allocation (WRA) This metric is motivated by the physical process of resource allocation (Zhou et al. 2009). Different from WJC, WRA does not only involve adjacent neighbours, but also exploit the neighbours of the direct neighbours of a pair of variables. Formally, WRA is defined as:where \(s(v_C)\) denoted the sum of weights for the variable \(v_C\) associated with all of its existing links, such that$$\begin{aligned} \mathrm{WRA}(v_A, v_B) = \sum _{v_C \in \Gamma (v_A) \bigcap \Gamma (v_B)}{\dfrac{w(v_A, v_C) + w(v_C, v_B)}{s(v_C)}} \end{aligned}$$$$\begin{aligned} s(v_C) = \sum _{v_X \in \Gamma (v_C)}{w(v_C,v_X)} \end{aligned}$$
3.2.2 Path-based metrics
-
Local Weighted Path (LWP) This metric is an extended version (Thi et al. 2014) of the local path (LP) method (Lü et al. 2009); it is also a special case of the well-known Katz algorithm (Katz 1953). Unlike the metrics that only use the information of the nearest neighbours (be they adjacent or otherwise), LWP makes use of further information from local paths with a length value of 2 and 3. Let A denote the weighted adjacent matrix of all variables under discussion, and \(A^2\) and \(A^3\) represent the weighted adjacent matrices based on A with a length of 2 and 3, respectively, LWP is then defined as follows:where \(\alpha \) is a small number close to 0, which is being used to penalise the weight of the paths with greater length. In the experiment, \(\alpha \) is set to 0.01 (as with the default value typically used when running this metric).$$\begin{aligned} \mathrm{LWP} = A^2 + \alpha A^3 \end{aligned}$$
-
Relation Strength Similarity (RSS) This metric was originally introduced as an asymmetric measure for weighted social networks (Chen et al. 2012). It may also be adopted as a symmetric measure for the problem considered herein. For the present study, suppose that there are T simple paths shorter than a path length of e from the variable \(v_A\) to \(v_B\), and a path with length of u (\(u \le e\)) from \(v_A\) and \(v_B\) is formed with Z variables \(v_1\), \(v_2\),..., \(v_{Z-1}\), \(v_Z\), where \(v_1\) represents \(v_A\) and \(v_Z\) represents \(v_B\). Then, the RSS metric from \(v_A\) to \(v_B\) is defined bywith$$\begin{aligned} \mathrm{RSS}(v_A, v_B) = \sum _{t=1}^{T}{R_{u}^{*}}(v_A, v_B) \end{aligned}$$$$\begin{aligned} R_{u}^{*}(v_A, v_B) = {\left\{ \begin{array}{ll} \prod \limits _{z=1}^{Z-1}{R(v_z, v_{z+1})} &{} Z \le e + 1 \\ 0 &{} \mathrm{otherwise} \end{array}\right. } \end{aligned}$$
3.2.3 Random walk-based metric
-
SimRank (SR) The well-known SimRank algorithm (Jeh and Widom 2002) was proposed on the basis of the intuition that two nodes within a graph are similar if they are connected to similar nodes in the graph. This can obviously be adapted for use in the present study. For a pair of variables \(v_A\) and \(v_B\), their SR score is computed by$$\begin{aligned} \mathrm{SR}(v_A, v_B) = \dfrac{\gamma \sum \limits _{v_C \in \Gamma (v_A)} \sum \limits _{v_D \in \Gamma (v_B)} {SR(v_C, v_D)}}{|\Gamma (v_A)||\Gamma (v_B)|} \end{aligned}$$
3.3 Experimental setup
3.4 Experimental results
3.4.1 Experimental results for real data
3.4.2 Experimental results for synthetic data
3.5 Complexity analysis
Method | Time complexity |
---|---|
FCT |
\(O(k^2p^2q^2lf)\)
|
WJC |
\(O(k^2p^2q^2)\)
|
WRA |
\(O(k^2p^2q^2)\)
|
LWP |
\(O(k^3p^3)\)
|
RSS |
\(O(k^2p^2q^{e+1})\)
|
SR |
\(O(k^2p^2q^2r)\)
|