1 Introduction
2 Related Work
3 Methodology
3.1 Problem Formulation
3.2 Objective Formulation
3.2.1 IDRN Classifier
3.2.2 Multi-label Classification
3.2.3 Community Prior
3.3 Efficiency
4 Experiments
4.1 Networks with Known Communities
4.1.1 Metrics on Comparing Community Partitions
4.1.2 Networks with Community Meta-data
-
karate: The Karate club network is a well-known network of friendships between 34 members in an American University [35]. After a dispute between the coach and the treasurer, the network is further split into two communities. There are 34 nodes and 78 edges in this network.
-
dolphin: The dolphin social network is an undirected social network of frequent associations between 62 dolphins in a community living off Doubtful Sound, New Zealand [36]. The dolphin network splits into two communities as a result of the departure of a key individual. The links between nodes are established by observation of statistically significant frequent associations. The network contains 62 nodes and 159 edges.
-
football: This is a collage football network which represents the schedule of games between American college football teams in the 2000 season [37]. Nodes in the network represent teams, and edges represent regular-season games between teams. There are 115 nodes and 613 edges in the network. This network is partitioned into 12 conferences.
-
polbook: This is a network of books about US politics published around the time of the 2004 presidential election and sold by the online bookseller Amazon.com [38]. Edges between books represent frequent co-purchasing of books by the same buyers. The community labels are ‘liberal,’ ‘neutral’ and ‘conservative.’ There are 105 nodes and 441 edges in the network.
-
polblog: This is a network of hyperlinks between weblogs on US politics, recorded in 2005 by Adamic and Glance [39]. Community labels are ‘liberal’ and ‘conservative’ which are assigned by blog directories or occasional self-evaluation. There are 1490 nodes and 16,715 edges in the network.
4.1.3 Experiments on Networks
Network | Metrics (%) | Algorithms | |||||
---|---|---|---|---|---|---|---|
IDRN | GN | CNM | Louvain | MMO | LPA | ||
Karate | NMI | 87.62 | 57.98 | 69.25 | 68.73 | 56.12 | 46.58 |
Rand | 95.44 | 76.92 | 85.90 | 85.90 | 74.36 | 85.90 | |
Jaccard | 91.58 | 74.29 | 84.06 | 83.82 | 70.59 | 84.72 | |
Dolphin | NMI | 79.88 | 55.42 | 62.08 | 51.62 | 40.63 | 51.42 |
Rand | 92.72 | 82.39 | 83.23 | 76.73 | 69.18 | 79.25 | |
Jaccard | 88.40 | 81.82 | 83.23 | 76.13 | 68.59 | 79.11 | |
Football | NMI | 83.67 | 87.89 | 76.24 | 85.61 | 91.11 | 82.56 |
Rand | 95.47 | 92.01 | 84.18 | 90.38 | 92.33 | 90.05 | |
Jaccard | 62.25 | 88.84 | 79.49 | 86.89 | 88.92 | 86.50 | |
Polbook | NMI | 59.87 | 55.85 | 53.08 | 53.69 | 40.63 | 56.40 |
Rand | 85.75 | 83.67 | 82.77 | 83.22 | 57.60 | 84.35 | |
Jaccard | 69.92 | 82.90 | 82.16 | 82.38 | 55.16 | 83.61 | |
Polblog | NMI | 53.30 | 30.34 | 37.99 | 37.55 | 34.77 | 39.09 |
Rand | 80.36 | 93.70 | 95.33 | 95.12 | 93.47 | 95.66 | |
Jaccard | 67.63 | 93.28 | 95.02 | 94.79 | 93.03 | 95.37 |
4.2 Datasets for Classification
Dataset | #Nodes | #Edges | #Classes | Average category |
\(\frac{{\#\hbox {Edges}}}{{\#\hbox {Nodes}}}\)
|
---|---|---|---|---|---|
Amazon | 83,742 | 190,097 | 30 | 1.546 | 2.270 |
CoRA | 24,519 | 92,207 | 10 | 1.004 | 3.782 |
IMDb | 19,359 | 362,079 | 21 | 2.301 | 18.703 |
PubMed | 19,717 | 44,324 | 3 | 1.000 | 2.248 |
Wikipedia | 35,633 | 495,388 | 16 | 1.312 | 13.903 |
Youtube | 22,693 | 96,361 | 47 | 1.707 | 4.246 |
Blogcatalog | 10,312 | 333,983 | 39 | 1.404 | 32.387 |
Flickr | 80,513 | 5,899,882 | 195 | 1.338 | 73.278 |
PPI | 3890 | 37,845 | 50 | 1.707 | 9.804 |
POS | 4777 | 92,295 | 40 | 1.417 | 19.320 |
CiteSeer | 3312 | 4536 | 6 | 1.000 | 1.369 |
Snow2014 | 9489 | 22,309 | 90 | 2.351 | 2.538 |
WebKB | 877 | 1388 | 5 | 1.000 | 1.582 |
SocialSpam | 5,275,125 | 49,6691,571 | 2 | 1.000 | 94.157 |
\(\text {Snow2014}_{\mathrm{all}}\)
| 533,874 | 942,226 | 90 | 2.534 | 1.764 |
\(\text {Youtube}_{\mathrm{all}}\)
| 1,138,499 | 2,990,443 | 47 | 1.000 | 2.626 |
4.3 Classification Evaluation Metrics
4.3.1 Baseline Methods
-
WvRN [4]: The Weighted-vote Relational Neighbor is a simple but surprisingly good relational classifier. Given the neighbors \({\mathcal {N}}_i\) of node i, the WvRN estimates i’s classification probability P(y|i) of class label y with the weighted mean of its neighbors as mentioned above. As WvRN algorithm is not very complex, we implement it in Java programming language by ourselves.
-
SocioDim [14]: This method is based on the SocioDim framework which generates a representation in d dimension space from the top-d eigenvectors of the modularity matrix of the network, and the eigenvectors encode the information about the community partitions of the network. The implementation of SocioDim in Matlab is available on the author’s website.10 As the authors preferred in their study, we set the number of social dimensions as 500.
-
DeepWalk [17]: DeepWalk generalizes recent advancements in language modeling from sequences of words to nodes [44]. It uses local information obtained from truncated random walks to learn latent dense representations by treating random walks as the equivalent of sentences. The implementation of DeepWalk in Python has already been published by the authors.11
-
LINE [2]: LINE algorithm proposes an approach to embed networks into low-dimensional vector spaces by preserving both the first-order and second-order proximities in networks. The implementation of LINE in C++ has already been published by the authors.12 To enhance the performance of this algorithm, we set embedding dimensions as 256 (i.e., 128 dimensions for the first-order proximities and 128 dimensions for the second-order proximities) in LINE algorithm as preferred in its implementation.
-
node2vec [19]: It also takes a similar approach with DeepWalk which generalizes recent advancements in language modeling from sequences of words to nodes. With a flexible neighborhood sampling strategy, node2vec learns a mapping of nodes to a low-dimensional feature space that maximizes the likelihood of preserving network neighborhoods of nodes. The implementation of node2vec in Python is available on the authors’ website.14
4.4 Different Community Prior
Metric | Network | IDRN |
\({\hbox {IDRN}}_{\mathrm{Louvain}}\)
|
\({\hbox {IDRN}}_{\mathrm{CNM}}\)
|
\({\hbox {IDRN}}_{\mathrm{MMO}}\)
|
\({\hbox {IDRN}}_{\mathrm{LPA}}\)
|
---|---|---|---|---|---|---|
Hamming score (%) | Amazon | 56.20 | 63.86 | 63.54 | 56.09 | 58.65 |
Youtube | 37.17 | 39.91 | 39.94 | 38.90 | 39.58 | |
CoRA | 71.71 | 73.58 | 72.33 | 71.77 | 72.33 | |
IMDb | 20.33 | 20.79 | 21.20 | 20.97 | 21.00 | |
PubMed | 73.76 | 78.02 | 78.07 | 74.57 | 76.14 | |
Wikipedia | 71.67 | 71.12 | 70.54 | 72.02 | 72.02 | |
Flickr | 28.44 | 29.11 | 29.62 | 29.66 | 29.71 | |
Blogcatalog | 26.77 | 26.29 | 26.39 | 26.13 | 26.73 | |
PPI | 11.39 | 12.34 | 11.77 | 12.06 | 12.22 | |
POS | 39.09 | 39.94 | 39.62 | 39.14 | 39.12 | |
CiteSeer | 43.55 | 55.76 | 54.42 | 46.22 | 49.21 | |
Snow2014 | 24.97 | 27.50 | 26.80 | 25.37 | 25.95 | |
WebKB | 47.17 | 44.25 | 46.01 | 48.37 | 44.01 | |
Micro-\(F_1\) (%) | Amazon | 56.86 | 64.62 | 64.32 | 57.07 | 59.64 |
Youtube | 43.08 | 45.20 | 45.20 | 44.27 | 45.03 | |
CoRA | 71.72 | 73.59 | 72.34 | 71.78 | 72.34 | |
IMDb | 29.63 | 29.81 | 30.29 | 29.94 | 29.98 | |
PubMed | 73.76 | 78.02 | 78.07 | 74.57 | 76.14 | |
Wikipedia | 73.21 | 72.79 | 72.18 | 73.57 | 73.51 | |
Flickr | 31.99 | 32.56 | 33.09 | 33.03 | 33.09 | |
Blogcatalog | 29.05 | 28.81 | 28.80 | 28.39 | 29.06 | |
PPI | 15.82 | 17.15 | 16.21 | 16.64 | 16.74 | |
POS | 42.36 | 43.71 | 43.29 | 42.48 | 42.56 | |
CiteSeer | 43.55 | 55.76 | 54.42 | 46.22 | 49.21 | |
Snow2014 | 28.87 | 30.55 | 30.24 | 29.09 | 28.94 | |
WebKB | 47.17 | 44.25 | 46.01 | 48.37 | 44.01 | |
Macro-\(F_1\) (%) | Amazon | 53.48 | 61.00 | 61.03 | 53.91 | 57.57 |
Youtube | 34.48 | 37.85 | 36.84 | 35.06 | 37.28 | |
CoRA | 64.57 | 66.36 | 64.60 | 64.54 | 65.80 | |
IMDb | 19.96 | 20.57 | 20.57 | 20.14 | 20.43 | |
PubMed | 72.04 | 76.79 | 76.89 | 72.86 | 74.64 | |
Wikipedia | 64.84 | 65.46 | 64.40 | 65.29 | 64.64 | |
Flickr | 14.85 | 14.80 | 15.17 | 14.91 | 15.68 | |
Blogcatalog | 11.39 | 12.08 | 11.74 | 11.32 | 11.56 | |
PPI | 10.93 | 12.96 | 12.33 | 12.53 | 11.88 | |
POS | 5.88 | 6.68 | 6.65 | 5.87 | 6.12 | |
CiteSeer | 40.36 | 52.05 | 50.43 | 43.08 | 46.32 | |
Snow2014 | 14.00 | 17.82 | 17.51 | 14.42 | 17.02 | |
WebKB | 26.38 | 26.01 | 26.76 | 25.44 | 25.74 |
4.5 Performance of Classifiers
Metric | Network | WvRN | SocioDim | DeepWalk | LINE | SNBC | node2vec | IDRN |
\({{\hbox {IDRN}}}_{\mathrm{c}}\)
|
---|---|---|---|---|---|---|---|---|---|
Hamming score (%) | Amazon | 33.76 | 38.36 | 31.79 | 40.55 | 59.00 | 49.18 | 56.20 | 63.86 |
Youtube | 22.82 | 31.94 | 36.63 | 33.90 | 35.06 | 33.86 | 37.17 | 39.91 | |
CoRA | 55.83 | 63.02 | 71.37 | 65.50 | 66.75 | 72.66 | 71.71 | 73.58 | |
IMDb | 33.59 | 22.21 | 33.12 | 30.39 | 30.18 | 32.97 | 20.33 | 20.79 | |
PubMed | 50.32 | 65.68 | 77.40 | 68.31 | 79.22 | 79.02 | 73.76 | 78.02 | |
Wikipedia | 45.10 | 65.29 | 71.10 | 68.812 | 68.78 | 70.69 | 71.67 | 71.121 | |
Flickr | 21.37 | 29.67 | 28.73 | 30.96 | 24.20 | 30.65 | 28.44 | 29.11 | |
Blogcatalog | 17.89 | 27.04 | 25.63 | 25.32 | 22.40 | 27.46 | 26.77 | 26.29 | |
PPI | 6.28 | 8.61 | 8.14 | 9.27 | 7.97 | 8.88 | 11.39 | 12.34 | |
POS | 23.05 | 21.06 | 31.40 | 38.24 | 37.73 | 34.59 | 39.09 | 39.94 | |
CiteSeer | 31.89 | 20.04 | 23.80 | 22.79 | 54.89 | 50.99 | 43.44 | 55.76 | |
Snow2014 | 15.71 | 9.39 | 8.71 | 12.97 | 22.57 | 20.00 | 24.91 | 27.50 | |
WebKB | 42.67 | 28.06 | 27.58 | 45.57 | 47.97 | 35.79 | 48.34 | 44.25 | |
SocialSpam | 11.29 | – | – | – | – | – | 98.89 | 98.91 | |
Micro-\(F_1\) (%) | Amazon | 34.86 | 39.62 | 33.06 | 42.42 | 59.79 | 50.55 | 56.86 | 64.62 |
Youtube | 27.81 | 36.40 | 40.73 | 38.01 | 39.67 | 38.35 | 43.08 | 45.20 | |
CoRA | 55.85 | 63.00 | 71.36 | 65.47 | 66.78 | 72.66 | 71.72 | 73.59 | |
IMDb | 42.62 | 29.99 | 41.82 | 39.89 | 39.53 | 42.36 | 29.63 | 29.81 | |
PubMed | 50.32 | 65.68 | 77.40 | 68.31 | 79.22 | 79.02 | 73.76 | 78.02 | |
Wikipedia | 48.51 | 66.95 | 72.19 | 70.21 | 70.68 | 72.07 | 73.21 | 72.79 | |
Flickr | 25.40 | 32.91 | 31.66 | 34.03 | 27.60 | 33.76 | 31.99 | 32.56 | |
Blogcatalog | 20.50 | 28.86 | 27.29 | 27.45 | 24.66 | 29.41 | 29.05 | 28.81 | |
PPI | 18.41 | 12.29 | 11.52 | 13.16 | 11.32 | 12.80 | 15.82 | 17.15 | |
POS | 26.04 | 24.42 | 35.98 | 42.70 | 41.99 | 39.09 | 42.36 | 43.71 | |
CiteSeer | 31.89 | 20.04 | 23.80 | 22.79 | 54.89 | 50.99 | 43.44 | 55.76 | |
Snow2014 | 20.43 | 15.23 | 14.50 | 17.26 | 23.36 | 24.55 | 29.02 | 30.55 | |
WebKB | 42.67 | 28.06 | 27.58 | 45.57 | 47.97 | 35.79 | 48.34 | 44.25 | |
SocialSpam | 11.29 | – | – | – | – | – | 98.89 | 98.91 | |
Macro-\(F_1\) (%) | Amazon | 32.00 | 35.95 | 21.64 | 37.52 | 56.84 | 45.85 | 53.48 | 61.00 |
Youtube | 18.17 | 34.19 | 33.92 | 33.47 | 32.07 | 32.60 | 34.48 | 37.85 | |
CoRA | 43.16 | 56.82 | 62.68 | 59.07 | 55.68 | 64.79 | 64.57 | 66.36 | |
IMDb | 18.89 | 18.77 | 18.22 | 18.83 | 17.45 | 18.46 | 19.96 | 20.57 | |
PubMed | 41.57 | 64.85 | 75.92 | 66.66 | 77.16 | 77.50 | 72.04 | 76.79 | |
Wikipedia | 45.58 | 58.93 | 62.29 | 62.17 | 61.99 | 64.90 | 64.84 | 65.46 | |
Flickr | 15.54 | 18.28 | 17.13 | 21.80 | 7.36 | 18.46 | 14.85 | 14.80 | |
Blogcatalog | 11.47 | 18.88 | 14.65 | 15.52 | 8.29 | 17.16 | 11.39 | 12.08 | |
PPI | 7.35 | 10.59 | 9.61 | 10.82 | 8.27 | 11.27 | 10.93 | 12.96 | |
POS | 3.91 | 6.05 | 8.26 | 8.93 | 5.92 | 8.61 | 5.58 | 6.68 | |
CiteSeer | 26.13 | 18.27 | 21.84 | 20.26 | 51.17 | 46.39 | 40.50 | 52.05 | |
Snow2014 | 9.67 | 4.41 | 4.53 | 8.84 | 12.57 | 14.03 | 14.06 | 17.82 | |
WebKB | 15.86 | 20.19 | 20.54 | 24.53 | 25.53 | 20.05 | 26.31 | 26.01 | |
SocialSpam | 10.68 | – | – | – | – | – | 98.75 | 98.77 |
Dataset | Wikipedia |
\({\hbox {Snow2014}}_{\mathrm{all}}\)
|
\({\hbox {Youtube}}_{\mathrm{all}}\)
| Flickr | Spammer |
---|---|---|---|---|---|
IDRN | 2 | 3 | 14 | 518 | 2937 |
\(\text {IDRN}_\mathrm{c}\)
| 2 | 5 | 9 | 482 | 4145 |
4.6 Performance of Classifiers in Partially Labeled Networks
Metric | Network | WvRN | DeepWalk | LINE | node2vec | IDRN |
\({\hbox {IDRN}}_{\mathrm{c}}\)
|
---|---|---|---|---|---|---|---|
Hamming score (%) |
\(\text {Youtube}_{\mathrm{all}}\)
| 24.05 | 10.32 | 27.51 | 33.21 | 39.16 | 35.02 |
\(\text {Snow2014}_{\mathrm{all}}\)
| 15.78 | 7.24 | 21.76 | 20.64 | 22.49 | 24.76 | |
Micro-\(F_1\) (%) |
\(\text {Youtube}_{\mathrm{all}}\)
| 37.42 | 15.18 | 32.95 | 38.38 | 60.46 | 41.05 |
\(\text {Snow2014}_{\mathrm{all}}\)
| 20.53 | 12.06 | 25.62 | 26.45 | 25.63 | 27.23 | |
Macro-\(F_1\) (%) |
\(\text {Youtube}_{\mathrm{all}}\)
| 30.94 | 8.79 | 25.90 | 32.34 | 60.38 | 33.95 |
\(\text {Snow2014}_{\mathrm{all}}\)
| 8.67 | 4.37 | 15.97 | 16.05 | 14.73 | 18.69 |