1 Introduction
2 Problem definition
3 Experimental settings
3.1 Datasets description
3.1.1 The Skype dataset
3.1.2 The LastFM dataset
3.1.3 The Google+ dataset
3.2 Community detection algorithms
3.3 Community feature extraction
Structural features
| |||
N
| Number of nodes |
M
| Number of edges |
D
| Density |
CC
| Global clustering |
\(CC_\mathrm{avg}\)
| Average clustering |
\(A_\mathrm{deg}\)
| Degree assortativity |
\(\hbox {deg}^C_\mathrm{max}\)
| Max degree (community links) |
\(\hbox {deg}^C_\mathrm{avg}\)
| Avg degree (community links) |
\(\hbox {deg}^\mathrm{all}_\mathrm{max}\)
| Max degree (all links) |
\(\hbox {deg}^\mathrm{all}_\mathrm{avg}\)
| Avg degree (all links) |
T
| Closed triads |
\(T_\mathrm{open}\)
| Open triads |
\(O_v\)
| Neighborhood nodes |
\(O_e\)
| Outgoing edges |
\(E_\mathrm{dist}\)
| Num. edges with distance |
d
| Approx. diameter |
r
| Approx. radius |
g
| Conductance |
Community formation features
| |
\(T_f\)
| First user arrival time |
\(IT_\mathrm{avg}\)
| Avg user inter-arrival time |
\(IT_\mathrm{std}\)
| Std of user inter-arrival time |
\(IT_{l,f}\)
| Last–first inter-arrival time |
Geographical features
| |
\(N_{s}\)
| Number of countries |
\(E_{s}\)
| Country entropy |
\(S_\mathrm{max}\)
| Percentage of most represented country |
\(N_{t}\)
| Number of cities |
\(E_{t}\)
| City entropy |
\(\hbox {dist}_\mathrm{avg}\)
| Avg geographical distance |
\(\hbox {dist}_\mathrm{max}\)
| Max geographical distance |
4 Analytical results
4.1 Skype: user engagement
4.1.1 Balanced scenario
Algorithm | Lv. | Scores |
---|---|---|
Video: AUC and accuracy
| ||
HDemon25
| 1 |
.74 (.67) |
HDemon50
| 0 | .71 (.68) |
Louvain
| 0 | .65 (.60) |
Louvain
| 6 | .63 (.59) |
Ego-Nets
| – | .70 (.64) |
BFS
| – | .67 (.62) |
Chat: AUC and accuracy
| ||
HDemon25
| 2 |
.84 (.77) |
HDemon50
| 1 | .81 (.73) |
Louvain
| 0 | .69 (.64) |
Louvain
| 6 | .65 (.60) |
Ego-Nets
| – | .75 (.75) |
BFS
| – | .81 (.72) |
sklearn
Python library.5 We execute 5 iterations, performing data shuffling before each one of them, imposing the elastic-net penalty \(\alpha =0.0001\) and l1-ratio = 0.05. The adoption of elastic-net penalty results in some feature weights set to zero, thus eliminating less important features.4.1.2 Unbalanced scenario
Algorithm | Lv. | Scores |
---|---|---|
Video: AUC and accuracy
| ||
HDemon25
| 1 |
.76 (.68) |
HDemon50
| 0 | .73 (.65) |
Louvain
| 0 | .64 (.59) |
Louvain
| 6 | .61 (.58) |
Ego-Nets
| – | .71 (.63) |
BFS
| – | .68 (.61) |
Baseline | – | .75 |
Chat: AUC and accuracy
| ||
HDemon25
| 2 | .82 (.78) |
HDemon50
| 3 | .80 (.76) |
Louvain
| 0 | .68 (.70) |
Louvain
| 6 | .67 (.66) |
Ego-Nets
| – |
.83 (.79) |
BFS
| – | .82 (.77) |
Baseline | – | .75 |
Algorithm | Lv. | Scores |
---|---|---|
Video: precision–recall
| ||
HDemon25
| 2 |
.42 (.72) |
HDemon50
| 1 | .39 (.70) |
Louvain
| 0 | .33 (.69) |
Louvain
| 6 | .33 (.67) |
Ego-Nets
| – | .37 (.68) |
BFS
| – | .35 (.71) |
Baseline | – | .25 |
Chat: precision–recall
| ||
HDemon25
| 2 | .54 (.69) |
HDemon50
| 3 | .50 (.67) |
Louvain
| 0 | .40 (.41) |
Louvain
| 6 | .44 (.33) |
Ego-Nets
| – |
.57 (.68) |
BFS
| – | .52 (.71) |
Baseline | – | .25 |
4.1.3 Skype community characterization
4.2 LastFM: service engagement
4.2.1 Balanced scenario
Algorithm | Scores | Classifier |
---|---|---|
LastFM: AUC and accuracy
| ||
Demon
| .59 (.63) | Logistic regression |
Louvain
|
.71 (.72) | Decision tree |
Ego-Nets
| .55 (.57) | Logistic regression |
4.2.2 Unbalanced scenario
Algorithm | Scores | Classifier |
---|---|---|
LastFM: AUC and accuracy
| ||
Demon
|
.60 (.78) | Logistic regression |
Louvain
| .55 (.36) | Logistic regression |
Ego-Nets
| .55 (.83) | Random forest |
Baseline | .25 (.25) | – |
Algorithm | Scores | Classifier |
---|---|---|
LastFM: precision–recall
| ||
Demon
| .78 (.03) | Logistic regression |
Louvain
| .33 (.30) | Decision tree |
Ego-Nets
|
.83 (.004) | Random forest |
Baseline | .25 (1.0) | – |
4.3 Google+: community homogeneity
4.3.1 Balanced scenario
Algorithm | Scores | Classifier |
---|---|---|
Google+: AUC and accuracy
| ||
Demon
| .67 (.71) | SGD |
Louvain
|
.74 (.84) | SGD |
Ego-Nets
| .61 (.65) | SGD |
Baseline | .50 (.50) | – |
4.3.2 Unbalanced scenario
Algorithm | Scores | Classifier |
---|---|---|
Google+: AUC and accuracy
| ||
Demon
|
.69 (.70) | Decision tree |
Louvain
| .61 (.50) | Decision tree |
Ego-Nets
| .63 (.50) | Decision tree |
Baseline | .75 | – |
Algorithm | Scores | Classifier |
---|---|---|
Google+: precision–recall
| ||
Demon
|
.70 (.22) | Decision tree |
Louvain
| .50 (.03) | Decision tree |
Ego-Nets
| .50 (.04) | Decision tree |
Baseline | .25 | – |
5 Discussion
-
The social network semantic (i.e., which kind of relation is defined by edges? Are the links among nodes viable proxy for real social connections?);
-
The nature of the target features.