1 Introduction
2 Preliminaries
2.1 Replication Factor and Load Balance Factor
2.2 Modularity
3 Balanced-Size Clustering Technique
3.1 Balanced-Size Modularity Clustering Phase
Symbol | Definition |
---|---|
\(\mathbb {C}\) | Input cluster set |
\(k\) | Specified number of output clusters |
\(\mathbb {R}\) | Output cluster set |
\(top\_k\_clusters(\mathbb {C}, k)\) | Top-\(k\) clusters in \(\mathbb {C}\) |
\(inner\_edges(c)\) | Inner edges of cluster \(c\) |
\(neighbors(c)\) | Adjacent clusters of cluster \(c\) |
\(cut\_edges(n, m)\) | Cut edges between clusters \(n\) and \(m\) |
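To make the helper symbols above concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes a hypothetical representation: an undirected graph given as a list of `(u, v)` edges and a clustering given as a dict from vertex to cluster id. The extra `edges`/`assignment` parameters are assumptions, and ranking "top-\(k\)" clusters by vertex count is also an assumption suggested by the balanced-size setting; the paper may use a different criterion.

```python
from collections import Counter

# Hypothetical representation (assumption, not from the paper):
#   edges      -- list of undirected edges (u, v)
#   assignment -- dict mapping each vertex to its cluster id


def top_k_clusters(assignment, k):
    """Ids of the k clusters with the most vertices (size-based ranking is an assumption)."""
    sizes = Counter(assignment.values())
    return [c for c, _ in sizes.most_common(k)]


def inner_edges(edges, assignment, c):
    """Edges whose two endpoints both lie in cluster c."""
    return [(u, v) for u, v in edges if assignment[u] == c and assignment[v] == c]


def neighbors(edges, assignment, c):
    """Clusters other than c that share at least one edge with c."""
    adjacent = set()
    for u, v in edges:
        cu, cv = assignment[u], assignment[v]
        if cu == c and cv != c:
            adjacent.add(cv)
        elif cv == c and cu != c:
            adjacent.add(cu)
    return adjacent


def cut_edges(edges, assignment, n, m):
    """Edges with one endpoint in cluster n and the other in cluster m (n != m)."""
    if n == m:
        return []
    return [
        (u, v)
        for u, v in edges
        if (assignment[u] == n and assignment[v] == m)
        or (assignment[u] == m and assignment[v] == n)
    ]


# Example: clusters A and B are triangles joined by edge (2, 3); C holds one vertex.
edges = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (4, 5), (3, 5), (5, 6)]
assignment = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B", 6: "C"}
print(top_k_clusters(assignment, 2))               # ['A', 'B']
print(inner_edges(edges, assignment, "A"))         # [(0, 1), (1, 2), (0, 2)]
print(sorted(neighbors(edges, assignment, "B")))   # ['A', 'C']
print(cut_edges(edges, assignment, "A", "B"))      # [(2, 3)]
```

Each helper scans the full edge list, which keeps the sketch short; a real implementation on graphs of the sizes benchmarked below would precompute per-cluster edge lists instead.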
3.2 Cluster Merge Phase
3.3 Graph Conversion Phase
4 Experiments
4.1 Benchmark
Dataset | Short name | \(\lvert V \rvert\) (vertices) | \(\lvert E \rvert\) (edges) | Modularity |
---|---|---|---|---|
email-EuAll [15] | Eu | 265,214 | 420,045 | 0.779 |
web-Stanford [15] | St | 281,903 | 2,312,497 | 0.914 |
com-DBLP [15] | DB | 317,080 | 1,049,866 | 0.806 |
web-NotreDame [15] | No | 325,729 | 1,497,134 | 0.931 |
amazon0505 [15] | am | 410,236 | 3,356,824 | 0.852 |
web-BerkStan [15] | Be | 685,230 | 7,600,595 | 0.930 |
web-Google [15] | Go | 875,713 | 5,105,039 | 0.974 |
soc-Pokec [15] | Po | 1,632,803 | 30,622,564 | 0.633 |
roadNet-CA [15] | CA | 1,965,206 | 2,766,607 | 0.992 |
wiki-Talk [15] | Ta | 2,394,385 | 5,021,410 | 0.566 |
soc-LiveJournal1 [15] | Li | 4,847,571 | 68,993,773 | 0.721 |
uk-2002 [16] | uk | 18,520,486 | 298,113,762 | 0.986 |
webbase-2001 [16] | ba | 118,142,155 | 1,019,903,190 | 0.976 |
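The Modularity column summarizes how strongly each graph separates into clusters. This section does not state which clustering the reported values were computed on, so the sketch below only illustrates the standard Newman–Girvan definition, \(Q = \sum_c \left[ \frac{L_c}{m} - \left(\frac{d_c}{2m}\right)^2 \right]\), where \(m\) is the number of edges, \(L_c\) the number of inner edges of cluster \(c\), and \(d_c\) the total degree of its vertices. It assumes the same hypothetical edge-list and vertex-to-cluster-dict representation as before, an unweighted undirected graph without duplicate edges, and it is not meant to reproduce the table's numbers.

```python
from collections import defaultdict


def modularity(edges, assignment):
    """Newman-Girvan modularity of a partition of an unweighted, undirected graph.

    Assumptions: edges is a list of (u, v) pairs with no duplicates, and
    assignment maps every vertex to a cluster id.
    """
    m = len(edges)
    inner = defaultdict(int)   # L_c: edges with both endpoints in cluster c
    degree = defaultdict(int)  # d_c: sum of vertex degrees in cluster c
    for u, v in edges:
        cu, cv = assignment[u], assignment[v]
        degree[cu] += 1
        degree[cv] += 1
        if cu == cv:
            inner[cu] += 1
    return sum(inner[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)


# Two triangles joined by a single edge, split into their natural clusters.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
assignment = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
print(modularity(edges, assignment))  # 5/14, about 0.357
```

Putting every vertex in a single cluster always yields \(Q = 0\), so values close to 1, such as those for roadNet-CA or uk-2002, indicate graphs whose edges fall almost entirely inside clusters.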