Introduction
Related work
Semi-supervised learning in community finding
Noisy constraints in clustering and community finding
Methods
Overview
Outlier detection methods
Process for identifying noisy constraints
AC-SLPA with noise identification
Evaluation
Datasets
Parameter | Description | Value | Parameter | Description | Value |
---|---|---|---|---|---|
N | Number of nodes | 1000–5000 | \(t_1\) | Degree exponent | 2 |
k | Average degree | 10 | \(t_2\) | Community exponent | 1 |
\(K_{max}\) | Max degree | 50 | \(\mu\) | Mixing parameter | 0.1–0.3 |
\(C_{min}\) | Min community size | 10/20 | \(O_n\) | Num. overlapping nodes | 10%/50% |
\(C_{max}\) | Max community size | 50/100 | \(O_m\) | Communities per node | 1–8 |
Real-world Networks | Amazon | YouTube | DBLP |
---|---|---|---|
#Nodes-# Edges-#Communities | 7411-21214-876 | 6426-23226-1058 | 7233-33045-613 |
Average degree | 5 | 7 | 9 |
Maximum community size | 27 | 31 | 38 |
Minimum community size | 5 | 5 | 10 |
Average community size | 10 | 7 | 12 |
Maximum communities per node §(\(O_m\)) | 4 | 11 | 5 |
Number of overlapping nodes (\(O_n\)) | 1394 (18%) | 865 (13%) | 214 (3.3%) |
Clustering coefficient | 0.74 | 0.33 | 0.90 |
Experiment 1: Comparing Outlier Detection Models
Evaluating outlier detection methods
Architecture | Nodes per layer | Small networks | Large networks | ||
---|---|---|---|---|---|
Epochs | Learning rate | Epochs | Learning rate | ||
AE1 | dim:(7,3,7) | 100 | 0.01 | 30 | 0.001 |
AE1_L1 | dim:(7,7,7) | 100 | 0.01 | 30 | 0.001 |
AE2 | dim:(7,5,3,5,7) | 100 | 0.01 | 30 | 0.001 |
AE2_L1 | dim:(7,7,7,7,7) | 100 | 0.01 | 30 | 0.001 |
AE3 | dim:(7,6,5,3,5,6,7) | 100 | 0.01 | 30 | 0.001 |
AE3_L1 | dim:(7,7,7,7,7,7,7) | 100 | 0.01 | 30 | 0.001 |
Architecture | Small Networks | Large Networks | ||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|
Must-link constraints | Cannot-link constraints | Must-link constraints | Cannot-link constraints | |||||||||
Overall | \(O_n=10\)% | \(O_n=50\)% | Overall | \(O_n=10\)% | \(O_n=50\)% | Overall | On=10% | \(O_n=50\)% | Overall | \(O_n=10\)% | \(O_n=50\)% | |
AE1 | 0.634 | 0.651 | 0.616 | 0.779 | 0.820 | 0.739 | 0.456 | 0.452 | 0.460 | 0.844 | 0.906 | 0.782 |
AE2 | 0.625 | 0.652 | 0.597 | 0.785 | 0.828 | 0.742 | 0.442 | 0.443 | 0.442 | 0.849 | 0.906 | 0.791 |
AE3 | 0.624 | 0.667 | 0.580 | 0.796 | 0.843 | 0.748 | 0.422 | 0.416 | 0.429 | 0.849 | 0.905 | 0.794 |
AE1_l | 0.659 | 0.705 | 0.614 | 0.737 | 0.777 | 0.697 | 0.453 | 0.445 | 0.461 | 0.789 | 0.842 | 0.735 |
AE2_l | 0.657 | 0.718 | 0.597 | 0.776 | 0.812 | 0.739 | 0.470 | 0.444 | 0.496 | 0.780 | 0.847 | 0.714 |
AE3_l | 0.654 | 0.773 | 0.534 | 0.835 | 0.877 | 0.793 | 0.424 | 0.407 | 0.440 | 0.879 | 0.919 | 0.839 |
All AE | 0.642 | 0.694 | 0.590 | 0.785 | 0.826 | 0.743 | 0.444 | 0.434 | 0.45 | 0.832 | 0.887 | 0.775 |
Architecture | Small networks | Large networks | ||
---|---|---|---|---|
Must-link constraints | Cannot-link constraints | Must-link constraints | Cannot-link constraints | |
AE1 | 3.0 | 4.5 | 2.5 | 4.0 |
AE1_L1 | 1.0 | 6.0 | 2.5 | 5.0 |
AE2 | 5.0 | 3.0 | 4.0 | 2.0 |
AE2_L1 | 2.0 | 4.5 | 1.0 | 6.0 |
AE3 | 6.0 | 2.0 | 6.0 | 3.0 |
AE3_L1 | 4.0 | 1.0 | 5.0 | 1.0 |
Outlier methods | Small networks | Large networks | ||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|
Must-link constraints | Cannot-link constraints | Must-link constraints | Cannot-link constraints | |||||||||
Overall | \(O_n=10\)% | \(O_n=50\)% | Overall | \(O_n=10\)% | \(O_n=50\)% | Overall | \(O_n=10\)% | \(O_n=50\)% | Overall | \(O_n=10\)% | \(O_n=50\)% | |
IF | 0.623 | 0.720 | 0.526 | 0.868 | 0.915 | 0.820 | 0.213 | 0.163 | 0.262 | 0.932 | 0.968 | 0.896 |
SVM | 0.691 | 0.751 | 0.631 | 0.765 | 0.827 | 0.702 | 0.554 | 0.572 | 0.536 | 0.886 | 0.917 | 0.854 |
LOF | 0.621 | 0.711 | 0.532 | 0.670 | 0.699 | 0.641 | 0.480 | 0.478 | 0.482 | 0.596 | 0.628 | 0.564 |
Dataset | Must-link constraints | Cannot-link constraints | ||||
---|---|---|---|---|---|---|
Small networks | IF | SVM | LOF | IF | SVM | LOF |
2.3(2) | 1.3(1) | 2.4(3) | 1.0(1) | 2.0(2) | 3.0(3) | |
Large networks | IF | SVM | LOF | IF | SVM | LOF |
3.0(3) | 1.1(1) | 1.9(2) | 1.0(1) | 2.0(2) | 3.0(3) | |
Average Rank | 2.5 | 1 | 2.5 | 1 | 2 | 3 |
Evaluating autoencoders for deep embeddings
Architecture | Small networks | Large networks | ||
---|---|---|---|---|
Encoder + SVM | Encoder + IF | Encoder + SVM | Encoder + IF | |
AE1 | 3 | 2 | 2 | 3 |
AE1_L1 | 4 | 4 | 4 | 4 |
AE2 | 2 | 3 | 1 | 2 |
AE2_L1 | 5.5 | 5.5 | 5.5 | 5.5 |
AE3 | 1 | 1 | 3 | 1 |
AE3_L1 | 5.5 | 5.5 | 5.5 | 5.5 |
Experiment 2: Evaluation of noise removal methods
Cleaning models | Must-link constraints | Cannot-link constraints |
---|---|---|
Small networks | ||
Hybrid (Autoencoder–Encoder Func.+IF) | AE1_L1 | AE3+IF |
Autoencoders(AE) | AE1_L1 | AE3_L1 |
Encoder Func. + Outlier detection (SVM-IF) | AE3+SVM | AE3+IF |
Outlier detection only (SVM-IF) | SVM | IF |
Large networks | ||
Hybrid (Autoencoder–Encoder Func.+IF) | AE2_L1 | AE3+IF |
Autoencoders(AE) | AE2_L1 | AE3_L1 |
Encoder Func. + Outlier detection (SVM-IF) | AE2+SVM | AE3+IF |
Outlier detection only (SVM-IF) | SVM | IF |
Cleaning process | Network parameters | |||||||||
---|---|---|---|---|---|---|---|---|---|---|
\(\mu\) | Comm. Size | \(O_n\) | \(O_m\) | |||||||
0.1 | 0.3 | Small | Large | 10% | 50% | 2 | 4 | 6 | 8 | |
Hybrid | 0.565 | 0.483 | 0.538 | 0.51 | 0.810 | 0.239 | 0.710 | 0.538 | 0.448 | 0.401 |
AE | 0.506 | 0.440 | 0.497 | 0.449 | 0.728 | 0.218 | 0.631 | 0.485 | 0.416 | 0.361 |
AE_SVM_IF | 0.553 | 0.463 | 0.525 | 0.49 | 0.800 | 0.215 | 0.701 | 0.522 | 0.432 | 0.376 |
SVM_IF | 0.547 | 0.458 | 0.522 | 0.483 | 0.797 | 0.208 | 0.701 | 0.517 | 0.424 | 0.369 |
Cleaning Process | Network Parameters | |||||||||
---|---|---|---|---|---|---|---|---|---|---|
\(\mu\) | Comm. Size | \(O_n\) | \(O_m\) | |||||||
0.1 | 0.3 | Small | Large | 10% | 50% | 2 | 4 | 6 | 8 | |
Hybrid | 0.566 | 0.530 | 0.567 | 0.529 | 0.768 | 0.328 | 0.763 | 0.552 | 0.469 | 0.408 |
AE | 0.474 | 0.426 | 0.464 | 0.435 | 0.618 | 0.281 | 0.6 | 0.453 | 0.39 | 0.357 |
AE_SVM_IF | 0.544 | 0.503 | 0.539 | 0.508 | 0.757 | 0.290 | 0.726 | 0.559 | 0.44 | 0.369 |
SVM_IF | 0.559 | 0.527 | 0.543 | 0.543 | 0.785 | 0.301 | 0.748 | 0.572 | 0.464 | 0.388 |
Network category | Hybrid | AE | AE_SVM_IF | SVM_IF |
---|---|---|---|---|
Small networks | 1.6(1) | 3.2(4) | 2.6(2) | 2.7(3) |
Large networks | 1.6(1) | 3.8(4) | 2.7(3) | 2.0(2) |
Experiment 3: End-to-end evaluation
Algorithm | Network parameters | |||||||||
---|---|---|---|---|---|---|---|---|---|---|
\(\mu\) | Comm. Size | \(O_n\) | \(O_m\) | |||||||
0.1 | 0.3 | Small | Large | 10% | 50% | 2 | 4 | 6 | 8 | |
Hybrid | 0.565 | 0.483 | 0.538 | 0.510 | 0.809 | 0.239 | 0.709 | 0.538 | 0.448 | 0.401 |
AC-SLPA | 0.484 | 0.43 | 0.475 | 0.438 | 0.691 | 0.223 | 0.602 | 0.472 | 0.402 | 0.351 |
SLPA | 0.527 | 0.388 | 0.474 | 0.441 | 0.737 | 0.178 | 0.634 | 0.458 | 0.396 | 0.342 |
Algorithm | Network parameters | |||||||||
---|---|---|---|---|---|---|---|---|---|---|
\(\mu\) | Comm. Size | \(O_n\) | \(O_m\) | |||||||
0.1 | 0.3 | Small | Large | 10% | 50% | 2 | 4 | 6 | 8 | |
Hybrid | 0.566 | 0.530 | 0.567 | 0.529 | 0.768 | 0.328 | 0.763 | 0.552 | 0.469 | 0.408 |
AC-SLPA | 0.316 | 0.343 | 0.364 | 0.295 | 0.429 | 0.231 | 0.390 | 0.353 | 0.301 | 0.357 |
SLPA | 0.533 | 0.451 | 0.497 | 0.488 | 0.777 | 0.208 | 0.686 | 0.484 | 0.421 | 0.369 |
Network category | SLPA | AC-SLPA | Hybrid |
---|---|---|---|
Small networks | 2.2(2) | 2.5(3) | 1.3(1) |
Large networks | 2.6(3) | 2.1(2) | 1.3(1) |
Experiment 4: Real-world networks
Network | SLPA | AC-SLPA | AC-SLPA_Hybrid | AC-SLPA_AE | AC-SLPA_AE_SVM_IF | AC-SLPA_SVM_IF |
---|---|---|---|---|---|---|
Amazon | 0.957 | 0.956 | 0.956 | 0.952 | 0.951 | 0.955 |
YouTube | 0.627 | 0.778 | 0.818 | 0.778 | 0.751 | 0.751 |
DBLP | 0.897 | 0.892 | 0.921 | 0.906 | 0.889 | 0.893 |
Avg. Ranks | 3.3 | 3.7 | 1.3 | 3.0 | 5.3 | 4.3 |
Network | AC-SLPA_Hybrid | SLPA | OSLOM | MOSES | COPRA |
---|---|---|---|---|---|
Amazon | 0.956 | 0.957 | 0.967 | 0.908 | 0.962 |
YouTube | 0.818 | 0.627 | 0.449 | 0.421 | 0.191 |
DBLP | 0.921 | 0.897 | 0.849 | 0.771 | 0.914 |
Avg. Ranks | 2.0(1) | 2.7(2.5) | 2.7(2.5) | 4.7(5) | 3.0(4) |
Network | SLPA | ACSLPA with noise | ACSLPA_Hybrid | ACSLPA without noise | ||||
---|---|---|---|---|---|---|---|---|
Distance | p value | Distance | p value | Distance | p value | Distance | p value | |
Amazon | 0.076 | 0.016 | 0.059 | 0.113 | 0.050 | 0.252 | 0.033 | 0.772 |
YouTube | 0.086 | 0.017 | 0.250 | 0.000 | 0.099 | 0.000 | 0.073 | 0.005 |
DBLP | 0.222 | 0.000 | 0.222 | 0.000 | 0.141 | 0.000 | 0.032 | 0.899 |