Introduction
Problem statement
Research contributions
Running scenario
Paper organization
Background and related work
Social network for learning and professional development
Learning analytics
Clustering algorithms
Community of practice apprenticeship model
Career readiness
Career prediction
Career development and SLA
Fuzzy semi-supervised clustering algorithm
- We have seeds, and each class has at least one seed. Seed labels are always correct.
- We have pairwise constraints, namely must-links and cannot-links. These constraints may be wrong.
- We allow fuzzy labeling; that is, each instance can belong to more than one cluster.
- All labels are assigned to both the seeds and the constraints.
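The assumptions above can be made concrete as a small input container. The sketch below is a hypothetical Python representation, not the authors' implementation: the class `SemiSupervisedInput`, its field names, and the membership `threshold` in `fuzzy_labels` are illustrative assumptions. It enforces the seed-coverage assumption, checks constraints only for valid indices (since they may be wrong), and shows how fuzzy labeling lets one instance carry several cluster labels.

```python
from dataclasses import dataclass, field


@dataclass
class SemiSupervisedInput:
    """Hypothetical container for the inputs the algorithm assumes."""
    points: list            # instances x_a from the input domain X
    num_clusters: int       # C
    seeds: dict             # instance index -> class label; assumed always correct
    must_links: set = field(default_factory=set)    # pairs (a, b); may be wrong
    cannot_links: set = field(default_factory=set)  # pairs (a, b); may be wrong

    def validate(self):
        # Assumption from the text: every class has at least one seed.
        missing = set(range(self.num_clusters)) - set(self.seeds.values())
        if missing:
            raise ValueError(f"classes without a seed: {sorted(missing)}")
        # Constraints are only checked for valid indices, not for correctness,
        # because the text allows them to be wrong.
        n = len(self.points)
        for a, b in self.must_links | self.cannot_links:
            if not (0 <= a < n and 0 <= b < n):
                raise ValueError(f"constraint ({a}, {b}) refers to an unknown instance")


def fuzzy_labels(memberships, threshold=0.5):
    # Fuzzy labeling: an instance receives every cluster whose membership degree
    # reaches the (hypothetical) threshold, so it may get more than one label.
    return [{j for j, m in enumerate(row) if m >= threshold} for row in memberships]
```

For example, membership rows `[0.9, 0.6]` and `[0.2, 0.8]` with the default threshold yield the label sets `{0, 1}` and `{1}`.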
Symbol | Description |
---|---|
$X$ | The input domain |
$C$ | Number of clusters |
$\mu_c$ | Initial centroid of cluster $c$ |
$i, j$ | Indices running over clusters |
$a, b$ | Indices running over instances or output cluster labels |
$x_a$ | Input data instance, $x_a \in X$ |
$y_a$ | Output cluster label, $y_a \in [C]$ |
$D(x_a, \mu_j)$ | Distance between instance $x_a$ and the center of cluster $j$ |
$C_{=}$ | Must-link constraints |
$C_{\not\equiv}$ | Cannot-link constraints |
$h^{*} = \arg\min_{h} O_{\text{new}}$ | Instance assignment that minimally increases the error terms |
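The notation defines h* as the assignment of an instance that minimally increases the error terms, but the objective O_new itself is not spelled out here. The sketch below therefore swaps in a PCKMeans-style penalized objective as an explicit assumption: the squared Euclidean distance D(x_a, μ_h)² plus a hypothetical `weight` for every must-link or cannot-link the assignment would violate. Constraints are penalized rather than enforced, which fits the assumption that they may be wrong. The initial centroids μ_c are taken as seed means, another assumption (seeded-k-means style). All function and parameter names are hypothetical.

```python
import math


def seed_centroids(points, seeds, num_clusters):
    """mu_c as the mean of the seeds labeled c (seeded-k-means-style; assumed).

    Relies on every class having at least one seed, as the assumptions require.
    """
    dim = len(points[0])
    sums = [[0.0] * dim for _ in range(num_clusters)]
    counts = [0] * num_clusters
    for idx, c in seeds.items():
        counts[c] += 1
        for d, v in enumerate(points[idx]):
            sums[c][d] += v
    return [tuple(s / counts[c] for s in sums[c]) for c in range(num_clusters)]


def best_assignment(a, points, centroids, labels, must_links, cannot_links, weight=1.0):
    """h* = argmin_h O_new for instance a, under an ASSUMED penalized objective:
    squared Euclidean distance D(x_a, mu_h)^2 plus `weight` per violated constraint.
    `labels` maps already-assigned instance indices to clusters.
    """
    def partner(p, q):
        # The other endpoint of a constraint that involves instance a, if any.
        return q if p == a else p if q == a else None

    def increase(h):
        cost = math.dist(points[a], centroids[h]) ** 2
        for p, q in must_links:
            other = partner(p, q)
            if other is not None and other in labels and labels[other] != h:
                cost += weight  # must-link would be violated
        for p, q in cannot_links:
            other = partner(p, q)
            if other is not None and other in labels and labels[other] == h:
                cost += weight  # cannot-link would be violated
        return cost

    return min(range(len(centroids)), key=increase)
```

With a small `weight`, distance dominates and a possibly wrong constraint can be overridden; with a large `weight`, constraints dominate. Choosing this trade-off is part of the assumption, not something the text specifies.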
Performance evaluation
Experiment setup
- Randomly generate the cluster centers. Then, for each cluster, take a radius as input and randomly sample a given number of data points within the circle of that radius around the center.
- To determine whether a data point belongs to multiple clusters, compute its distance to each cluster center. If the distance is no greater than a cluster's radius, the point belongs to that cluster.
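The setup above can be sketched as a small 2-D data generator. This is an illustration, not the authors' code: `extent` (the side of the square from which centers are drawn) and `seed` are hypothetical parameters, and sampling uniformly over the disk's area is an assumption, since the text only says points are sampled randomly in the circle. Ground-truth memberships follow the stated rule (distance no greater than the cluster's radius), so a point can belong to several clusters.

```python
import math
import random


def generate_clusters(radii, points_per_cluster, extent=10.0, seed=None):
    """Random centers, then points sampled uniformly inside each cluster's circle.

    radii holds one radius per cluster; extent and seed are hypothetical parameters.
    """
    rng = random.Random(seed)
    centers = [(rng.uniform(0, extent), rng.uniform(0, extent)) for _ in radii]
    points = []
    for (cx, cy), radius in zip(centers, radii):
        for _ in range(points_per_cluster):
            # Taking the square root keeps the samples uniform over the disk's area.
            r = radius * math.sqrt(rng.random())
            theta = rng.uniform(0, 2 * math.pi)
            points.append((cx + r * math.cos(theta), cy + r * math.sin(theta)))
    return centers, points


def true_memberships(points, centers, radii):
    # A point belongs to every cluster whose center is within that cluster's radius.
    return [{j for j, (c, rad) in enumerate(zip(centers, radii)) if math.dist(p, c) <= rad}
            for p in points]


def overlap_count(memberships):
    # Points in an overlapped region belong to more than one cluster.
    return sum(1 for m in memberships if len(m) > 1)
```

`overlap_count` gives one plausible way to obtain node counts like those reported for the overlapped region in the results table; how the authors counted them is not stated.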
Experiment metrics
- Precision is the ratio tp/(tp + fp), where tp is the number of true positives and fp is the number of false positives. Intuitively, precision is the ability of the classifier not to label a negative sample as positive.
- Recall is the ratio tp/(tp + fn), where tp is the number of true positives and fn is the number of false negatives. Intuitively, recall is the ability of the classifier to find all the positive samples.
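With fuzzy labels, an instance can carry several cluster labels, so it must be decided what counts as a true positive. The sketch below assumes each (instance, cluster) membership is one binary decision and micro-averages over all of them; this counting scheme, the function name, and returning 0.0 when a denominator is zero are assumptions, not details given in the text.

```python
def precision_recall(predicted, truth):
    """Micro-averaged precision and recall over (instance, cluster) memberships.

    predicted and truth are lists of cluster-index sets, one per instance. Counting
    each (instance, cluster) pair as one binary decision is an assumption.
    """
    if len(predicted) != len(truth):
        raise ValueError("predicted and truth must cover the same instances")
    tp = fp = fn = 0
    for pred, true in zip(predicted, truth):
        tp += len(pred & true)  # memberships predicted and present in the truth
        fp += len(pred - true)  # predicted memberships that are not in the truth
        fn += len(true - pred)  # true memberships the prediction missed
    precision = tp / (tp + fp) if tp + fp else 0.0  # tp/(tp + fp)
    recall = tp / (tp + fn) if tp + fn else 0.0     # tp/(tp + fn)
    return precision, recall
```

For example, predictions `[{0}, {0, 1}, {1}]` against truth `[{0}, {1}, {1}]` give tp = 3, fp = 1, fn = 0, so precision is 0.75 and recall is 1.0.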
Experiment results
Index | Number of nodes in overlapping region | Accuracy |
---|---|---|
1 | 18 | 0.997778 |
2 | 19 | 0.996667 |
3 | 33 | 0.99625 |
4 | 83 | 0.994074 |
5 | 133 | 0.94037 |
6 | 180 | 0.752593 |
7 | 25 | 0.997778 |
8 | 14 | 0.996667 |
9 | 6 | 0.996111 |
10 | 103 | 0.902593 |
11 | 146 | 0.952963 |
12 | 173 | 0.914074 |
13 | 15 | 0.993889 |
14 | 16 | 0.995556 |
15 | 10 | 0.997222 |
16 | 128 | 0.858889 |
17 | 173 | 0.90963 |
18 | 188 | 0.767778 |