Introduction
Related work
Cosine similarity
Term | SaS | PaP | WH |
---|---|---|---|
Affection | 115 | 58 | 20 |
Jealous | 10 | 7 | 11 |
Gossip | 2 | 0 | 6 |
Wuthering | 0 | 0 | 38 |

Term | SaS | PaP | WH |
---|---|---|---|
Affection | 3.06 | 2.76 | 2.30 |
Jealous | 2.00 | 1.85 | 2.04 |
Gossip | 1.30 | 0 | 1.78 |
Wuthering | 0 | 0 | 2.58 |
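
The weighted values in the table above are consistent with a log-frequency weighting of the raw counts, w = 1 + log10(tf) for tf > 0 and w = 0 otherwise. That formula is an assumption inferred from the numbers, not something this section states. The sketch below recomputes the weights from the counts under that assumption; the names `counts`, `log_weight`, and `weights` are illustrative.

```python
import math

# Raw term counts per document, copied from the term-count table.
counts = {
    "Affection": {"SaS": 115, "PaP": 58, "WH": 20},
    "Jealous": {"SaS": 10, "PaP": 7, "WH": 11},
    "Gossip": {"SaS": 2, "PaP": 0, "WH": 6},
    "Wuthering": {"SaS": 0, "PaP": 0, "WH": 38},
}


def log_weight(tf: int) -> float:
    """Assumed weighting: 1 + log10(tf) for tf > 0, otherwise 0."""
    return 1.0 + math.log10(tf) if tf > 0 else 0.0


# Round to two decimals so the result can be compared with the table.
weights = {
    term: {doc: round(log_weight(tf), 2) for doc, tf in row.items()}
    for term, row in counts.items()
}

for term, row in weights.items():
    print(term, row)
```

Under this assumption every recomputed weight agrees with the tabulated value to two decimal places.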
Sqrt-cosine similarity
Document | SaS | PaP | WH |
---|---|---|---|
SaS | 1.00 | 0.94 | 0.79 |
PaP | 0.94 | 1.00 | 0.69 |
WH | 0.79 | 0.69 | 1.00 |
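
The similarity matrix above (diagonal 1.00, SaS/PaP 0.94) can be reproduced by applying the standard cosine similarity to the log-weighted term vectors. The following is a minimal sketch under that reading; the vector layout (Affection, Jealous, Gossip, Wuthering) and the names `vectors` and `cosine` are assumptions made for illustration.

```python
import math

# Log-weighted term vectors in the order
# (Affection, Jealous, Gossip, Wuthering), copied from the weighted table.
vectors = {
    "SaS": [3.06, 2.00, 1.30, 0.0],
    "PaP": [2.76, 1.85, 0.0, 0.0],
    "WH": [2.30, 2.04, 1.78, 2.58],
}


def cosine(x, y):
    """Cosine similarity: dot(x, y) / (||x||_2 * ||y||_2)."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)


# Print the full pairwise matrix, rounded to two decimals.
for row_name, x in vectors.items():
    print(row_name, [round(cosine(x, y), 2) for y in vectors.values()])
```

Rounded to two decimals, the off-diagonal values come out as 0.94, 0.79, and 0.69, in agreement with the matrix.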

Document | SaS | PaP | WH |
---|---|---|---|
SaS | 0.15 | 0.16 | 0.11 |
PaP | 0.16 | 0.21 | 0.11 |
WH | 0.11 | 0.11 | 0.11 |
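
The matrix above, whose diagonal entries are not 1, matches what a sqrt-cosine similarity of the form sum_i sqrt(x_i * y_i) / (sum_i x_i * sum_i y_i) yields on the log-weighted term vectors. This section does not state that formula, so treat it as an assumption inferred from the tabulated values. A minimal sketch, with illustrative names:

```python
import math

# Log-weighted term vectors in the order
# (Affection, Jealous, Gossip, Wuthering), copied from the weighted table.
vectors = {
    "SaS": [3.06, 2.00, 1.30, 0.0],
    "PaP": [2.76, 1.85, 0.0, 0.0],
    "WH": [2.30, 2.04, 1.78, 2.58],
}


def sqrt_cosine(x, y):
    """Assumed sqrt-cosine: sum(sqrt(x_i * y_i)) / (sum(x) * sum(y)).

    Requires non-negative components, which holds for these weights.
    """
    numerator = sum(math.sqrt(a * b) for a, b in zip(x, y))
    return numerator / (sum(x) * sum(y))


# Print the full pairwise matrix at three decimals.
for row_name, x in vectors.items():
    print(row_name, [round(sqrt_cosine(x, y), 3) for y in vectors.values()])
```

With this definition the similarity of a vector to itself reduces to 1 / sum(x), which is why the diagonal depends on the document rather than being 1. The computed values agree with the matrix to within 0.01; the tabulated figures appear to be truncated rather than rounded.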
The proposed ISC similarity
Experiments
Data sets
Document | SaS | PaP | WH |
---|---|---|---|
SaS | 1.00 | 0.89 | 0.83 |
PaP | 0.89 | 1.00 | 0.70 |
WH | 0.83 | 0.70 | 1.00 |
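
The matrix above (diagonal 1.00, SaS/PaP 0.89) is consistent with an ISC similarity of the form sum_i sqrt(x_i * y_i) / (sqrt(sum_i x_i) * sqrt(sum_i y_i)) applied to the log-weighted term vectors. The formula is an assumption inferred from the tabulated values, since this section does not spell it out. A minimal sketch, with illustrative names:

```python
import math

# Log-weighted term vectors in the order
# (Affection, Jealous, Gossip, Wuthering), copied from the weighted table.
vectors = {
    "SaS": [3.06, 2.00, 1.30, 0.0],
    "PaP": [2.76, 1.85, 0.0, 0.0],
    "WH": [2.30, 2.04, 1.78, 2.58],
}


def isc(x, y):
    """Assumed ISC: sum(sqrt(x_i * y_i)) / (sqrt(sum(x)) * sqrt(sum(y))).

    Requires non-negative components, which holds for these weights.
    """
    numerator = sum(math.sqrt(a * b) for a, b in zip(x, y))
    return numerator / (math.sqrt(sum(x)) * math.sqrt(sum(y)))


# Print the full pairwise matrix, rounded to two decimals.
for row_name, x in vectors.items():
    print(row_name, [round(isc(x, y), 2) for y in vectors.values()])
```

Under this normalization the self-similarity is 1 for any non-zero vector, and the off-diagonal values round to 0.89, 0.83, and 0.70, matching the matrix.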

Data set | #Sample | #Dim | #Class |
---|---|---|---|
CSTR | 475 | 1000 | 4 |
DBLP | 1367 | 200 | 9 |
Reuters | 2900 | 1000 | 8–52 |
WebKB4 | 4199 | 1000 | 4 |
Newsgroups | 11293 | 1000 | 20 |
Learners
Performance metrics
Experimental results
Overall results
Similarity | Accuracy (Mean) | Accuracy (HSD) | Purity (Mean) | Purity (HSD) | NMI (Mean) | NMI (HSD) |
---|---|---|---|---|---|---|
ISC | 0.3563 | A | 0.5950 | A | 0.1590 | A |
Cosine | 0.3370 | A | 0.5608 | A | 0.1363 | A |
Gaussian | 0.2949 | A | 0.5597 | A | 0.0990 | A |

Similarity | Accuracy (Mean) | Accuracy (HSD) | AUC (Mean) | AUC (HSD) |
---|---|---|---|---|
ISC | 0.6562 | A | 0.7901 | A |
Cosine | 0.6371 | A | 0.7780 | A |
Gaussian | 0.2872 | B | 0.5582 | B |

Metric | Similarity | KNN (Mean) | KNN (HSD) | Naïve Bayes (Mean) | Naïve Bayes (HSD) | SVM (Mean) | SVM (HSD) |
---|---|---|---|---|---|---|---|
Accuracy | ISC | 0.7079 | A | 0.8589 | A | 0.4019 | A |
Accuracy | Cosine | 0.6476 | A | 0.8633 | A | 0.4004 | A |
Accuracy | Gaussian | 0.4606 | A | 0.1795 | B | 0.2215 | A |
AUC | ISC | 0.8779 | A | 0.8806 | A | 0.612 | A |
AUC | Cosine | 0.7977 | AB | 0.8892 | A | 0.6473 | A |
AUC | Gaussian | 0.6620 | B | 0.5084 | B | 0.5042 | A |

Metric | Similarity | Kmeans (Mean) | Kmeans (HSD) | Ncut (Mean) | Ncut (HSD) | PCA-Kmeans (Mean) | PCA-Kmeans (HSD) | SYM-NMF (Mean) | SYM-NMF (HSD) |
---|---|---|---|---|---|---|---|---|---|
Accuracy | ISC | 0.3354 | A | 0.3220 | A | 0.3090 | A | 0.4589 | A |
Accuracy | Cosine | 0.3115 | A | 0.3104 | A | 0.3070 | A | 0.4191 | A |
Accuracy | Gaussian | 0.3005 | A | 0.3020 | A | 0.3020 | A | 0.2750 | A |
Purity | ISC | 0.4357 | A | 0.5606 | A | 0.8499 | A | 0.5337 | A |
Purity | Cosine | 0.4217 | A | 0.5626 | A | 0.7771 | A | 0.5072 | A |
Purity | Gaussian | 0.3919 | A | 0.5693 | A | 0.8457 | A | 0.4066 | A |
NMI | ISC | 0.1740 | A | 0.1367 | A | 0.0369 | A | 0.2886 | A |
NMI | Cosine | 0.1332 | A | 0.1321 | A | 0.0335 | A | 0.2464 | A |
NMI | Gaussian | 0.0992 | A | 0.1337 | A | 0.0309 | A | 0.1321 | A |
Results using different learners
Metric | Data set | ISC (Mean) | ISC (HSD) | Cosine (Mean) | Cosine (HSD) | Gaussian (Mean) | Gaussian (HSD) |
---|---|---|---|---|---|---|---|
Accuracy | WEBKB | 0.6104 | A | 0.5929 | A | 0.3046 | A |
Accuracy | R8 | 0.7166 | A | 0.7361 | A | 0.4485 | A |
Accuracy | R52 | 0.4975 | A | 0.4230 | A | 0.1945 | A |
Accuracy | NEWS | 0.6009 | A | 0.5989 | A | 0.2468 | A |
Accuracy | DBLP | 0.7101 | A | 0.6842 | A | 0.2234 | B |
Accuracy | CSTR | 0.8019 | A | 0.7873 | A | 0.3052 | B |
Accuracy | Average / #A’s | 0.6562 | 6 | 0.6370 | 6 | 0.2871 | 4 |
AUC | WEBKB | 0.8162 | A | 0.8304 | A | 0.6171 | A |
AUC | R8 | 0.7342 | A | 0.6641 | A | 0.5341 | A |
AUC | R52 | 0.7826 | A | 0.7540 | A | 0.5075 | A |
AUC | NEWS | 0.7514 | A | 0.7570 | A | 0.5852 | A |
AUC | DBLP | 0.9253 | A | 0.9287 | A | 0.6011 | B |
AUC | CSTR | 0.7313 | A | 0.7340 | A | 0.5040 | A |
AUC | Average / #A’s | 0.7901 | 6 | 0.7780 | 6 | 0.5581 | 5 |

Metric | Data set | ISC (Mean) | ISC (HSD) | Cosine (Mean) | Cosine (HSD) | Gaussian (Mean) | Gaussian (HSD) |
---|---|---|---|---|---|---|---|
Accuracy | WEBKB | 0.4798 | A | 0.4434 | A | 0.3824 | A |
Accuracy | R8 | 0.4384 | A | 0.4291 | A | 0.4472 | A |
Accuracy | R52 | 0.2320 | A | 0.2283 | A | 0.2395 | A |
Accuracy | NEWS | 0.1659 | A | 0.1544 | A | 0.1179 | A |
Accuracy | DBLP | 0.3886 | A | 0.3574 | A | 0.2640 | A |
Accuracy | CSTR | 0.4332 | A | 0.4095 | A | 0.3182 | A |
Accuracy | Average / #A’s | 0.3563 | 6 | 0.3370 | 6 | 0.2948 | 6 |
Purity | WEBKB | 0.6248 | A | 0.6091 | A | 0.5548 | A |
Purity | R8 | 0.5769 | A | 0.5790 | A | 0.6446 | A |
Purity | R52 | 0.4440 | A | 0.4225 | A | 0.4478 | A |
Purity | NEWS | 0.4234 | A | 0.4410 | A | 0.3948 | A |
Purity | DBLP | 0.7980 | A | 0.6363 | A | 0.6531 | A |
Purity | CSTR | 0.7026 | A | 0.6704 | A | 0.6700 | A |
Purity | Average / #A’s | 0.59495 | 6 | 0.55971 | 6 | 0.56085 | 6 |
NMI | WEBKB | 0.1500 | A | 0.1177 | A | 0.0879 | A |
NMI | R8 | 0.1978 | A | 0.1912 | A | 0.2030 | A |
NMI | R52 | 0.1376 | A | 0.1321 | A | 0.1179 | A |
NMI | NEWS | 0.0855 | A | 0.0761 | A | 0.0731 | A |
NMI | DBLP | 0.2439 | A | 0.1940 | A | 0.0948 | A |
NMI | CSTR | 0.1396 | A | 0.1069 | A | 0.0172 | A |
NMI | Average / #A’s | 0.1590 | 6 | 0.1363 | 6 | 0.0990 | 6 |