1 Introduction
2 Related work
2.1 Graph drawing based on t-SNE and t-SNE variations
2.2 Definition of perplexity in t-SNE
2.3 Metrics for graph layout evaluation
2.3.1 Normalized stress measure \(M_{\sigma }\)
2.3.2 Neighborhood preservation measure \(M_{np}\)
3 Design of our approach
3.1 Modification on perplexity and bandwidth fitting in standard t-SNE
3.2 Appropriate perplexity estimation for the standard t-SNE
3.3 Appropriate perplexity estimation for sklearn Barnes–Hut TSNE
4 Experiment and results
4.1 Test data
Dataset | Dataset type | \(\vert V\vert \) | \(\vert E\vert \) | \(\vert E\vert /\vert V\vert \) |
---|---|---|---|---|
dtw_72 | Planar, structural | 72 | 75 | 1.0417 |
lesmis | Collaboration network | 77 | 254 | 3.2987 |
can_96 | Mesh-like, structural | 96 | 336 | 3.5000 |
rajat11 | Miscellaneous | 135 | 377 | 2.7926 |
jazz | Collaboration network | 198 | 2742 | 13.8485 |
visbrazil | Collaboration network | 222 | 236 | 1.5135 |
grid17 | Planar, mesh-like, structural | 289 | 544 | 1.8824 |
mesh3e1 | Mesh-like, structural | 289 | 800 | 2.7682 |
netscience | Collaboration network | 379 | 914 | 2.4116 |
dwt_419 | Structural | 419 | 1572 | 3.7518 |
price_1000 | Planar, tree | 1000 | 999 | 0.9990 |
dwt_1005 | Structural | 1005 | 3808 | 3.7891 |
cage8 | Miscellaneous | 1015 | 4994 | 4.9202 |
bcsstk09 | Mesh-like, structural | 1083 | 8677 | 8.0120 |
block_2000 | Clusters | 2000 | 9912 | 4.9560 |
sierpinski3d | Structural | 2050 | 6144 | 2.9971 |
CA-GrQc | Collaboration network | 4158 | 13,422 | 3.2280 |
EVA | Collaboration network | 4475 | 4652 | 1.0396 |
3elt | Planar, mesh-like | 4720 | 13,722 | 2.9072 |
us_powergrid | Structural | 4941 | 6594 | 1.3345 |
4.2 Validation of hypothesis
\(\mathrm{Perp}\_\mathrm{pct}\) with | lm coefficient and intercept | Correlation | |||
---|---|---|---|---|---|
\(\mathrm{lm}\_\mathrm{coef}^{1}\) | \(\mathrm{lm}\_\mathrm{intc}^{2}\) | \(\mathrm{cor}\_\mathrm{coef}^{3}\) | 95% CI | \(p\_\mathrm{value}^{4}\) | |
Average \(M_\sigma \) | \(-\,4.3826\) | 0.7024 | \(-\,0.8299\) | \([-\,0.9741, -\,0.2048]\) | 0.0209 |
Average \(M_{np}\) | \(-\,6.1940\) | 4.2302 | \(-\,0.9450\) | \([-\,0.9921, -\,0.6655]\) | 0.0013 |
4.3 Perplexity estimation based on our approach based on the modified standard t-SNE
4.3.1 \(M_{\sigma }\) and \(M_{np}\)
Dataset | Size/density label | \(M_\sigma ^{1}\) | \(M_{np}^{2}\) | ||
---|---|---|---|---|---|
tsNET* | st-SNE | tsNET* | st-SNE | ||
Perplexity \(= 40\) | |||||
dtw_72 | Small/sparse | 0.048 | 0.0392 | 0.855 | 0.8647 |
rajat11 | Small/sparse | 0.097 | 0.0779 | 0.717 | 0.7220 |
visbrazil | Small/sparse | 0.098 | 0.0722 | 0.584 | 0.5308 |
grid17 | Small/sparse | 0.021 | 0.0493 | 0.785 | 0.8499 |
mesh3e1 | Small/sparse | 0.014 | 0.0166 | 0.904 | 0.9984 |
netscience | Small/sparse | 0.100 | 0.1112 | 0.707 | 0.7154 |
lesmis | Small/dense | 0.111 | 0.0967 | 0.712 | 0.6945 |
can_96 | Small/dense | 0.084 | 0.0743 | 0.671 | 0.6495 |
jazz | Small/dense | 0.128 | 0.1388 | 0.804 | 0.8077 |
dwt_419 | Small/dense | 0.024 | 0.0386 | 0.741 | 0.7108 |
Perplexity \(=\vert V\vert \frac{\mu - 2\sigma }{\mu } \delta (\frac{\vert E \vert }{\vert V \vert }) \) as in Eq. 9 | |||||
price_1000 | Large/sparse | 0.160 | 0.1220 | 0.639 | 0.6229 |
sierpinski3d | Large/sparse | 0.093 | 0.1013 | 0.580 | 0.6531 |
EVA | Large/sparse | 0.161 | 0.2161 | 0.802 | 0.8227 |
3elt | Large/sparse | 0.090 | 0.0717 | 0.715 | 0.8113 |
us_powergrid | Large/sparse | 0.101 | 0.0742 | 0.457 | 0.5424 |
dwt_1005 | Large/dense | 0.035 | 0.0185 | 0.619 | 0.6667 |
cage8 | Large/dense | 0.203 | 0.1658 | 0.437 | 0.4219 |
bcsstk09 | Large/dense | 0.022 | 0.0220 | 0.867 | 0.8509 |
block_2000 | Large/dense | 0.189 | 0.1604 | 0.372 | 0.3640 |
CA-GrQc | Large/dense | 0.189 | 0.2057 | 0.483 | 0.4871 |
Average | Overall | 0.0984 | 0.0936 | 0.6726 | 0.6893 |
Small | 0.0725 | 0.0715 | 0.7480 | 0.7544 | |
Large | 0.1243 | 0.1158 | 0.5971 | 0.6243 | |
Sparse | 0.0894 | 0.0865 | 0.7041 | 0.7394 | |
Dense | 0.1094 | 0.1023 | 0.6340 | 0.6281 |
Perplexity \(=\vert V\vert \frac{\mu - \sigma }{3\mu } \delta (\frac{\vert E\vert }{\vert V\vert }) \) as in Eq. 10 | ||||||
---|---|---|---|---|---|---|
Dataset | TSNE |
\(M_\sigma \)
|
\(M_{np}\)
| |||
runtime (s) | tsNET* | TSNE | tsNET* | TSNE | ||
Large sparse | price_1000 | 2.8 | 0.160 |
0.1666
| 0.639 |
0.6332
|
sierpinski3d | 5.0 | 0.093 |
0.1421
| 0.580 |
0.6538
| |
EVA | 27.4 | 0.161 |
0.1771
| 0.802 |
0.8197
| |
3elt | 23.5 | 0.090 |
0.0660
| 0.715 |
0.8226
| |
us_powergrid | 24.8 | 0.101 |
0.0889
| 0.457 |
0.5789
| |
Large dense | dwt_1005 | 2.4 | 0.035 |
0.0199
| 0.619 |
0.6744
|
cage8 | 2.5 | 0.203 |
0.1551
| 0.437 |
0.4161
| |
bcsstk09 | 3.5 | 0.022 |
0.0242
| 0.867 |
0.8420
| |
block_2000 | 5.6 | 0.189 |
0.1519
| 0.372 |
0.3645
| |
CA-GrQc | 31.3 | 0.189 |
0.2264
| 0.483 |
0.4915
| |
Average | Large-overall | 12.9 | 0.1243 |
0.1218
| 0.5971 |
0.6296
|
Large-sparse | 16.7 | 0.1210 |
0.1281
| 0.6386 |
0.7014
| |
Large-dense | 9.0 | 0.1276 |
0.1155
| 0.5556 |
0.5577
|
4.3.2 Visual inspection
4.3.3 Runtime
4.4 Discussion
5 Conclusion and future work
6 Statement
-
All authors wrote and reviewed the main manuscript, Chun Xiao prepared the figures.
-
I confirm that I understand journal International Journal of Data Science and Analytics is a transformative journal. When research is accepted for publication, there is a choice to publish using either immediate gold open access or the traditional publishing route.
-
I declare that the authors have no competing interests as defined by Springer, or other interests that might be perceived to influence the results and/or discussion reported in this paper.
-
The results/data/figures in this manuscript have not been published elsewhere, nor are they under consideration by another publisher.
-
I have read the Springer journal policies on author responsibilities and submit this manuscript in accordance with those policies.
-
All of the material is owned by the authors and/or no permissions are required.
-
This research did not involve Human Participants and/or Animals.