Skip to main content

2016 | OriginalPaper | Buchkapitel

Comparison of Graph Node Distances on Clustering Tasks

verfasst von : Felix Sommer, François Fouss, Marco Saerens

Erschienen in: Artificial Neural Networks and Machine Learning – ICANN 2016

Verlag: Springer International Publishing

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

This work presents recent developments in graph node distances and tests them empirically on social network databases of various sizes and types. We compare two versions of a distance-based kernel k-means algorithm with the well-established Louvain method. The first version is a classic kernel k-means approach, the second version additionally makes use of node weights with the Sum-over-Forests density index. Both kernel k-means algorithms employ a variety of classic and modern distances. We compare the results of all three algorithms using statistical measures and an overall rank-comparison to ascertain their capabilities in community detection. Results show that two recently introduced distances outperform the others, on our tested datasets.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Literatur
1.
Zurück zum Zitat Bavaud, F., Guex, G.: Interpolating between random walks and shortest paths: a path functional approach. In: Aberer, K., Flache, A., Jager, W., Liu, L., Tang, J., Guéret, C. (eds.) SocInfo 2012. LNCS, vol. 7710, pp. 68–81. Springer, Heidelberg (2012)CrossRef Bavaud, F., Guex, G.: Interpolating between random walks and shortest paths: a path functional approach. In: Aberer, K., Flache, A., Jager, W., Liu, L., Tang, J., Guéret, C. (eds.) SocInfo 2012. LNCS, vol. 7710, pp. 68–81. Springer, Heidelberg (2012)CrossRef
2.
Zurück zum Zitat Blondel, V.D., Guillaume, J.L., Lambiotte, R., Lefebvre, E.: Fast unfolding of communities in large networks. J. Stat. Mech. Theor. Exp. 2008, P10008 (2008)CrossRef Blondel, V.D., Guillaume, J.L., Lambiotte, R., Lefebvre, E.: Fast unfolding of communities in large networks. J. Stat. Mech. Theor. Exp. 2008, P10008 (2008)CrossRef
3.
4.
Zurück zum Zitat Celeux, G., Diday, E., Govaert, G., Lechevallier, Y., Ralambondrainy, H.: Classification Automatique des Données. Dunod, Paris (1989)MATH Celeux, G., Diday, E., Govaert, G., Lechevallier, Y., Ralambondrainy, H.: Classification Automatique des Données. Dunod, Paris (1989)MATH
5.
Zurück zum Zitat Chebotarev, P., Shamis, E.: The matrix-forest theorem and measuring relations in small social groups. Autom. Remote Control 58(9), 1505–1514 (1997)MathSciNetMATH Chebotarev, P., Shamis, E.: The matrix-forest theorem and measuring relations in small social groups. Autom. Remote Control 58(9), 1505–1514 (1997)MathSciNetMATH
6.
Zurück zum Zitat Chebotarev, P.: A class of graph-geodetic distances generalizing the shortest-path and the resistance distances. Discrete Appl. Math. 159(5), 295–302 (2011)MathSciNetCrossRefMATH Chebotarev, P.: A class of graph-geodetic distances generalizing the shortest-path and the resistance distances. Discrete Appl. Math. 159(5), 295–302 (2011)MathSciNetCrossRefMATH
8.
Zurück zum Zitat Collignon, A., Maes, F., Delaere, D., Vandermeulen, D., Suetens, P., Marchal, G.: Automated multi-modality image registration based on information theory. Inf. Process. Med. Imaging 3, 263–274 (1995) Collignon, A., Maes, F., Delaere, D., Vandermeulen, D., Suetens, P., Marchal, G.: Automated multi-modality image registration based on information theory. Inf. Process. Med. Imaging 3, 263–274 (1995)
9.
Zurück zum Zitat Cormen, T., Leiserson, C., Rivest, R., Stein, C.: Introduction to Algorithms, 3rd edn. The MIT Press, Cambridge (2009)MATH Cormen, T., Leiserson, C., Rivest, R., Stein, C.: Introduction to Algorithms, 3rd edn. The MIT Press, Cambridge (2009)MATH
10.
Zurück zum Zitat Daniel, W.: Applied Nonparametric Statistics. The Duxbury Advanced Series in Statistics and Decision Sciences. PWS-Kent Publications, Boston (1990) Daniel, W.: Applied Nonparametric Statistics. The Duxbury Advanced Series in Statistics and Decision Sciences. PWS-Kent Publications, Boston (1990)
11.
Zurück zum Zitat Demšar, J.: Statistical comparisons of classifiers over multiple data sets. J. Mach. Learn. Res. 7, 1–30 (2006)MathSciNetMATH Demšar, J.: Statistical comparisons of classifiers over multiple data sets. J. Mach. Learn. Res. 7, 1–30 (2006)MathSciNetMATH
12.
Zurück zum Zitat Duda, R.O., Hart, P.E.: Pattern Classification and Scene Analysis. Wiley, New York (1973)MATH Duda, R.O., Hart, P.E.: Pattern Classification and Scene Analysis. Wiley, New York (1973)MATH
13.
Zurück zum Zitat Fouss, F., Saerens, M., Shimbo, M.: Algorithms for Exploratory Link Analysis. Cambridge University Press (2016, to appear) Fouss, F., Saerens, M., Shimbo, M.: Algorithms for Exploratory Link Analysis. Cambridge University Press (2016, to appear)
14.
Zurück zum Zitat Françoisse, K., Kivimäki, I., Mantrach, A., Rossi, F., Saerens, M.: A bag-of-paths framework for network data analysis, pp. 1–36 (2013). arXiv:1302.6766 Françoisse, K., Kivimäki, I., Mantrach, A., Rossi, F., Saerens, M.: A bag-of-paths framework for network data analysis, pp. 1–36 (2013). arXiv:​1302.​6766
15.
Zurück zum Zitat Girvan, M., Newman, M.E.J.: Community structure in social and biological networks. In: Proceedings of the National Academy of Sciences, vol. 99, pp. 7821–7826. National Academy of Sciences (2002) Girvan, M., Newman, M.E.J.: Community structure in social and biological networks. In: Proceedings of the National Academy of Sciences, vol. 99, pp. 7821–7826. National Academy of Sciences (2002)
16.
Zurück zum Zitat Grady, L., Schwartz, E.: The graph analysis toolbox: image processing on arbitrary graphs. CAS/CNS Technical report Series (021) (2010) Grady, L., Schwartz, E.: The graph analysis toolbox: image processing on arbitrary graphs. CAS/CNS Technical report Series (021) (2010)
17.
Zurück zum Zitat Hashimoto, T., Sun, Y., Jaakkola, T.: From random walks to distances on unweighted graphs. In: Cortes, C., Lawrence, N.D., Lee, D.D., Sugiyama, M., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 28, pp. 3411–3419. Curran Associates, Inc. (2015) Hashimoto, T., Sun, Y., Jaakkola, T.: From random walks to distances on unweighted graphs. In: Cortes, C., Lawrence, N.D., Lee, D.D., Sugiyama, M., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 28, pp. 3411–3419. Curran Associates, Inc. (2015)
18.
19.
Zurück zum Zitat Kaufmann, L., Rousseeuw, P.: Finding Groups in Data: An Introduction to Cluster Analysis. Wiley, New York (1990)CrossRef Kaufmann, L., Rousseeuw, P.: Finding Groups in Data: An Introduction to Cluster Analysis. Wiley, New York (1990)CrossRef
20.
Zurück zum Zitat Kivimäki, I., Lebichot, B., Saerens, M.: Developments in the theory of randomized shortest paths with a comparison of graph node distances. Physica A Stat. Mech. Appl. 393, 600–616 (2014)CrossRef Kivimäki, I., Lebichot, B., Saerens, M.: Developments in the theory of randomized shortest paths with a comparison of graph node distances. Physica A Stat. Mech. Appl. 393, 600–616 (2014)CrossRef
22.
Zurück zum Zitat Lancichinetti, A., Fortunato, S., Radicchi, F.: Benchmark graphs for testing community detection algorithms. Phys. Rev. E 78(4), 46–110 (2008)CrossRef Lancichinetti, A., Fortunato, S., Radicchi, F.: Benchmark graphs for testing community detection algorithms. Phys. Rev. E 78(4), 46–110 (2008)CrossRef
24.
25.
Zurück zum Zitat Newman, M.E.J.: Finding community structure in networks using the eigenvectors of matrices. Phys. Rev. E 74(3), 036104 (2006)MathSciNetCrossRef Newman, M.E.J.: Finding community structure in networks using the eigenvectors of matrices. Phys. Rev. E 74(3), 036104 (2006)MathSciNetCrossRef
26.
Zurück zum Zitat Newman, M.E.J.: Modularity and community structure in networks. Proc. Natl. Acad. Sci. (USA) 103, 8577–8582 (2006)CrossRef Newman, M.E.J.: Modularity and community structure in networks. Proc. Natl. Acad. Sci. (USA) 103, 8577–8582 (2006)CrossRef
27.
Zurück zum Zitat Newman, M.E.J., Girvan, M.: Finding and evaluating community structure in networks. Phys. Rev. E 69, 026113 (2004)CrossRef Newman, M.E.J., Girvan, M.: Finding and evaluating community structure in networks. Phys. Rev. E 69, 026113 (2004)CrossRef
28.
Zurück zum Zitat Saerens, M., Achbany, Y., Fouss, F., Yen, L.: Randomized shortest-path problems: two related models. Neural Comput. 21(8), 2363–2404 (2009)MathSciNetCrossRefMATH Saerens, M., Achbany, Y., Fouss, F., Yen, L.: Randomized shortest-path problems: two related models. Neural Comput. 21(8), 2363–2404 (2009)MathSciNetCrossRefMATH
29.
Zurück zum Zitat Schölkopf, B., Smola, A.: Learning with Kernels. The MIT Press, Cambridge (2002)MATH Schölkopf, B., Smola, A.: Learning with Kernels. The MIT Press, Cambridge (2002)MATH
30.
Zurück zum Zitat Senelle, M., Garcia-Diez, S., Mantrach, A., Shimbo, M., Saerens, M., Fouss, F.: The sum-over-forests density index: identifying dense regions in a graph. IEEE Trans. Pattern Anal. Mach. Intell. 36(6), 1268–1274 (2014). arXiv:1301.0725 CrossRef Senelle, M., Garcia-Diez, S., Mantrach, A., Shimbo, M., Saerens, M., Fouss, F.: The sum-over-forests density index: identifying dense regions in a graph. IEEE Trans. Pattern Anal. Mach. Intell. 36(6), 1268–1274 (2014). arXiv:​1301.​0725 CrossRef
31.
Zurück zum Zitat Siegel, S.: Nonparametric Statistics for the Behavioral Sciences. McGraw-Hill, New York (1956)MATH Siegel, S.: Nonparametric Statistics for the Behavioral Sciences. McGraw-Hill, New York (1956)MATH
32.
Zurück zum Zitat Sommer, F., Fouss, F., Saerens, M.: Clustering using a Sum-Over-Forests weighted kernel k-means approach. LSM Working Paper 22 (2015) Sommer, F., Fouss, F., Saerens, M.: Clustering using a Sum-Over-Forests weighted kernel k-means approach. LSM Working Paper 22 (2015)
33.
Zurück zum Zitat von Luxburg, U., Radl, A., Hein, M.: Getting lost in space: large sample analysis of the commute distance. In: Proceedings of the 23th Neural Information Processing Systems Conference (NIPS 2010), pp. 2622–2630 (2010) von Luxburg, U., Radl, A., Hein, M.: Getting lost in space: large sample analysis of the commute distance. In: Proceedings of the 23th Neural Information Processing Systems Conference (NIPS 2010), pp. 2622–2630 (2010)
34.
Zurück zum Zitat von Luxburg, U., Radl, A., Hein, M.: Hitting and commute times in large random neighborhood graphs. J. Mach. Learn. Res. 15, 1751–1798 (2014)MathSciNetMATH von Luxburg, U., Radl, A., Hein, M.: Hitting and commute times in large random neighborhood graphs. J. Mach. Learn. Res. 15, 1751–1798 (2014)MathSciNetMATH
35.
Zurück zum Zitat Yen, L., Fouss, F., Decaestecker, C., Francq, P., Saerens, M.: Graph nodes clustering based on the commute-time Kernel. In: Zhou, Z.-H., Li, H., Yang, Q. (eds.) PAKDD 2007. LNCS (LNAI), vol. 4426, pp. 1037–1045. Springer, Heidelberg (2007)CrossRef Yen, L., Fouss, F., Decaestecker, C., Francq, P., Saerens, M.: Graph nodes clustering based on the commute-time Kernel. In: Zhou, Z.-H., Li, H., Yang, Q. (eds.) PAKDD 2007. LNCS (LNAI), vol. 4426, pp. 1037–1045. Springer, Heidelberg (2007)CrossRef
36.
Zurück zum Zitat Yen, L., Fouss, F., Decaestecker, C., Francq, P., Saerens, M.: Graph nodes clustering with the sigmoid commute-time kernel: a comprehensive study. Data Knowl. Eng. 68(3), 338–361 (2008)CrossRef Yen, L., Fouss, F., Decaestecker, C., Francq, P., Saerens, M.: Graph nodes clustering with the sigmoid commute-time kernel: a comprehensive study. Data Knowl. Eng. 68(3), 338–361 (2008)CrossRef
37.
Zurück zum Zitat Yen, L., Mantrach, A., Shimbo, M., Saerens, M.: A family of dissimilarity measures between nodes generalizing both the shortest-path and the commute-time distances. In: Proceedings of the 14th SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2008), pp. 785–793 (2008) Yen, L., Mantrach, A., Shimbo, M., Saerens, M.: A family of dissimilarity measures between nodes generalizing both the shortest-path and the commute-time distances. In: Proceedings of the 14th SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2008), pp. 785–793 (2008)
38.
Zurück zum Zitat Zachary, W.W.: An information flow model for conflict and fission in small groups. J. Anthropol. Res. 33, 452–473 (1977)CrossRef Zachary, W.W.: An information flow model for conflict and fission in small groups. J. Anthropol. Res. 33, 452–473 (1977)CrossRef
Metadaten
Titel
Comparison of Graph Node Distances on Clustering Tasks
verfasst von
Felix Sommer
François Fouss
Marco Saerens
Copyright-Jahr
2016
DOI
https://doi.org/10.1007/978-3-319-44778-0_23