Skip to main content

2022 | OriginalPaper | Buchkapitel

Dissecting Graph Measure Performance for Node Clustering in LFR Parameter Space

verfasst von : Vladimir Ivashkin, Pavel Chebotarev

Erschienen in: Complex Networks & Their Applications X

Verlag: Springer International Publishing

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

Graph measures that express closeness or distance between nodes can be employed for graph nodes clustering using metric clustering algorithms. There are numerous measures applicable to this task, and which one performs better is an open question. We study the performance of 25 graph measures on generated graphs with different parameters. While usually measure comparisons are limited to general measure ranking on a particular dataset, we aim to explore the performance of various measures depending on graph features. Using an LFR graph generator, we create a dataset of 11780 graphs covering the whole LFR parameter space. For each graph, we assess the quality of clustering with k-means algorithm for each considered measure. Based on this, we determine the best measure for each area of the parameter space. We find that the parameter space consists of distinct zones where one particular measure is the best. We analyze the geometry of the resulting zones and describe it with simple criteria. Given particular graph parameters, this allows us to recommend a particular measure to use for clustering.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Literatur
1.
Zurück zum Zitat Arthur, D., Vassilvitskii, S.: k-means++: The advantages of careful seeding. Stanford University, Technical report (2006) Arthur, D., Vassilvitskii, S.: k-means++: The advantages of careful seeding. Stanford University, Technical report (2006)
5.
Zurück zum Zitat Barber, M.J., Clark, J.W.: Detecting network communities by propagating labels under constraints. Phys. Rev. E 80(2), 026129 (2009) Barber, M.J., Clark, J.W.: Detecting network communities by propagating labels under constraints. Phys. Rev. E 80(2), 026129 (2009)
6.
Zurück zum Zitat Blondel, V.D., Guillaume, J.L., Lambiotte, R., Lefebvre, E.: Fast unfolding of communities in large networks. J. Stat. Mech. Theor. Exp. 2008(10), P10008 (2008)CrossRefMATH Blondel, V.D., Guillaume, J.L., Lambiotte, R., Lefebvre, E.: Fast unfolding of communities in large networks. J. Stat. Mech. Theor. Exp. 2008(10), P10008 (2008)CrossRefMATH
7.
Zurück zum Zitat Borg, I., Groenen, P.J.F.: Modern Multidimensional Scaling: Theory and Applications. Springer Science & Business Media (2005) Borg, I., Groenen, P.J.F.: Modern Multidimensional Scaling: Theory and Applications. Springer Science & Business Media (2005)
8.
Zurück zum Zitat Buckley, F., Harary, F.: Distance in Graphs. Addison-Wesley, Boston (1990) Buckley, F., Harary, F.: Distance in Graphs. Addison-Wesley, Boston (1990)
10.
Zurück zum Zitat Chebotarev, P., Shamis, E.: On the proximity measure for graph vertices provided by the inverse Laplacian characteristic matrix. In: Abstracts of the Conference “Linear Algebra and its Applications”, pp. 6–7. University of Manchester, Manchester, UK (1995) Chebotarev, P., Shamis, E.: On the proximity measure for graph vertices provided by the inverse Laplacian characteristic matrix. In: Abstracts of the Conference “Linear Algebra and its Applications”, pp. 6–7. University of Manchester, Manchester, UK (1995)
11.
Zurück zum Zitat Chebotarev, P., Shamis, E.: On a duality between metrics and \({\rm \Sigma }\)-proximities. Autom. Remote Control 59(4), 608–612 (1998)MATH Chebotarev, P., Shamis, E.: On a duality between metrics and \({\rm \Sigma }\)-proximities. Autom. Remote Control 59(4), 608–612 (1998)MATH
12.
Zurück zum Zitat Chebotarev, P., Shamis, E.: On proximity measures for graph vertices. Autom. Remote Control 59(10), 1443–1459 (1998)MATH Chebotarev, P., Shamis, E.: On proximity measures for graph vertices. Autom. Remote Control 59(10), 1443–1459 (1998)MATH
13.
Zurück zum Zitat Chung, F.: The heat kernel as the pagerank of a graph. Proc. Nat. Acad. Sci. 104(50), 19735–19740 (2007)CrossRef Chung, F.: The heat kernel as the pagerank of a graph. Proc. Nat. Acad. Sci. 104(50), 19735–19740 (2007)CrossRef
14.
Zurück zum Zitat Chung, F., Yau, S.T.: Coverings, heat kernels and spanning trees. J. Comb. 6, 163–184 (1998) Chung, F., Yau, S.T.: Coverings, heat kernels and spanning trees. J. Comb. 6, 163–184 (1998)
15.
Zurück zum Zitat Chung, F.R.K.: Spectral Graph Theory, vol. 92. American Mathematical Society (1997) Chung, F.R.K.: Spectral Graph Theory, vol. 92. American Mathematical Society (1997)
16.
Zurück zum Zitat Courtain, S., Leleux, P., Kivimäki, I., Guex, G., Saerens, M.: Randomized shortest paths with net flows and capacity constraints. Inform. Sci. 556, 341–360 (2020) Courtain, S., Leleux, P., Kivimäki, I., Guex, G., Saerens, M.: Randomized shortest paths with net flows and capacity constraints. Inform. Sci. 556, 341–360 (2020)
17.
Zurück zum Zitat Enright, A.J., Van Dongen, S., Ouzounis, C.A.: An efficient algorithm for large-scale detection of protein families. Nucleic Acids Res. 30(7), 1575–1584 (2002)CrossRef Enright, A.J., Van Dongen, S., Ouzounis, C.A.: An efficient algorithm for large-scale detection of protein families. Nucleic Acids Res. 30(7), 1575–1584 (2002)CrossRef
18.
Zurück zum Zitat Estrada, E., Hatano, N.: Statistical-mechanical approach to subgraph centrality in complex networks. Chem. Phys. Lett. 439(1–3), 247–251 (2007)CrossRef Estrada, E., Hatano, N.: Statistical-mechanical approach to subgraph centrality in complex networks. Chem. Phys. Lett. 439(1–3), 247–251 (2007)CrossRef
19.
Zurück zum Zitat Estrada, E., Hatano, N.: Communicability in complex networks. Phys. Rev. E 77(3), 036111 (2008) Estrada, E., Hatano, N.: Communicability in complex networks. Phys. Rev. E 77(3), 036111 (2008)
20.
Zurück zum Zitat Estrada, E., Silver, G.: Accounting for the role of long walks on networks via a new matrix function. J. Math. Anal. Appl. 449(2), 1581–1600 (2017)MathSciNetCrossRefMATH Estrada, E., Silver, G.: Accounting for the role of long walks on networks via a new matrix function. J. Math. Anal. Appl. 449(2), 1581–1600 (2017)MathSciNetCrossRefMATH
21.
Zurück zum Zitat Fortunato, S., Barthelemy, M.: Resolution limit in community detection. Proc. Nat. Acad. Sci. 104(1), 36–41 (2007)CrossRef Fortunato, S., Barthelemy, M.: Resolution limit in community detection. Proc. Nat. Acad. Sci. 104(1), 36–41 (2007)CrossRef
22.
Zurück zum Zitat Fotouhi, B., Momeni, N., Allen, B., Nowak, M.A.: Evolution of cooperation on large networks with community structure. J. R. Soc. Interface 16(152), 20180677 (2019)CrossRef Fotouhi, B., Momeni, N., Allen, B., Nowak, M.A.: Evolution of cooperation on large networks with community structure. J. R. Soc. Interface 16(152), 20180677 (2019)CrossRef
23.
Zurück zum Zitat Fouss, F., Francoisse, K., Yen, L., Pirotte, A., Saerens, M.: An experimental investigation of kernels on graphs for collaborative recommendation and semisupervised classification. Neural Netw. 31, 53–72 (2012)CrossRefMATH Fouss, F., Francoisse, K., Yen, L., Pirotte, A., Saerens, M.: An experimental investigation of kernels on graphs for collaborative recommendation and semisupervised classification. Neural Netw. 31, 53–72 (2012)CrossRefMATH
24.
Zurück zum Zitat Fouss, F., Saerens, M., Shimbo, M.: Algorithms and Models for Network Data and Link Analysis. Cambridge University Press, Cambridge (2016) Fouss, F., Saerens, M., Shimbo, M.: Algorithms and Models for Network Data and Link Analysis. Cambridge University Press, Cambridge (2016)
25.
Zurück zum Zitat Fouss, F., Yen, L., Pirotte, A., Saerens, M.: An experimental investigation of graph kernels on a collaborative recommendation task. In: Sixth International Conference on Data Mining (ICDM’06), pp. 863–868. IEEE (2006) Fouss, F., Yen, L., Pirotte, A., Saerens, M.: An experimental investigation of graph kernels on a collaborative recommendation task. In: Sixth International Conference on Data Mining (ICDM’06), pp. 863–868. IEEE (2006)
27.
Zurück zum Zitat Gösgens, M., Prokhorenkova, L., Tikhonov, A.: Systematic analysis of cluster similarity indices: Towards bias-free cluster validation. arXiv preprint arXiv:1911.04773 (2019) Gösgens, M., Prokhorenkova, L., Tikhonov, A.: Systematic analysis of cluster similarity indices: Towards bias-free cluster validation. arXiv preprint arXiv:​1911.​04773 (2019)
28.
Zurück zum Zitat Guex, G., Courtain, S., Saerens, M.: Covariance and correlation kernels on a graph in the generalized bag-of-paths formalism. arXiv preprint arXiv:1902.03002 (2019) Guex, G., Courtain, S., Saerens, M.: Covariance and correlation kernels on a graph in the generalized bag-of-paths formalism. arXiv preprint arXiv:​1902.​03002 (2019)
29.
Zurück zum Zitat Guex, G., Kivimäki, I., Saerens, M.: Randomized optimal transport on a graph: framework and new distance measures. arXiv preprint arXiv:1806.03232 (2018) Guex, G., Kivimäki, I., Saerens, M.: Randomized optimal transport on a graph: framework and new distance measures. arXiv preprint arXiv:​1806.​03232 (2018)
30.
Zurück zum Zitat Holland, P.W., Laskey, K.B., Leinhardt, S.: Stochastic blockmodels: first steps. Soc. Netw. 5(2), 109–137 (1983)MathSciNetCrossRef Holland, P.W., Laskey, K.B., Leinhardt, S.: Stochastic blockmodels: first steps. Soc. Netw. 5(2), 109–137 (1983)MathSciNetCrossRef
31.
32.
Zurück zum Zitat Ivashkin, V., Chebotarev, P.: Do logarithmic proximity measures outperform plain ones in graph clustering? In: International Conference on Network Analysis, pp. 87–105. Springer (2016) Ivashkin, V., Chebotarev, P.: Do logarithmic proximity measures outperform plain ones in graph clustering? In: International Conference on Network Analysis, pp. 87–105. Springer (2016)
33.
34.
Zurück zum Zitat Kandola, J., Cristianini, N., Shawe-Taylor, J.S.: Learning semantic similarity. In: Advances in Neural Information Processing Systems, pp. 673–680 (2003) Kandola, J., Cristianini, N., Shawe-Taylor, J.S.: Learning semantic similarity. In: Advances in Neural Information Processing Systems, pp. 673–680 (2003)
35.
Zurück zum Zitat Katz, L.: A new status index derived from sociometric analysis. Psychometrika 18(1), 39–43 (1953)CrossRefMATH Katz, L.: A new status index derived from sociometric analysis. Psychometrika 18(1), 39–43 (1953)CrossRefMATH
36.
Zurück zum Zitat Kirkland, S.J., Neumann, M.: Group Inverses of M-matrices and Their Applications. CRC Press, Boca Raton (2012) Kirkland, S.J., Neumann, M.: Group Inverses of M-matrices and Their Applications. CRC Press, Boca Raton (2012)
37.
Zurück zum Zitat Kivimäki, I., Shimbo, M., Saerens, M.: Developments in the theory of randomized shortest paths with a comparison of graph node distances. Phys. A: Stat. Mech. Appl. 393, 600–616 (2014)CrossRefMATH Kivimäki, I., Shimbo, M., Saerens, M.: Developments in the theory of randomized shortest paths with a comparison of graph node distances. Phys. A: Stat. Mech. Appl. 393, 600–616 (2014)CrossRefMATH
38.
Zurück zum Zitat Lancichinetti, A., Fortunato, S., Radicchi, F.: Benchmark graphs for testing community detection algorithms. Phys. Rev. E 78(4), 046110 (2008) Lancichinetti, A., Fortunato, S., Radicchi, F.: Benchmark graphs for testing community detection algorithms. Phys. Rev. E 78(4), 046110 (2008)
39.
Zurück zum Zitat Leleux, P., Courtain, S., Guex, G., Saerens, M.: Sparse randomized shortest paths routing with tsallis divergence regularization. arXiv preprint arXiv:2007.00419 (2020) Leleux, P., Courtain, S., Guex, G., Saerens, M.: Sparse randomized shortest paths routing with tsallis divergence regularization. arXiv preprint arXiv:​2007.​00419 (2020)
41.
Zurück zum Zitat Luxburg, U.V., Radl, A., Hein, M.: Getting lost in space: Large sample analysis of the resistance distance. In: Advances in Neural Information Processing Systems, pp. 2622–2630 (2010) Luxburg, U.V., Radl, A., Hein, M.: Getting lost in space: Large sample analysis of the resistance distance. In: Advances in Neural Information Processing Systems, pp. 2622–2630 (2010)
42.
Zurück zum Zitat MacQueen, J.: Some methods for classification and analysis of multivariate observations. In: Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, vol. 14, pp. 281–297. Oakland, CA, USA (1967) MacQueen, J.: Some methods for classification and analysis of multivariate observations. In: Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, vol. 14, pp. 281–297. Oakland, CA, USA (1967)
43.
Zurück zum Zitat Mika, S., Ratsch, G., Weston, J., Scholkopf, B., Mullers, K.R.: Fisher discriminant analysis with kernels. In: Neural Networks for Signal Processing IX: Proceedings of the 1999 IEEE Signal Processing Society Workshop, pp. 41–48. IEEE (1999) Mika, S., Ratsch, G., Weston, J., Scholkopf, B., Mullers, K.R.: Fisher discriminant analysis with kernels. In: Neural Networks for Signal Processing IX: Proceedings of the 1999 IEEE Signal Processing Society Workshop, pp. 41–48. IEEE (1999)
44.
Zurück zum Zitat Newman, M.E.J., Girvan, M.: Finding and evaluating community structure in networks. Phys. Rev. E 69(2), 026113 (2004) Newman, M.E.J., Girvan, M.: Finding and evaluating community structure in networks. Phys. Rev. E 69(2), 026113 (2004)
45.
Zurück zum Zitat Page, L., Brin, S., Motwani, R., Winograd, T.: The PageRank citation ranking: Bringing order to the web. Technical report, Stanford InfoLab (1999) Page, L., Brin, S., Motwani, R., Winograd, T.: The PageRank citation ranking: Bringing order to the web. Technical report, Stanford InfoLab (1999)
46.
Zurück zum Zitat Pasta, M.Q., Zaidi, F.: Topology of complex networks and performance limitations of community detection algorithms. IEEE Access 5, 10901–10914 (2017)CrossRef Pasta, M.Q., Zaidi, F.: Topology of complex networks and performance limitations of community detection algorithms. IEEE Access 5, 10901–10914 (2017)CrossRef
48.
Zurück zum Zitat Raghavan, U.N., Albert, R., Kumara, S.: Near linear time algorithm to detect community structures in large-scale networks. Phys. Rev. E 76(3), 036106 (2007) Raghavan, U.N., Albert, R., Kumara, S.: Near linear time algorithm to detect community structures in large-scale networks. Phys. Rev. E 76(3), 036106 (2007)
49.
Zurück zum Zitat Shawe-Taylor, J., Cristianini, N., et al.: Kernel Methods for Pattern Analysis. Cambridge University Press, Cambridge (2004) Shawe-Taylor, J., Cristianini, N., et al.: Kernel Methods for Pattern Analysis. Cambridge University Press, Cambridge (2004)
52.
Zurück zum Zitat Van Dongen, S.M.: Graph Clustering by Flow Smulation. Ph.D. thesis, Utrecht University (2000) Van Dongen, S.M.: Graph Clustering by Flow Smulation. Ph.D. thesis, Utrecht University (2000)
55.
Zurück zum Zitat Yen, L., Fouss, F., Decaestecker, C., Francq, P., Saerens, M.: Graph nodes clustering with the sigmoid commute-time kernel: a comparative study. Data Knowl. Eng. 68(3), 338–361 (2009)CrossRef Yen, L., Fouss, F., Decaestecker, C., Francq, P., Saerens, M.: Graph nodes clustering with the sigmoid commute-time kernel: a comparative study. Data Knowl. Eng. 68(3), 338–361 (2009)CrossRef
56.
Zurück zum Zitat Yen, L., Saerens, M., Mantrach, A., Shimbo, M.: A family of dissimilarity measures between nodes generalizing both the shortest-path and the commute-time distances. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 785–793 (2008) Yen, L., Saerens, M., Mantrach, A., Shimbo, M.: A family of dissimilarity measures between nodes generalizing both the shortest-path and the commute-time distances. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 785–793 (2008)
Metadaten
Titel
Dissecting Graph Measure Performance for Node Clustering in LFR Parameter Space
verfasst von
Vladimir Ivashkin
Pavel Chebotarev
Copyright-Jahr
2022
DOI
https://doi.org/10.1007/978-3-030-93409-5_28

Premium Partner