Skip to main content
Erschienen in: Advances in Data Analysis and Classification 4/2022

08.01.2022 | Regular Article

Independence versus indetermination: basis of two canonical clustering criteria

verfasst von: Pierre Bertrand, Michel Broniatowski, Jean-François Marcotorchino

Erschienen in: Advances in Data Analysis and Classification | Ausgabe 4/2022

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

This paper aims at comparing two coupling approaches as basic layers for building clustering criteria, suited for modularizing and clustering very large networks. We briefly use “optimal transport theory” as a starting point, and a way as well, to derive two canonical couplings: “statistical independence” and “logical indetermination”. A symmetric list of properties is provided and notably the so called “Monge’s properties”, applied to contingency matrices, and justifying the \(\otimes \) versus \(\oplus \) notation. A study is proposed, highlighting “logical indetermination”, because it is, by far, lesser known. Eventually we estimate the average difference between both couplings as the key explanation of their usually close results in network clustering.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Fußnoten
1
In 1961 Alan Hoffman (IBM Fellow and US Science Academy member) rediscovered Monges’s observation see Hoffman (1963). Hoffman showed that the Hitchcock-Kantorovich transportation problem can be solved by a very simple approach if its underlying cost matrix satisfies those Monge’s properties.
 
2
Factually this is the method of S. Lloyd(1957) rewritten by E.W. Forgy (1965) which corresponds to the oldest version of the K-means really used.
 
Literatur
Zurück zum Zitat Ah-Pine J (2007) Sur des aspects algébriques et combinatoires de l’analyse relationnelle: applications en classification automatique, en théorie du choix social et en théorie des tresses. Ph.D. thesis, Paris 6 Ah-Pine J (2007) Sur des aspects algébriques et combinatoires de l’analyse relationnelle: applications en classification automatique, en théorie du choix social et en théorie des tresses. Ph.D. thesis, Paris 6
Zurück zum Zitat Ah-Pine J (2010) On aggregating binary relations using 0-1 integer linear programming. In: ISAIM, pp 1–10 Ah-Pine J (2010) On aggregating binary relations using 0-1 integer linear programming. In: ISAIM, pp 1–10
Zurück zum Zitat Asano T, Bhattacharya B, Keil M, Yao F (1988) Clustering algorithms based on minimum and maximum spanning trees. In: proceedings of the fourth annual symposium on Computational Geometry, pp 252–257 Asano T, Bhattacharya B, Keil M, Yao F (1988) Clustering algorithms based on minimum and maximum spanning trees. In: proceedings of the fourth annual symposium on Computational Geometry, pp 252–257
Zurück zum Zitat Bertrand P (2021) Transport optimal, matrices de monge et pont relationnel. Ph.D. thesis, Paris 6 Bertrand P (2021) Transport optimal, matrices de monge et pont relationnel. Ph.D. thesis, Paris 6
Zurück zum Zitat Blondel VD, Guillaume JL, Lambiotte R, Lefebvre E (2008) Fast unfolding of communities in large networks. J Stat Mech Theory Exp 2008(10):10008CrossRefMATH Blondel VD, Guillaume JL, Lambiotte R, Lefebvre E (2008) Fast unfolding of communities in large networks. J Stat Mech Theory Exp 2008(10):10008CrossRefMATH
Zurück zum Zitat Campigotto R, Conde-Céspedes P, Guillaume JL (2014) A generalized and adaptive method for community detection. arXiv preprint arXiv:1406.2518 Campigotto R, Conde-Céspedes P, Guillaume JL (2014) A generalized and adaptive method for community detection. arXiv preprint arXiv:​1406.​2518
Zurück zum Zitat Conde-Céspedes P (2013) Modélisations et extensions du formalisme de l’analyse relationnelle mathématique à la modularisation des grands graphes. Ph.D. thesis, Paris 6 Conde-Céspedes P (2013) Modélisations et extensions du formalisme de l’analyse relationnelle mathématique à la modularisation des grands graphes. Ph.D. thesis, Paris 6
Zurück zum Zitat Coron JL (2017) Quelques exemples de jeux à champ moyen. Ph.D. thesis, Paris Sciences et Lettres (ComUE) Coron JL (2017) Quelques exemples de jeux à champ moyen. Ph.D. thesis, Paris Sciences et Lettres (ComUE)
Zurück zum Zitat Csiszár I et al (1991) Why least squares and maximum entropy? an axiomatic approach to inference for linear inverse problems. Ann Stat 19(4):2032–2066MathSciNetCrossRefMATH Csiszár I et al (1991) Why least squares and maximum entropy? an axiomatic approach to inference for linear inverse problems. Ann Stat 19(4):2032–2066MathSciNetCrossRefMATH
Zurück zum Zitat Doreian P, Batagelj V, Ferligoj A (2020) Advances in network clustering and blockmodeling. John Wiley & Sons, HobokenMATH Doreian P, Batagelj V, Ferligoj A (2020) Advances in network clustering and blockmodeling. John Wiley & Sons, HobokenMATH
Zurück zum Zitat Fortunato S, Barthelemy M (2007) Resolution limit in community detection. Proc Natl Acad Sci 104(1):36–41CrossRef Fortunato S, Barthelemy M (2007) Resolution limit in community detection. Proc Natl Acad Sci 104(1):36–41CrossRef
Zurück zum Zitat Fréchet M (1951) Sur les tableaux de corrélations dont les marges sont données. Annales de lUniversité de Lyon. Sect A 14:53–77 Fréchet M (1951) Sur les tableaux de corrélations dont les marges sont données. Annales de lUniversité de Lyon. Sect A 14:53–77
Zurück zum Zitat Girvan M, Newman MEJ (2002) Community structure in social and biological networks. Proceedings of the National Academy of Sciences of the United States of America p 7821-7826 Girvan M, Newman MEJ (2002) Community structure in social and biological networks. Proceedings of the National Academy of Sciences of the United States of America p 7821-7826
Zurück zum Zitat Hamilton RS et al (1982) Three-manifolds with positive ricci curvature. J Diff Geom 17(2):255–306MathSciNetMATH Hamilton RS et al (1982) Three-manifolds with positive ricci curvature. J Diff Geom 17(2):255–306MathSciNetMATH
Zurück zum Zitat Hoffman AJ (1963) On simple linear programming problems. Proceedings of the Seventh Symposium in Pure Mathematics of the AMS 317–327 Hoffman AJ (1963) On simple linear programming problems. Proceedings of the Seventh Symposium in Pure Mathematics of the AMS 317–327
Zurück zum Zitat Katz L (1953) A new status index derived from sociometric analysis. Psychometrika 18(1):39–43CrossRefMATH Katz L (1953) A new status index derived from sociometric analysis. Psychometrika 18(1):39–43CrossRefMATH
Zurück zum Zitat Lancichinetti A, Fortunato S (2011) Limits of modularity maximization in community detection. Physical review E 84(6):066122CrossRef Lancichinetti A, Fortunato S (2011) Limits of modularity maximization in community detection. Physical review E 84(6):066122CrossRef
Zurück zum Zitat Marcotorchino JF (1984) Utilisation des comparaisons par paires en statistique des contingences. Publication du Centre Scientifique IBM de Paris et Cahiers du Séminaire Analyse des Données et Processus Stochastiques Université Libre de Bruxelles pp 1–57 Marcotorchino JF (1984) Utilisation des comparaisons par paires en statistique des contingences. Publication du Centre Scientifique IBM de Paris et Cahiers du Séminaire Analyse des Données et Processus Stochastiques Université Libre de Bruxelles pp 1–57
Zurück zum Zitat Marcotorchino JF (1986) Maximal association theory as a tool of research. In: Gaul W, Schader M (eds) Classification as a tool of research. North Holland Amsterdam Marcotorchino JF (1986) Maximal association theory as a tool of research. In: Gaul W, Schader M (eds) Classification as a tool of research. North Holland Amsterdam
Zurück zum Zitat Marcotorchino JF, Conde-Céspedes P (2013) Optimal transport and minimal trade problem, impacts on relational metrics and applications to large graphs and networks modularity. In: international conference on geometric science of information, pp 169–179. Springer Marcotorchino JF, Conde-Céspedes P (2013) Optimal transport and minimal trade problem, impacts on relational metrics and applications to large graphs and networks modularity. In: international conference on geometric science of information, pp 169–179. Springer
Zurück zum Zitat Marcotorchino JF, Michaud P (1979) Optimisation en Analyse Ordinale des, Données. Masson, ParisMATH Marcotorchino JF, Michaud P (1979) Optimisation en Analyse Ordinale des, Données. Masson, ParisMATH
Zurück zum Zitat Messatfa H (1990) Maximal association for the sum of squares of a contingency table. Revue RAIRO, Recherche Opérationnelle 24:29–47MathSciNetMATH Messatfa H (1990) Maximal association for the sum of squares of a contingency table. Revue RAIRO, Recherche Opérationnelle 24:29–47MathSciNetMATH
Zurück zum Zitat Nascimento MC (2014) Community detection in networks via a spectral heuristic based on the clustering coefficient. Disc Appl Math 176:89–99MathSciNetCrossRefMATH Nascimento MC (2014) Community detection in networks via a spectral heuristic based on the clustering coefficient. Disc Appl Math 176:89–99MathSciNetCrossRefMATH
Zurück zum Zitat Newman MEJ, Girvan M (2004) Finding and evaluating community structure in networks. Phys Rev E 69(2):026113CrossRef Newman MEJ, Girvan M (2004) Finding and evaluating community structure in networks. Phys Rev E 69(2):026113CrossRef
Zurück zum Zitat Ni CC, Lin YY, Luo F, Gao J (2019) Community detection on networks with Ricci flow. Sci Rep 9(1):1–12 Ni CC, Lin YY, Luo F, Gao J (2019) Community detection on networks with Ricci flow. Sci Rep 9(1):1–12
Zurück zum Zitat Opitz O, Paul H (2005) Aggregation of ordinal judgements based on condorcet’s majority rule. Data Analysis and Decision Support. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg Opitz O, Paul H (2005) Aggregation of ordinal judgements based on condorcet’s majority rule. Data Analysis and Decision Support. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg
Zurück zum Zitat Sklar A (1973) Random variables, joint distribution functions, and copulas. Kybernetika 9(6):449–460MathSciNetMATH Sklar A (1973) Random variables, joint distribution functions, and copulas. Kybernetika 9(6):449–460MathSciNetMATH
Zurück zum Zitat Steinhaus H (1957) Sur la division des corps matériels en parties. Bull de lacadéemie Polonaise des Sci 4(12):801–804MATH Steinhaus H (1957) Sur la division des corps matériels en parties. Bull de lacadéemie Polonaise des Sci 4(12):801–804MATH
Zurück zum Zitat Stemmelen E (1977) Tableaux déchanges, description et prévision. Cahiers du Bureau Universitaire de Recherche Opérationnelle 28 Stemmelen E (1977) Tableaux déchanges, description et prévision. Cahiers du Bureau Universitaire de Recherche Opérationnelle 28
Zurück zum Zitat Wilson AG (1967) A statistical theory of spatial distribution models. Transp Res 1:253–269CrossRef Wilson AG (1967) A statistical theory of spatial distribution models. Transp Res 1:253–269CrossRef
Zurück zum Zitat Wilson AG (1969) The use of entropy maximising models. J Transp Econ Policy 3:108–126 Wilson AG (1969) The use of entropy maximising models. J Transp Econ Policy 3:108–126
Metadaten
Titel
Independence versus indetermination: basis of two canonical clustering criteria
verfasst von
Pierre Bertrand
Michel Broniatowski
Jean-François Marcotorchino
Publikationsdatum
08.01.2022
Verlag
Springer Berlin Heidelberg
Erschienen in
Advances in Data Analysis and Classification / Ausgabe 4/2022
Print ISSN: 1862-5347
Elektronische ISSN: 1862-5355
DOI
https://doi.org/10.1007/s11634-021-00484-1

Weitere Artikel der Ausgabe 4/2022

Advances in Data Analysis and Classification 4/2022 Zur Ausgabe

Premium Partner