Published in: Advances in Data Analysis and Classification, Issue 1/2019

9 October 2018 | Regular Article

sARI: a soft agreement measure for class partitions incorporating assignment probabilities

Authors: Abby Flynt, Nema Dean, Rebecca Nugent

Abstract

Agreement indices are commonly used to summarize the performance of both classification and clustering methods. The ease of interpretation and the desirable properties of the Rand and adjusted Rand indices have led to their popularity over other available indices. While more algorithmic clustering approaches such as k-means and hierarchical clustering produce hard partition assignments (assigning each observation to a single cluster), other techniques such as model-based clustering include information about the certainty of allocation of objects through class membership probabilities (soft partitions). To assess performance using traditional indices, e.g., the adjusted Rand index (ARI), the soft partition is mapped to a hard set of assignments, which commonly overstates the certainty of correct assignments. This paper proposes an extension of the ARI, the soft adjusted Rand index (sARI), with similar intuition and interpretation but also incorporating information from one or two soft partitions. It can be used in conjunction with the ARI, comparing the similarity of hard-to-soft or soft-to-soft partitions with the similarity of the mapped hard partitions. Simulation study results support the intuition that, in general, mapping to hard partitions tends to increase the measured similarity between partitions. In applications, the sARI more accurately reflects the cluster boundary overlap commonly seen in real data.
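
To make the hard-mapping step concrete, the following is a minimal Python sketch of the conventional workflow the abstract describes: a soft partition is mapped to hard labels by taking the most probable class, and the standard (Hubert-Arabie) ARI is then computed. It does not implement the sARI itself, whose formula is not given on this page. The function names `harden` and `adjusted_rand_index`, and the toy membership probabilities, are illustrative assumptions rather than anything taken from the paper.

```python
from collections import Counter
from math import comb


def harden(soft):
    """Map each row of membership probabilities to its most probable class."""
    return [max(range(len(row)), key=row.__getitem__) for row in soft]


def adjusted_rand_index(a, b):
    """Standard adjusted Rand index for two hard label sequences."""
    if len(a) != len(b):
        raise ValueError("partitions must label the same observations")
    n = len(a)
    if n < 2:
        raise ValueError("need at least two observations to form a pair")
    # Pairs of observations placed together in both partitions, in a, and in b.
    pairs_both = sum(comb(c, 2) for c in Counter(zip(a, b)).values())
    pairs_a = sum(comb(c, 2) for c in Counter(a).values())
    pairs_b = sum(comb(c, 2) for c in Counter(b).values())
    expected = pairs_a * pairs_b / comb(n, 2)  # agreement expected by chance
    maximum = (pairs_a + pairs_b) / 2
    if maximum == expected:  # degenerate case, e.g. both partitions trivial
        return 1.0
    return (pairs_both - expected) / (maximum - expected)


# Hypothetical toy data: rows 3 and 4 are close to a coin flip.
truth = [0, 0, 0, 1, 1, 1]
soft = [
    [0.95, 0.05],
    [0.90, 0.10],
    [0.51, 0.49],
    [0.49, 0.51],
    [0.10, 0.90],
    [0.05, 0.95],
]
hard = harden(soft)
print(hard)
print(adjusted_rand_index(truth, hard))
```

In this toy case the hardened labels match the reference labels exactly, so the ARI reports perfect agreement even though two of the six assignments were made with probability 0.51. This is the kind of overstated certainty that motivates an index working on the membership probabilities directly.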

Metadata
Title
sARI: a soft agreement measure for class partitions incorporating assignment probabilities
Authors
Abby Flynt
Nema Dean
Rebecca Nugent
Publication date
9 October 2018
Publisher
Springer Berlin Heidelberg
Published in
Advances in Data Analysis and Classification / Issue 1/2019
Print ISSN: 1862-5347
Electronic ISSN: 1862-5355
DOI
https://doi.org/10.1007/s11634-018-0346-x
