16.07.2019

A Modified k-Means Clustering Procedure for Obtaining a Cardinality-Constrained Centroid Matrix

Authors: Naoto Yamashita, Kohei Adachi

Published in: Journal of Classification | Issue 2/2020


Abstract

k-means clustering is a well-known procedure for classifying multivariate observations. The resulting clusters-by-variables centroid matrix is used to interpret which variables characterize each cluster. However, between-cluster differences are not always clearly captured in the centroid matrix. We address this problem by proposing a new procedure for obtaining a centroid matrix that contains a number of elements that are exactly zero. This allows easy interpretation of the matrix, as we may focus only on the nonzero centroids. The development of an iterative algorithm for the constrained minimization is described. A cardinality selection procedure for identifying the optimal cardinality is presented, as well as a modified version of the proposed procedure in which some restrictions are imposed on the positions of the nonzero elements. The behavior of the proposed procedure was evaluated in simulation studies and is illustrated with three real data examples, which demonstrate that the performance of the procedure is promising.
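
The abstract does not spell out the algorithm, so the following is only a minimal sketch of one way such a constrained minimization could be iterated, not the authors' procedure. It assumes the objective is the usual k-means least-squares loss with the centroid matrix restricted to at most `cardinality` nonzero elements, and that the columns of `X` are centered so that a zero centroid means "at the overall mean". For a fixed partition the loss equals a constant plus the sum over clusters k and variables j of n_k (c_kj - mean_kj)^2, so the constrained update keeps the cluster means with the largest n_k * mean_kj^2 and sets the rest to zero. The function name and all parameter names are hypothetical.

```python
import numpy as np

def sparse_centroid_kmeans(X, n_clusters, cardinality, n_iter=100, seed=0):
    """Alternate a cardinality-constrained centroid update with reassignment.

    Illustrative sketch under the assumptions stated above; X is assumed
    to be column-centered.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    labels = rng.integers(n_clusters, size=n)  # random initial partition
    C = np.zeros((n_clusters, p))
    for _ in range(n_iter):
        # Unconstrained centroids (cluster means) and cluster sizes.
        sizes = np.bincount(labels, minlength=n_clusters).astype(float)
        means = np.zeros((n_clusters, p))
        np.add.at(means, labels, X)            # per-cluster column sums
        nonempty = sizes > 0
        means[nonempty] /= sizes[nonempty][:, None]
        # Zeroing element (k, j) raises the loss by n_k * mean_kj^2, so keep
        # the `cardinality` elements with the largest such values.
        gain = sizes[:, None] * means**2
        keep = np.argsort(gain, axis=None)[::-1][:cardinality]
        C = np.zeros_like(means)
        C.flat[keep] = means.flat[keep]
        # Reassign each observation to its nearest constrained centroid.
        dist = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)
        new_labels = dist.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break                              # partition is stable
        labels = new_labels
    return labels, C
```

Both steps cannot increase the loss for the current partition, so the iteration settles on a local solution; as with ordinary k-means, several random starts (different `seed` values) would normally be compared.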

Metadata
Title
A Modified k-Means Clustering Procedure for Obtaining a Cardinality-Constrained Centroid Matrix
Authors
Naoto Yamashita
Kohei Adachi
Publication date
16.07.2019
Publisher
Springer US
Published in
Journal of Classification / Issue 2/2020
Print ISSN: 0176-4268
Electronic ISSN: 1432-1343
DOI
https://doi.org/10.1007/s00357-019-09324-6