Abstract
We propose a novel approach to finding an optimal subspace of multi-dimensional variables for identifying a cluster structure of objects. When some variables are irrelevant to the cluster structure and are correlated with one another, they are likely to have an adverse effect on the clustering of objects. In such situations, the proposed method aims to obtain an optimal subspace for partitioning objects by eliminating the effects of these irrelevant variables. The proposed method can be regarded as an extension of reduced k-means analysis and factorial k-means analysis to settings where the irrelevant variables are correlated. The proposed method is applied to artificial and real data to investigate how it performs compared with existing methods.
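The abstract positions the proposed method as an extension of reduced k-means (RKM; De Soete & Carroll, 1994) and factorial k-means (FKM; Vichi & Kiers, 2001). As background only, the following is a minimal NumPy sketch of those two baseline methods. It is not the proposed subspace-separation method, whose details are not given here. The sketch alternates a k-means step on the projected scores XA with an eigen-decomposition update of the orthonormal loading matrix A, following the usual least-squares formulations (compare Timmerman et al., 2010). The function names, the random orthonormal start, the warm-started k-means step and the stopping rule are illustrative assumptions.

```python
import numpy as np


def _lloyd(Y, centers, max_iter=100):
    """Plain Lloyd k-means on the rows of Y, started from the given centres."""
    for _ in range(max_iter):
        # squared Euclidean distance from every row of Y to every centre
        d = ((Y[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        # recompute centres; keep the old centre if a cluster becomes empty
        new_centers = np.array([
            Y[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(len(centers))
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def subspace_kmeans(X, k, q, method="rkm", max_iter=50, seed=0):
    """Hypothetical alternating sketch of reduced (RKM) or factorial (FKM) k-means.

    Returns cluster labels (length n) and a p x q loading matrix A with A'A = I.
    """
    rng = np.random.default_rng(seed)
    X = X - X.mean(axis=0)                       # column-centre the variables
    n, p = X.shape
    A, _ = np.linalg.qr(rng.standard_normal((p, q)))  # random orthonormal start
    labels = None
    for _ in range(max_iter):
        Y = X @ A                                # object scores in the subspace
        if labels is None:
            centers = Y[rng.choice(n, size=k, replace=False)]
        else:
            # warm start from the previous partition so label identities persist
            centers = np.array([
                Y[labels == j].mean(axis=0) if np.any(labels == j) else Y[rng.integers(n)]
                for j in range(k)
            ])
        new_labels = _lloyd(Y, centers)
        if labels is not None and np.array_equal(new_labels, labels):
            break                                # partition unchanged: stop
        labels = new_labels
        U = np.eye(k)[labels]                    # n x k cluster indicator matrix
        P = U @ np.linalg.pinv(U.T @ U) @ U.T    # projector onto cluster means
        between = X.T @ P @ X                    # between-cluster scatter X'PX
        if method == "rkm":
            # RKM: maximise tr(A'X'PXA) -> eigenvectors of the q largest eigenvalues
            _, V = np.linalg.eigh(between)
            A = V[:, ::-1][:, :q]
        elif method == "fkm":
            # FKM: minimise tr(A'X'(I-P)XA) -> eigenvectors of the q smallest eigenvalues
            _, V = np.linalg.eigh(X.T @ X - between)
            A = V[:, :q]
        else:
            raise ValueError("method must be 'rkm' or 'fkm'")
    return labels, A
```

The two baselines differ only in the loading step. RKM picks the subspace that maximizes between-cluster scatter, while FKM picks the one that minimizes within-cluster scatter. As a result, the two can select different subspaces on the same data.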
References
Arabie, P. & Hubert, L. (1994). Cluster analysis in marketing research. In R.P. Bagozzi (Ed.), Advanced methods of marketing research (pp. 160–189). Oxford: Blackwell.
Ben-Hur, A. & Guyon, I. (2003). Detecting stable clusters using principal component analysis. In M.J. Brownstein & A.B. Khodursky (Eds.), Functional Genomics (pp. 159–182). Humana Press.
De Soete, G. & Carroll, J.D. (1994). K-means clustering in a low-dimensional Euclidean space. In E. Diday, Y. Lechevallier, M. Schader, P. Bertrand, & B. Burtschy (Eds.), New approaches in classification and data analysis (pp. 212–219). Heidelberg: Springer.
DeSarbo, W.S., Jedidi, K., Cool, K., & Schendel, D. (1990). Simultaneous multidimensional unfolding and cluster analysis: An investigation of strategic groups. Marketing Letters, 2, 129–146.
Fisher, R.A. (1936). The use of multiple measurements in taxonomic problems. Annals of Eugenics, 7, 179–188.
Gattone, S.A. & Rocci, R. (2012). Clustering curves on a reduced subspace. Journal of Computational and Graphical Statistics, 21, 361–379.
Hartigan, J.A. & Wong, M.A. (1979). Algorithm AS 136: A K-means clustering algorithm. Journal of the Royal Statistical Society, Series C, 28, 100–108.
Holzinger, K.J. & Swineford, F. (1939). A study in factor analysis: The stability of a bi-factor solution. Supplementary Educational Monographs, No. 48. Chicago: The University of Chicago.
Hubert, L. & Arabie, P. (1985). Comparing partitions. Journal of Classification, 2, 193–218.
James, G.M. & Sugar, C.A. (2003). Clustering of sparsely sampled functional data. Journal of the American Statistical Association, 98, 397–408.
Jennrich, R.I. (2001). A simple general procedure for orthogonal rotation. Psychometrika, 66, 289–306.
Jennrich, R.I. (2002). A simple general procedure for oblique rotation. Psychometrika, 67, 7–20.
Kaiser, H.F. (1958). The varimax criterion for analytic rotation in factor analysis. Psychometrika, 23, 187–200.
Lloyd, S. (1982). Least squares quantization in PCM. IEEE Transactions on Information Theory, 28, 128–137.
MacQueen, J. (1967). Some methods of classification and analysis of multivariate observations. In L.M. Le Cam & J. Neyman (Eds.), Proceedings of the fifth Berkeley symposium on mathematical statistics and probability, 1, 281–297. Berkeley, CA: University of California Press.
Milligan, G.W. & Cooper, M.C. (1988). A study of standardization of variables in cluster analysis. Journal of Classification, 5, 181–204.
Niu, D., Dy, J.G., & Jordan, M.I. (2011). Dimensionality reduction for spectral clustering. Proceedings of the 14th International Conference on Artificial Intelligence and Statistics 2011, 552–560.
Rocci, R., Gattone, S.A., & Vichi, M. (2011). A new dimension reduction method: Factor discriminant k-means. Journal of Classification, 28, 210–226.
Sun, W., Wang, J., & Fang, Y. (2012). Regularized k-means clustering of high-dimensional data and its asymptotic consistency. Electronic Journal of Statistics, 6, 148–167.
Terada, Y. (2013a). Strong consistency of reduced k-means clustering. arXiv:1212.4942.
Terada, Y. (2013b). Strong consistency of factorial k-means clustering. arXiv:1301.0676.
Timmerman, M.E., Ceulemans, E., Kiers, H.A.L., & Vichi, M. (2010). Factorial and reduced k-means reconsidered. Computational Statistics & Data Analysis, 54, 1858–1871.
Tukey, J.W. (1977). Exploratory data analysis. Addison-Wesley.
Verbeek, J.J. (2004). Mixture models for clustering and dimension reduction. Doctoral dissertation, University of Amsterdam.
Vichi, M. & Kiers, H.A.L. (2001). Factorial k-means analysis for two-way data. Computational Statistics & Data Analysis, 37, 49–64.
Vidal, R. (2011). Subspace clustering. IEEE Signal Processing Magazine, 28, 52–68.
Wang, J. (2010). Consistent selection of the number of clusters via crossvalidation. Biometrika, 97, 893–904.
Yamamoto, M. (2012). Clustering of functional data in a low-dimensional subspace. Advances in Data Analysis and Classification, 6, 219–247.
Yamamoto, M. & Terada, Y. (2013). Functional factorial k-means analysis. arXiv:1311.0463.
Cite this article
Yamamoto, M., Hwang, H. A General Formulation of Cluster Analysis with Dimension Reduction and Subspace Separation. Behaviormetrika 41, 115–129 (2014). https://doi.org/10.2333/bhmk.41.115