A General Formulation of Cluster Analysis with Dimension Reduction and Subspace Separation

Behaviormetrika

Abstract

We propose a novel approach to finding an optimal subspace of multidimensional variables for identifying the cluster structure of objects. When some variables are irrelevant to the cluster structure and are correlated with one another, they are likely to have an adverse effect on the clustering of objects. In such situations, the proposed method aims to obtain an optimal subspace for partitioning objects by eliminating the effects of these irrelevant variables. The proposed method can be considered an extension of reduced k-means analysis and factorial k-means analysis to settings where irrelevant variables are correlated. The proposed method is applied to artificial and real data to investigate how it performs compared with existing methods.
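For orientation, the following is a minimal sketch of reduced k-means, one of the two existing methods that the abstract says the proposal extends. It is not the proposed method. It assumes a common alternating least-squares formulation in which the data are approximated by cluster centroids lying in a low-dimensional subspace. The function name `reduced_kmeans`, its parameters, the random initial partition, and the handling of empty clusters are all illustrative choices made here, not details taken from the article.

```python
import numpy as np

def reduced_kmeans(X, n_clusters, n_dims, n_iter=50, seed=0):
    """Illustrative reduced k-means: alternate between estimating a
    low-dimensional subspace and reassigning objects within it."""
    rng = np.random.default_rng(seed)
    X = X - X.mean(axis=0)                      # column-center the data
    n, p = X.shape
    labels = rng.integers(n_clusters, size=n)   # random initial partition (assumption)
    for _ in range(n_iter):
        # Cluster means in the full space; an empty cluster is re-seeded
        # with a randomly chosen object (an ad hoc choice for this sketch).
        centroids = np.vstack([
            X[labels == k].mean(axis=0) if np.any(labels == k)
            else X[rng.integers(n)]
            for k in range(n_clusters)
        ])
        # Between-cluster part of the data: each row replaced by its centroid.
        fitted = centroids[labels]
        # Loadings A: leading eigenvectors of the between-cluster
        # cross-product matrix (eigh returns eigenvalues in ascending order).
        _, eigvecs = np.linalg.eigh(fitted.T @ fitted)
        A = eigvecs[:, ::-1][:, :n_dims]
        # Reassign each object to the nearest centroid in the subspace.
        scores = X @ A
        low_centroids = centroids @ A
        dists = ((scores[:, None, :] - low_centroids[None, :, :]) ** 2).sum(axis=2)
        new_labels = dists.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break                               # partition is stable
        labels = new_labels
    return labels, A
```

Calling `labels, A = reduced_kmeans(X, n_clusters=3, n_dims=2)` on an objects-by-variables array `X` returns a cluster label per object and a loading matrix with orthonormal columns. Because the subspace is driven by between-cluster variation of the current partition, a sketch like this can be misled when irrelevant variables are strongly correlated, which is the situation the abstract targets. Like ordinary k-means, it can also stop at a local optimum, so several random starts would normally be tried.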

Author information

Corresponding author

Correspondence to Michio Yamamoto.

About this article

Cite this article

Yamamoto, M., Hwang, H. A General Formulation of Cluster Analysis with Dimension Reduction and Subspace Separation. Behaviormetrika 41, 115–129 (2014). https://doi.org/10.2333/bhmk.41.115

  • DOI: https://doi.org/10.2333/bhmk.41.115
