Published in: Journal of Classification 1/2024

13-11-2023 | Original Research

Model-Based Clustering with Nested Gaussian Clusters

Authors: Jason Hou-Liu, Ryan P. Browne

Abstract

A dataset may exhibit multiple class labels for each observation; sometimes, these class labels form a hierarchical structure. As a textbook analogy, a book can be labelled as statistics and also carry the encompassing label of non-fiction. To capture this behaviour in a model-based clustering context, we describe a model formulation and estimation procedure for clustering with nested Gaussian clusters in orthogonal intrinsic variable subspaces. We describe a two-stage clustering model in which the observed manifest variables are assumed to be a rotation of intrinsic primary and secondary clustering subspaces together with additional noise subspaces. In a hierarchical sense, secondary clusters are presumed to be subclusters of primary clusters and so share Gaussian cluster parameters in the primary cluster subspace. An estimation procedure using the expectation-maximization algorithm is provided, with model selection via the Bayesian information criterion. Real-world datasets are evaluated under the proposed model.
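
The data structure described in the abstract can be illustrated with a small simulation. The following is a minimal sketch, not the authors' formulation or estimation procedure: it assumes identity covariances, equal mixing proportions, two primary clusters with two subclusters each, and a random orthogonal rotation. All function names and parameter values (`random_orthogonal`, `simulate_nested`, `unrotate`, the cluster means) are hypothetical and chosen only for illustration.

```python
import math
import random


def random_orthogonal(p, rng):
    """Random p x p orthogonal matrix built by Gram-Schmidt on Gaussian vectors."""
    rows = []
    while len(rows) < p:
        v = [rng.gauss(0.0, 1.0) for _ in range(p)]
        # Remove the components along the rows accepted so far.
        for u in rows:
            dot = sum(a * b for a, b in zip(v, u))
            v = [a - dot * b for a, b in zip(v, u)]
        norm = math.sqrt(sum(a * a for a in v))
        if norm > 1e-8:  # skip (very unlikely) degenerate draws
            rows.append([a / norm for a in v])
    return rows


def simulate_nested(n, rng):
    """Draw n observations; returns (X, primary_labels, secondary_labels, Q)."""
    # Assumed toy parameters, not taken from the paper.
    primary_means = [(-4.0, 0.0), (4.0, 0.0)]   # 2-D primary subspace
    secondary_means = [[(-2.0,), (2.0,)],        # subclusters of primary cluster 0
                       [(-3.0,), (3.0,)]]        # subclusters of primary cluster 1
    noise_dim = 2                                # pure-noise subspace
    p = 2 + 1 + noise_dim
    Q = random_orthogonal(p, rng)
    X, prim, sec = [], [], []
    for _ in range(n):
        g = rng.randrange(2)  # primary cluster label
        h = rng.randrange(2)  # secondary label, nested within g
        # Subclusters of the same primary cluster share the primary-subspace
        # mean and covariance; they differ only in the secondary subspace.
        z = [m + rng.gauss(0.0, 1.0) for m in primary_means[g]]
        z += [m + rng.gauss(0.0, 1.0) for m in secondary_means[g][h]]
        z += [rng.gauss(0.0, 1.0) for _ in range(noise_dim)]
        # Manifest variables are a rotation of the intrinsic ones: x = Q z.
        x = [sum(Q[i][j] * z[j] for j in range(p)) for i in range(p)]
        X.append(x)
        prim.append(g)
        sec.append(h)
    return X, prim, sec, Q


def unrotate(x, Q):
    """Recover intrinsic coordinates z = Q^T x (valid because Q is orthogonal)."""
    p = len(Q)
    return [sum(Q[i][j] * x[i] for i in range(p)) for j in range(p)]
```

In this sketch the cluster structure is hidden in the rotated variables `X` and becomes axis-aligned again only after applying `unrotate` with the true `Q`. An estimation procedure such as the one the abstract describes would have to recover the rotation and the cluster parameters from `X` alone.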

Metadata
Title
Model-Based Clustering with Nested Gaussian Clusters
Authors
Jason Hou-Liu
Ryan P. Browne
Publication date
13-11-2023
Publisher
Springer US
Published in
Journal of Classification / Issue 1/2024
Print ISSN: 0176-4268
Electronic ISSN: 1432-1343
DOI
https://doi.org/10.1007/s00357-023-09453-z
