Published in: Journal of Classification, Issue 1/2024

13 November 2023 | Original Research

Model-Based Clustering with Nested Gaussian Clusters

Authors: Jason Hou-Liu, Ryan P. Browne

Abstract

A dataset may exhibit multiple class labels for each observation, and sometimes these class labels form a hierarchical structure. As a textbook analogy, a book can be labelled "statistics" as well as with the encompassing label "non-fiction". To capture this behaviour in a model-based clustering context, we describe a model formulation and estimation procedure for clustering with nested Gaussian clusters in orthogonal intrinsic variable subspaces. We elucidate a two-stage clustering model in which the observed manifest variables are assumed to be a rotation of intrinsic primary and secondary clustering subspaces together with additional noise subspaces. In a hierarchical sense, secondary clusters are presumed to be subclusters of primary clusters and therefore share Gaussian cluster parameters in the primary cluster subspace. An estimation procedure based on the expectation-maximization (EM) algorithm is provided, with model selection via the Bayesian information criterion (BIC). The proposed model is evaluated on real-world datasets.
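The abstract describes the nested structure only at a high level. The following minimal Python sketch simulates data with that kind of structure so the idea is concrete; it is not the authors' model specification or their EM estimation procedure. It rests on simplifying assumptions of our own: every block is an isotropic unit-variance Gaussian, mixing weights are equal, and the rotation is a random orthogonal matrix (which may include a reflection). The function names `random_orthogonal`, `to_intrinsic`, and `simulate_nested` are hypothetical. The key point it illustrates is that the primary-subspace coordinates depend only on the primary cluster g, so all subclusters (g, h) share them, while the secondary coordinates depend on (g, h).

```python
import math
import random


def random_orthogonal(p, rng):
    """Return a p x p orthogonal matrix as a list of orthonormal rows.

    Rows are built by modified Gram-Schmidt on standard Gaussian vectors.
    """
    basis = []
    while len(basis) < p:
        v = [rng.gauss(0.0, 1.0) for _ in range(p)]
        for b in basis:
            # Remove the component of v along the already-accepted direction b.
            dot = sum(vi * bi for vi, bi in zip(v, b))
            v = [vi - dot * bi for vi, bi in zip(v, b)]
        norm = math.sqrt(sum(vi * vi for vi in v))
        if norm > 1e-10:  # skip (vanishingly unlikely) degenerate draws
            basis.append([vi / norm for vi in v])
    return basis


def to_intrinsic(x, Q):
    """Map an observed vector back to intrinsic coordinates: z = Q x."""
    return [sum(qk * xk for qk, xk in zip(row, x)) for row in Q]


def simulate_nested(n, primary_means, secondary_means, noise_dim, seed=0):
    """Simulate n observations with nested Gaussian clusters (hypothetical sketch).

    primary_means[g]      : mean of primary cluster g in the primary subspace.
    secondary_means[g][h] : mean of subcluster h of primary cluster g in the
                            secondary subspace.
    Assumptions: unit-variance isotropic blocks and equal mixing weights.
    """
    rng = random.Random(seed)
    d1 = len(primary_means[0])
    d2 = len(secondary_means[0][0])
    p = d1 + d2 + noise_dim
    Q = random_orthogonal(p, rng)
    X, labels = [], []
    for _ in range(n):
        g = rng.randrange(len(primary_means))
        h = rng.randrange(len(secondary_means[g]))
        # Primary block: depends only on g, so all subclusters of g share it.
        z = [m + rng.gauss(0.0, 1.0) for m in primary_means[g]]
        # Secondary block: depends on the subcluster (g, h).
        z += [m + rng.gauss(0.0, 1.0) for m in secondary_means[g][h]]
        # Noise block: one Gaussian shared by every observation.
        z += [rng.gauss(0.0, 1.0) for _ in range(noise_dim)]
        # Rotate into the observed (manifest) space: x = Q^T z.
        x = [sum(Q[k][j] * z[k] for k in range(p)) for j in range(p)]
        X.append(x)
        labels.append((g, h))
    return X, labels, Q


# Two primary clusters in a 2-D primary subspace; primary cluster 0 has two
# subclusters and primary cluster 1 has three, in a 1-D secondary subspace.
X, labels, Q = simulate_nested(
    2000,
    primary_means=[[0.0, 0.0], [6.0, 6.0]],
    secondary_means=[[[0.0], [5.0]], [[0.0], [5.0], [10.0]]],
    noise_dim=1,
    seed=1,
)
```

Because the rotation is orthogonal, `to_intrinsic` recovers the primary, secondary, and noise coordinates exactly in this simulation. In the setting the abstract describes, the rotation is not observed and would have to be estimated together with the cluster parameters.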

Metadata
Publisher
Springer US
Print ISSN: 0176-4268
Electronic ISSN: 1432-1343
DOI
https://doi.org/10.1007/s00357-023-09453-z