
Variable selection for model-based clustering using the integrated complete-data likelihood


Abstract

Variable selection in cluster analysis is important yet challenging. It can be achieved by regularization methods, which strike a trade-off between clustering accuracy and the number of selected variables through a lasso-type penalty. However, the calibration of the penalty term is open to criticism. Model selection methods are an efficient alternative, but they require the optimization of an information criterion, a difficult combinatorial problem. First, most of the associated optimization algorithms rely on a suboptimal procedure (e.g., a stepwise method). Second, these algorithms are often computationally expensive because they require multiple runs of the EM algorithm. Here we propose a new information criterion based on the integrated complete-data likelihood. It does not require the maximum likelihood estimate, and its maximization is simple and computationally efficient. The original contribution of our approach is to perform model selection without any parameter estimation; parameter inference is then needed only for the single selected model. This approach is applied to variable selection in a Gaussian mixture model under the conditional independence assumption. Numerical experiments on simulated and benchmark datasets show that the proposed method often outperforms two classical approaches to variable selection. The proposed approach is implemented in the R package VarSelLCM, available on CRAN.
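A minimal usage sketch of the package is given below. It assumes that the main entry point is VarSelCluster() and that gvals and vbleSelec are the names of the arguments controlling the numbers of components and the variable-selection switch; these names should be checked against the package documentation on CRAN.

# Minimal sketch of variable selection with VarSelLCM.
# The function and argument names (VarSelCluster, gvals, vbleSelec) are
# assumptions to be checked against the CRAN documentation.
library(VarSelLCM)

x <- iris[, 1:4]                     # four continuous variables

# Cluster with 1 to 4 components while selecting the relevant variables.
res <- VarSelCluster(x, gvals = 1:4, vbleSelec = TRUE)

summary(res)                         # chosen model and selected variables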



Acknowledgments

The authors are grateful to Gilles Celeux, Paul Diver and Jean-Michel Marin for their leading comments.

Author information

Corresponding author

Correspondence to Matthieu Marbac.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (zip 510 KB)

Appendices

Appendix 1: Consistency of the MICL criterion

This section is devoted to the proof of the consistency of our \(\text {MICL}\) criterion for a fixed number of components. The first part deals with non-nested models and requires a bias-entropy compensation assumption. The second part covers nested models, i.e., the case where the competing model contains the true model. In what follows, we consider the true model \(\varvec{m}^{(0)} = \big (g^{(0)}, \varvec{\omega }^{(0)}\big )\), its set of relevant variables \({\varOmega }^{(0)} = \left\{ j : \omega ^{(0)}_j = 1 \right\} \) and its parameter \(\varvec{\theta }^{(0)}\).

Case of non-nested model. We need to introduce the entropy notation given by

$$\begin{aligned} \xi \big (\varvec{\theta }; \mathbf {z}, \varvec{m}\big ) = \sum _{i = 1}^n \sum _{k = 1}^g z_{ik} \ln \tau _{ik}\big (\varvec{\theta }\mid \varvec{m}\big ), \end{aligned}$$

where \(\tau _{ik}\big (\varvec{\theta }\mid \varvec{m}\big ) = \dfrac{\tau _k \phi \big (\varvec{x}_i \mid \theta _k, \varvec{m}\big )}{\sum _{h=1}^{g}\tau _h \phi \big (\varvec{x}_i \mid \theta _h, \varvec{m}\big )}.\)
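For illustration, a short R sketch computing the posterior probabilities \(\tau _{ik}\big (\varvec{\theta }\mid \varvec{m}\big )\) and the entropy \(\xi \big (\varvec{\theta }; \mathbf {z}, \varvec{m}\big )\) for a Gaussian mixture with conditional independence is given below; the layout of the parameter object (proportions prop, and g-by-d matrices mu and sigma2) is our own convention, not the paper's notation.

# Posterior probabilities tau_ik and entropy xi(theta; z, m) for a Gaussian
# mixture with conditional independence (diagonal covariances).
# The layout of `theta` (prop, mu, sigma2) is an illustrative convention only.
log_tau <- function(x, theta) {
  x <- as.matrix(x)
  n <- nrow(x); d <- ncol(x); g <- length(theta$prop)
  logf <- matrix(0, n, g)
  for (k in 1:g) {
    lk <- log(theta$prop[k])
    for (j in 1:d) {
      lk <- lk + dnorm(x[, j], theta$mu[k, j], sqrt(theta$sigma2[k, j]), log = TRUE)
    }
    logf[, k] <- lk
  }
  # normalize on the log scale: log tau_ik = log f_ik - log sum_h f_ih
  logf - apply(logf, 1, function(v) max(v) + log(sum(exp(v - max(v)))))
}

entropy_xi <- function(x, z, theta) {
  # z is an n x g binary membership matrix; xi = sum_i sum_k z_ik * log tau_ik
  sum(z * log_tau(x, theta))
}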

Proposition 1

Assume that \(\varvec{m}^{(1)}\) is a model such that \(\varvec{m}^{(0)}\) is not nested within \(\varvec{m}^{(1)}\). Assume that

$$\begin{aligned}&- \mathbb {E}\left[ \ln \dfrac{\sum _{k = 1}^{g^{(0)}}\tau _k \prod _{j = 1}^d\phi \big (x_{1j} \mid \mu ^{(0)}_{kj}, \sigma ^{(0)2}_{kj}\big )\mathbbm {1}_{G^{(0)}_k}\big (\varvec{x}_1\big )}{p\big ( \varvec{x}_1 \mid \varvec{\theta }^{(0)},\varvec{m}^{(0)}\big )}\right] \nonumber \\&\quad \le \mathbf {KL}\Big [\varvec{m}^{(0)}||\varvec{m}^{(1)}\Big ], \end{aligned}$$
(22)

where \(\mathbf {KL}\Big [\varvec{m}^{(0)}||\varvec{m}^{(1)}\Big ]\) is the Kullback-Leibler divergence of \(p\big (\cdot \mid \varvec{\theta }^{(0)},\varvec{m}^{(0)}\big )\) from \(p\big (\cdot \mid \varvec{\theta }^{(1)},\varvec{m}^{(1)}\big )\) and

$$\begin{aligned} G^{(0)}_k = \left\{ \varvec{x} \in \mathbb {R}^d : k = \underset{1 \le h \le g^{(0)}}{\text {argmax}}\, \tau _h \prod _{j = 1}^d\phi \big (x_{j} \mid \mu ^{(0)}_{hj}, \sigma ^{(0)2}_{hj}\big )\right\} . \end{aligned}$$

When \(n \rightarrow \infty \), we have

$$\begin{aligned} \mathbb {P}\bigg (\text {MICL}\big (\varvec{m}^{(1)}\big ) > \text {MICL}\big (\varvec{m}^{(0)}\big )\bigg ) \longrightarrow 0. \end{aligned}$$

Proof

For any model \(\varvec{m}\), we have the following inequalities,

$$\begin{aligned} \text {ICL}\big (\varvec{m}\big ) \le \text {MICL}\big (\varvec{m}\big ) \le \ln p\big (\mathbf {x}\mid \varvec{m}\big ). \end{aligned}$$

It follows,

$$\begin{aligned}&\mathbb {P}\bigg \{\text {MICL}\big (\varvec{m}^{(1)}\big ) - \text {MICL}\big (\varvec{m}^{(0)}\big )> 0\bigg \}\\&\quad \le \mathbb {P}\bigg \{\ln p\big (\mathbf {x}\mid \varvec{m}^{(1)}\big ) - \text {ICL}\big (\varvec{m}^{(0)}\big ) > 0\bigg \}. \end{aligned}$$

Now set \({\varDelta }\nu = \nu ^{(1)} - \nu ^{(0)}\) where \(\nu ^{(1)}\) and \(\nu ^{(0)}\) are the numbers of free parameters in the models \(\varvec{m}^{(1)}\) and \(\varvec{m}^{(0)}\) respectively. Using Laplace’s approximation, we have

$$\begin{aligned} \text {ICL}\big (\varvec{m}^{(0)}\big )= & {} \ln p\left( \mathbf {x}\mid \widehat{\varvec{\theta }}^{(0)}, \varvec{m}^{(0)}\right) + \xi \left( \widehat{\varvec{\theta }}^{(0)}; \widehat{\mathbf {z}}^{(0)}, \varvec{m}^{(0)}\right) \nonumber \\&- \dfrac{\nu ^{(0)}}{2} \ln n+ \mathcal {O}_p(1), \end{aligned}$$

where \(\widehat{\varvec{\theta }}^{(0)}\) and \(\widehat{\mathbf {z}}^{(0)}\) are respectively the MLE and the partition given by the corresponding MAP rule. In the same way, we have

$$\begin{aligned} \ln p\big (\mathbf {x}\mid \varvec{m}^{(1)}\big ) = \ln p\big (\mathbf {x}\mid \widehat{\varvec{\theta }}^{(1)}, \varvec{m}^{(1)}\big ) - \dfrac{\nu ^{(1)}}{2} \ln n + \mathcal {O}_p(1), \end{aligned}$$

where \(\widehat{\varvec{\theta }}^{(1)}\) is the MLE of \(\varvec{\theta }^{(1)}\). Note that

$$\begin{aligned}&\ln p\big (\mathbf {x}\mid \varvec{m}^{(1)}\big ) - \text {ICL}\big (\varvec{m}^{(0)}\big )\\&= \dfrac{A_n}{2} + n B_n - \dfrac{{\varDelta }\nu }{2} \ln n +\, \mathcal {O}_p(1), \end{aligned}$$

where

$$\begin{aligned} A_n = 2 \ln \dfrac{p\left( \mathbf {x}\mid \widehat{\varvec{\theta }}^{(1)},\varvec{m}^{(1)}\right) }{ p\left( \mathbf {x}\mid \varvec{\theta }^{(1)}, \varvec{m}^{(1)}\right) } - 2 \ln \dfrac{p\left( \mathbf {x}\mid \widehat{\varvec{\theta }}^{(0)}, \varvec{m}^{(0)}\right) }{p\left( \mathbf {x}\mid \varvec{\theta }^{(0)}, \varvec{m}^{(0)}\right) }, \end{aligned}$$

and

$$\begin{aligned} B_n = \dfrac{1}{n} \ln \dfrac{p\left( \mathbf {x}\mid \varvec{\theta }^{(1)},\varvec{m}^{(1)}\right) }{p\left( \mathbf {x}\mid \varvec{\theta }^{(0)},\varvec{m}^{(0)}\right) } - \dfrac{1}{n}\xi \left( \widehat{\varvec{\theta }}^{(0)}; \widehat{\mathbf {z}}^{(0)},\varvec{m}^{(0)}\right) . \end{aligned}$$

When \(n \rightarrow \infty \), we have \(A_n \rightarrow \chi ^2_{{\varDelta }\nu }\) in distribution and \(B_n\) tends to

$$\begin{aligned}&-\mathbf {KL}\Big [\varvec{m}^{(0)}||\varvec{m}^{(1)}\Big ]\\&- \mathbb {E}\left[ \ln \dfrac{\sum _{k = 1}^{g^{(0)}}\tau _k \prod _{j = 1}^d\phi \big (x_{1j} \mid \mu ^{(0)}_{kj}, \sigma ^{(0)2}_{kj}\big )\mathbbm {1}_{G^{(0)}_k}\big (\varvec{x}_1\big )}{p\big ( \varvec{x}_1 \mid \varvec{\theta }^{(0)},\varvec{m}^{(0)}\big )}\right] \end{aligned}$$

in probability. Thus, under the assumption (22), \(\text {MICL}\) is consistent since when \(n \rightarrow \infty \), we have

$$\begin{aligned}&\mathbb {P}\bigg \{\text {MICL}\big (\varvec{m}^{(1)}\big ) - \text {MICL}\big (\varvec{m}^{(0)}\big )> 0 \bigg \}\\&\quad \le \mathbb {P}\bigg [A_n + \mathcal {O}_p(1)> {\varDelta }\nu \ln n\bigg ] + \mathbb {P}\bigg [B_n > 0 \bigg ]\longrightarrow 0. \end{aligned}$$
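The last display can be illustrated numerically: since \({\varDelta }\nu \ln n\) grows without bound while \(A_n\) converges in distribution, the probability that a \(\chi ^2_{{\varDelta }\nu }\) variable exceeds \({\varDelta }\nu \ln n\) vanishes quickly with n, as the short R check below shows (the values of \({\varDelta }\nu \) and n are arbitrary).

# Tail probability P(chi^2_dnu > dnu * log(n)) for a few sample sizes,
# illustrating why the first term of the bound vanishes as n grows.
dnu <- 4                                  # arbitrary number of extra free parameters
for (n in c(1e2, 1e3, 1e4, 1e5)) {
  p <- pchisq(dnu * log(n), df = dnu, lower.tail = FALSE)
  cat(sprintf("n = %g: P(chi^2_%d > %d log n) = %.2e\n", n, dnu, dnu, p))
}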

Case of nested model. Recall that \(\text {MICL}\big (\varvec{m}^{(0)}\big ) = \ln p\big (\mathbf {x}, \mathbf {z}^{(0)}\mid \varvec{m}^{(0)}\big )\), where \(\mathbf {z}^{(0)} = \underset{\mathbf {z}}{\text {argmax}} \ln p\big (\mathbf {x}, \mathbf {z}\mid \varvec{m}^{(0)}\big )\). We have

$$\begin{aligned} \mathbf {z}^{(0)} {=} \underset{\mathbf {z}}{\text {argmax}}\Big \{\ln p\left( \mathbf {z}\mid g^{(0)}\right) {+} \underset{j \in {\varOmega }_0}{\sum }\ln p\big (\mathbf {x}_{\bullet j} \mid \omega _j^{(0)}, g^{(0)}, \mathbf {z}\big ) \Big \}, \end{aligned}$$

where \({\varOmega }_0 = \left\{ j : \omega ^{(0)}_j = 1\right\} \). Let \(\varvec{m}^{(1)} = \left( g^{(0)}, {\varOmega }_1\right) \) where \({\varOmega }_1 = {\varOmega }_0 \cup {\varOmega }_{01}\) and \( {\varOmega }_{01} = \left\{ j : \omega _j^{(1)} = 1, \omega _j^{(0)} = 0 \right\} \). Then, in the same way, we have \(\text {MICL}\big (\varvec{m}^{(1)}\big ) = \ln p\big (\mathbf {x}, \mathbf {z}^{(1)}\!\mid \varvec{m}^{(1)}\big )\), where

$$\begin{aligned} \mathbf {z}^{(1)} {=} \underset{\mathbf {z}}{\text {argmax}}\left[ \ln p(\mathbf {z}\mid g^{(0)}) {+} \underset{j \in {\varOmega }_1}{\sum }\ln p\left( \mathbf {x}_{\bullet j} {\mid } \omega _j^{(1)}, g^{(0)}, \mathbf {z}\right) \right] . \end{aligned}$$

For \(j \in {\varOmega }_{01}\), Laplace's approximation gives

$$\begin{aligned} \ln p\left( \mathbf {x}_{\bullet j} \mid \omega _j^{(1)}, g^{(0)}, \mathbf {z}\right)= & {} \sum _{i =1}^n \sum _{k = 1}^g z_{ik}\ln \phi \left( x_{ij} \mid \tilde{\mu }^{(1)}_{kj}, \tilde{\sigma }^{(1)2}_{kj}\right) \nonumber \\&- g^{(0)} \ln n + \mathcal {O}_p(1), \end{aligned}$$

where

$$\begin{aligned} \left( \tilde{\mu }^{(1)}_{kj}, \tilde{\sigma }^{(1)2}_{kj}\right) = \underset{\mu ^{(1)}_{kj}, \sigma ^{(1)2}_{kj}}{\text {argmax}} \sum _{i =1}^n z_{ik}\ln \phi \left( x_{ij} \mid \mu ^{(1)}_{kj}, \sigma ^{(1)2}_{kj}\right) . \end{aligned}$$
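These within-class estimates are simply the empirical mean and the maximum likelihood variance (normalized by the class size) of variable j within each class; a short R sketch, under our own naming conventions, reads:

# Within-class ML estimates (mu_tilde, sigma2_tilde) of variable j given a
# partition z (an n x g binary membership matrix); names are illustrative.
classwise_mle <- function(xj, z) {
  g <- ncol(z)
  t(sapply(1:g, function(k) {
    xk <- xj[z[, k] == 1]
    m  <- mean(xk)
    v  <- mean((xk - m)^2)          # ML variance: divide by n_k, not n_k - 1
    c(mu = m, sigma2 = v)
  }))
}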

Proposition 2

Assume that \(\varvec{m}^{(1)}\) is a model such that \(g^{(1)} = g^{(0)}\) and \({\varOmega }_1 = {\varOmega }_0 \cup {\varOmega }_{01}\) where \({\varOmega }_{01} \ne \emptyset \), i.e., the model \(\varvec{m}^{(0)}\) is nested within the model \(\varvec{m}^{(1)}\) with the same number of components. When \(n \rightarrow \infty \),

$$\begin{aligned} \mathbb {P}\bigg (\text {MICL}\big (\varvec{m}^{(1)}\big ) > \text {MICL}\big (\varvec{m}^{(0)}\big )\bigg ) \longrightarrow 0. \end{aligned}$$

Proof

We have

$$\begin{aligned}&\mathbb {P}\bigg \{\text {MICL}\big (\varvec{m}^{(1)}\big )> \text {MICL}\big (\varvec{m}^{(0)}\big )\bigg \}\\&\quad \le \mathbb {P}\left\{ \sum _{j \in {\varOmega }_{01}} \ln \dfrac{p\big (\mathbf {x}_{\bullet j} \mid \omega ^{(1)}_j, g^{(0)}, \mathbf {z}^{(1)}\big )}{ p\big (\mathbf {x}_{\bullet j} \mid \omega ^{(0)}_j, g^{(0)}, \mathbf {z}^{(0)}\big )} > 0 \right\} , \end{aligned}$$

and, for each \(j \in {\varOmega }_{01}\), when \(n \rightarrow \infty \),

$$\begin{aligned} 2 \sum _{i =1}^n \sum _{k =1}^{g^{(0)}} z^{(1)}_{ik}\ln \dfrac{\phi \left( x_{ij} \mid \tilde{\mu }^{(1)}_{kj}, \tilde{\sigma }^{(1)2}_{kj}\right) }{\phi \left( x_{ij} \mid \mu ^{(0)}_{1j},\sigma ^{(0)2}_{1j}\right) } \longrightarrow \chi ^2_{2g^{(0)}} \quad \text {in distribution}. \end{aligned}$$

We have

$$\begin{aligned}&\mathbb {P}\bigg (\sum _{j \in {\varOmega }_{01}} \ln \dfrac{p\big (\mathbf {x}_{\bullet j} \mid \omega ^{(1)}_j, g^{(0)}, \mathbf {z}^{(1)}\big )}{ p\big (\mathbf {x}_{\bullet j} \mid \omega ^{(0)}_j, g^{(0)}, \mathbf {z}^{(0)}\big )}> 0 \bigg )\\&\quad = \mathbb {P}\Big (\chi ^2_{2(g^{(0)}-1)} - 2 (g^{(0)}-1) \ln n > 0\Big )\\&\qquad \longrightarrow 0 \quad \text { by Chebyshev's inequality}. \end{aligned}$$

Appendix 2: Details on the partition step

At iteration [r], the partition \(\mathbf {z}^{[r]}\) is defined as a partition which increases the value of the integrated complete-data likelihood for the current model \(\varvec{m}^{[r]}\). This partition is obtained by an iterative method initialized with the partition \(\mathbf {z}^{[r-1]}\). Each iteration consists in sampling an individual uniformly at random and assigning it to the class maximizing the integrated complete-data likelihood, while the other class memberships are kept unchanged.

At iteration [r] of the global algorithm, the algorithm used to obtain \(\mathbf {z}^{[r]}\) is initialized at the partition \(\mathbf {z}^{(0)}=\mathbf {z}^{[r-1]}\). It performs S iterations, where iteration (s) is defined by the two steps below (a schematic R implementation is given after these steps):

Individual sampling: draw \(i^{(s)} \sim \mathcal {U}\{1,\ldots ,n\}\).

Partition optimization: define the set of partitions \(\mathcal {Z}^{(s)}=\{\mathbf {z}: \varvec{z}_i=\varvec{z}_i^{(s-1)},\; \forall i\ne i^{(s)}\}\) and compute the optimized partition \(\mathbf {z}^{(s)}\) defined by

$$\begin{aligned} \mathbf {z}^{(s)} = \text {argmax}_{\mathbf {z}\in \mathcal {Z}^{(s)}} \ln p\left( \mathbf {x},\mathbf {z}|\varvec{m}^{[r]}\right) . \end{aligned}$$
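A schematic R implementation of this partition step is given below. It is a sketch under our own conventions: z is stored as a vector of class labels in 1..g rather than a binary matrix, and log_pxz(x, z) stands for a user-supplied evaluation of \(\ln p\left( \mathbf {x},\mathbf {z}|\varvec{m}^{[r]}\right) \), whose closed form depends on the priors of the current model and is not reproduced in this appendix.

# One partition step: starting from z (an n-vector of class labels in 1..g),
# perform S single-observation reassignments, each time moving a uniformly
# drawn individual to the class that maximizes the integrated complete-data
# likelihood. `log_pxz(x, z)` is assumed to be supplied by the user and to
# return ln p(x, z | m) for the current model.
partition_step <- function(x, z, g, log_pxz, S = 100) {
  for (s in seq_len(S)) {
    i <- sample.int(nrow(x), 1)                    # individual sampling
    vals <- sapply(1:g, function(k) {              # try each class for i
      z_try <- z
      z_try[i] <- k
      log_pxz(x, z_try)
    })
    z[i] <- which.max(vals)                        # keep the best assignment
  }
  z
}

Since the current assignment of the sampled individual belongs to \(\mathcal {Z}^{(s)}\), each iteration cannot decrease \(\ln p\left( \mathbf {x},\mathbf {z}|\varvec{m}^{[r]}\right) \), so the sequence of values is non-decreasing.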


Cite this article

Marbac, M., Sedki, M. Variable selection for model-based clustering using the integrated complete-data likelihood. Stat Comput 27, 1049–1063 (2017). https://doi.org/10.1007/s11222-016-9670-1

