Finite mixtures of multivariate skew t-distributions: some recent and new results

Statistics and Computing

Abstract

Finite mixtures of multivariate skew t (MST) distributions have proven to be useful in modelling heterogeneous data with asymmetric and heavy tail behaviour. Recently, they have been exploited as an effective tool for modelling flow cytometric data. A number of algorithms for the computation of the maximum likelihood (ML) estimates for the model parameters of mixtures of MST distributions have been put forward in recent years. These implementations use various characterizations of the MST distribution, which are similar but not identical. While exact implementation of the expectation-maximization (EM) algorithm can be achieved for ‘restricted’ characterizations of the component skew t-distributions, Monte Carlo (MC) methods have been used to fit the ‘unrestricted’ models. In this paper, we review several recent fitting algorithms for finite mixtures of multivariate skew t-distributions, at the same time clarifying some of the connections between the various existing proposals. In particular, recent results have shown that the EM algorithm can be implemented exactly for faster computation of ML estimates for mixtures with unrestricted MST components. The gain in computational time is effected by noting that the semi-infinite integrals on the E-step of the EM algorithm can be put in the form of moments of the truncated multivariate non-central t-distribution, similar to the restricted case, which subsequently can be expressed in terms of the non-truncated form of the central t-distribution function for which fast algorithms are available. We present comparisons to illustrate the relative performance of the restricted and unrestricted models, and demonstrate the usefulness of the recently proposed methodology for the unrestricted MST mixture, by some applications to three real datasets.


References

  • Akaike, H.: A new look at the statistical model identification. IEEE Trans. Autom. Control 19, 716–723 (1974)

  • Arellano-Valle, R., Bolfarine, H., Lachos, V.: Bayesian inference for skew-normal linear mixed models. J. Appl. Stat. 34(6), 663–682 (2007)

  • Arellano-Valle, R.B., Azzalini, A.: On the unification of families of skew-normal distributions. Scand. J. Stat. 33, 561–574 (2006)

  • Arellano-Valle, R.B., Genton, M.G.: On fundamental skew distributions. J. Multivar. Anal. 96, 93–116 (2005)

  • Arnold, B.C., Beaver, R.J.: Skewed multivariate models related to hidden truncation and/or selective reporting. Test 11, 7–54 (2002)

  • Azzalini, A.: A class of distributions which includes the normal ones. Scand. J. Stat. 12, 171–178 (1985)

  • Azzalini, A.: The skew-normal distribution and related multivariate families. Scand. J. Stat. 32, 159–188 (2005)

  • Azzalini, A., Capitanio, A.: Distribution generated by perturbation of symmetry with emphasis on a multivariate skew t distribution. J. R. Stat. Soc., Ser. B 65, 367–389 (2003)

  • Azzalini, A., Dalla Valle, A.: The multivariate skew-normal distribution. Biometrika 83(4), 715–726 (1996)

  • Banfield, J.D., Raftery, A.: Model-based Gaussian and non-Gaussian clustering. Biometrics 49, 803–821 (1993)

  • Basso, R.M., Lachos, V.H., Cabral, C.R.B., Ghosh, P.: Robust mixture modeling based on scale mixtures of skew-normal distributions. Comput. Stat. Data Anal. 54, 2926–2941 (2010)

  • Böhning, D.: Computer-Assisted Analysis of Mixtures and Applications: Meta-Analysis, Disease Mapping and Others. Chapman and Hall, New York (1999)

  • Branco, M.D., Dey, D.K.: A general class of multivariate skew-elliptical distributions. J. Multivar. Anal. 79, 99–113 (2001)

  • Brinkman, R., Gasparetto, M., Lee, S.J., Ribickas, A., Perkins, J., Janssen, W., Smiley, R., Smith, C.: High content flow cytometry and temporal data analysis for defining a cellular signature of graft versus host disease. Biol. Blood Marrow Transplant. 13, 691–700 (2007)

  • Cabral, C., Bolfarine, H., Pereira, J.: Bayesian density estimation using skew student-t-normal mixtures. Comput. Stat. Data Anal. 52, 5075–5090 (2008)

  • Cabral, C., Lachos, V., Prates, M.: Multivariate mixture modeling using skew-normal independent distributions. Comput. Stat. Data Anal. 56, 126–142 (2012)

  • Dempster, A., Laird, N.M., Rubin, D.: Maximum likelihood from incomplete data via the EM algorithm. J. R. Stat. Soc., Ser. B 39, 1–38 (1977)

  • Everitt, B.S., Hand, D.J.: Finite Mixture Distributions. Chapman and Hall, London (1981)

  • Fraley, C., Raftery, A.E.: How many clusters? Which clustering methods? Answers via model-based cluster analysis. Comput. J. 41, 578–588 (1999)

  • Frühwirth-Schnatter, S.: Finite Mixture and Markov Switching Models. Springer, New York (2006)

  • Frühwirth-Schnatter, S., Pyne, S.: Bayesian inference for finite mixtures of univariate and multivariate skew-normal and skew-t distributions. Biostatistics 11, 317–336 (2010)

  • Genz, A., Bretz, F.: Methods for the computation of multivariate t-probabilities. J. Comput. Graph. Stat. 11, 950–971 (2002)

  • Gómez, H., Venegas, O., Bolfarine, H.: Skew-symmetric distributions generated by the distribution function of the normal distribution. Environmetrics 18, 395–407 (2007)

  • González-Farías, G., Domínguez-Molina, J.A., Gupta, A.K.: Additive properties of skew normal random vectors. J. Stat. Plan. Inference 126, 521–534 (2004)

  • Green, P.J.: On use of the EM algorithm for penalized likelihood estimation. J. R. Stat. Soc. B 52, 443–452 (1990)

  • Gupta, A.K.: Multivariate skew-t distribution. Statistics 37, 359–363 (2003)

  • Ho, H., Lin, T., Chen, H., Wang, W.: Some results on the truncated multivariate t distribution. J. Stat. Plan. Inference 142, 25–40 (2012a)

  • Ho, H., Pyne, S., Lin, T.: Maximum likelihood inference for mixtures of skew student-t-normal distributions through practical EM-type algorithms. Stat. Comput. 22, 287–299 (2012b)

  • Karlis, D., Santourian, A.: Model-based clustering with non-elliptically contoured distributions. Stat. Comput. 19, 73–83 (2009)

  • Karlis, D., Xekalaki, E.: Choosing initial values for the EM algorithm for finite mixtures. Comput. Stat. Data Anal. 41, 577–590 (2003)

  • Kotz, S., Nadarajah, S.: Multivariate t Distributions and Their Applications. Cambridge University Press, Cambridge (2004)

  • Lachos, V.H., Ghosh, P., Arellano-Valle, R.B.: Likelihood based inference for skew normal independent linear mixed models. Stat. Sin. 20, 303–322 (2010)

  • Lee, S., McLachlan, G.: On the fitting of mixtures of multivariate skew t-distributions via the EM algorithm (2011). arXiv:1109.4706 [stat.ME]

  • Lin, T.I.: Maximum likelihood estimation for multivariate skew-normal mixture models. J. Multivar. Anal. 100, 257–265 (2009)

  • Lin, T.I.: Robust mixture modeling using multivariate skew t distribution. Stat. Comput. 20, 343–356 (2010)

  • Lin, T.I., Lee, J.C., Hsieh, W.J.: Robust mixture modeling using the skew-t distribution. Stat. Comput. 17, 81–92 (2007a)

  • Lin, T.I., Lee, J.C., Yen, S.Y.: Finite mixture modelling using the skew normal distribution. Stat. Sin. 17, 909–927 (2007b)

  • Lindsay, B.G.: Mixture Models: Theory, Geometry, and Applications. NSF-CBMS Regional Conference Series in Probability and Statistics, vol. 5. Institute of Mathematical Statistics, Hayward (1995)

  • Liseo, B., Loperfido, N.: A Bayesian interpretation of the multivariate skew-normal distribution. Stat. Probab. Lett. 61, 395–401 (2003)

  • Liu, C., Rubin, D.: The ECME algorithm: a simple extension of the EM and ECM with faster monotone convergence. Biometrika 81, 633–648 (1994)

  • Maier, L.M., Anderson, D.E., De Jager, P.L., Wicker, L., Hafler, D.A.: Allelic variant in CTLA4 alters T cell phosphorylation patterns. Proc. Natl. Acad. Sci. USA 104, 18607–18612 (2007)

  • McLachlan, G., Peel, D.: Robust cluster analysis via mixtures of multivariate t-distributions. In: Amin, A., Dori, D., Pudil, P., Freeman, H. (eds.) Lecture Notes in Computer Science, vol. 1451, pp. 658–666. Springer, Berlin (1998)

  • McLachlan, G.J., Basford, K.E.: Mixture Models: Inference and Applications. Dekker, New York (1988)

  • McLachlan, G.J., Peel, D.: Finite Mixture Models. Wiley Series in Probability and Statistics (2000)

  • O’Hagan, A.: Bayes estimation of a convex quadratic. Biometrika 60, 565–571 (1973)

  • O’Hagan, A.: Moments of the truncated multivariate-t distribution (1976). http://www.tonyohagan.co.uk/academic/pdf/trunc_multi_t.PDF

  • O’Hagan, A., Murphy, T., Gormley, I.: Computational aspects of fitting mixture models via the expectation-maximization algorithm. Comput. Stat. Data Anal. 56, 3843–3864 (2012)

  • Peel, D., McLachlan, G.: Robust mixture modelling using the t distribution. Stat. Comput. 10, 339–348 (2000)

  • Pyne, S., Hu, X., Wang, K., Rossin, E., Lin, T.I., Maier, L.M., Baecher-Allan, C., McLachlan, G.J., Tamayo, P., Hafler, D.A., De Jager, P.L., Mesirov, J.P.: Automated high-dimensional flow cytometric data analysis. Proc. Natl. Acad. Sci. USA 106, 8519–8524 (2009)

  • Sahu, S., Dey, D., Branco, M.: A new class of multivariate skew distributions with applications to Bayesian regression models. Can. J. Stat. 31, 129–150 (2003). Erratum: Can. J. Stat. 37, 301–302 (2009)

  • Schwarz, G.: Estimating the dimension of a model. Ann. Stat. 6, 461–464 (1978)

  • Titterington, D.M., Smith, A.F.M., Makov, U.E.: Statistical Analysis of Finite Mixture Distributions. Wiley, New York (1985)

  • Vrbik, I., McNicholas, P.: Analytic calculations for the EM algorithm for multivariate skew t-mixture models. Stat. Probab. Lett. 82, 1169–1174 (2012)

  • Wang, K.: EMMIX-skew: EM algorithm for mixture of multivariate skew normal/t distributions (2009). http://www.maths.uq.edu.au/gjm/mix_soft/EMMIX-skew, R package version 1.0-12

  • Wang, K., Ng, S.K., McLachlan, G.J.: Multivariate skew t mixture models: applications to fluorescence-activated cell sorting data. In: Shi, H., Zhang, Y., Bottema, M., Lovell, B., Maeder, A. (eds.) DICTA 2009 (Conference of Digital Image Computing: Techniques and Applications, Melbourne), pp. 526–531. IEEE Comput. Soc., Los Alamitos (2009)

Acknowledgements

We would like to thank Professor Seung-Gu Kim for comments and corrections, and Drs. Kui (Sam) Wang and Saumyadipta Pyne for their helpful discussions on this topic.

Author information

Corresponding author

Correspondence to Geoffrey J. McLachlan.

Appendices

Appendix A: The truncated multivariate t-distribution

In this appendix, we briefly describe the truncated multivariate t-distribution and provide some formulas for computing its moments, following the approach of Lee and McLachlan (2011). These expressions are crucial for the swift evaluation of the conditional expectations on the E-step of the FM-uMST model discussed in Sect. 5. An alternative description is given by Ho et al. (2012a), who provide equivalent expressions for the doubly truncated case.

Let X be a p-dimensional random variable having a multivariate t-distribution with location vector μ, scale matrix Σ, and ν degrees of freedom. Truncating X to the hyperplane region \(\mathbb{A} = \{\boldsymbol{x} \geq\boldsymbol{a}, \boldsymbol{a} \in\mathbb{R}^{p}\}\), where \(\boldsymbol{x} \geq \boldsymbol{a}\) means each element \(x_{i} = (\boldsymbol{x})_{i}\) is greater than or equal to \(a_{i} = (\boldsymbol{a})_{i}\) for i=1,…,p, results in a left-truncated t-distribution whose density is given by

$$ f_{\mathbb{A}}(\boldsymbol{x}; \boldsymbol{\mu}, \boldsymbol{\varSigma }, \nu) = T_{p,\nu}^{-1} (\boldsymbol{a};\boldsymbol{\mu},\boldsymbol { \varSigma} ) t_{p,\nu} (\boldsymbol{x};\boldsymbol{\mu}, \boldsymbol{ \varSigma } ),\quad \boldsymbol{x} \in\mathbb{A}. $$
(69)

For a random vector X with density (69), we write \(\boldsymbol{X} \sim tt_{p,\nu} (\boldsymbol{\mu}, \boldsymbol{\varSigma}; \mathbb{A})\). For our purposes, we will be concerned with the first two moments of X, specifically \(E(\boldsymbol{X})\) and \(E(\boldsymbol{X}\boldsymbol{X}^{T})\). Explicit formulas for the truncated central t-distribution in the univariate case \(tt_{1,\nu}(0, \sigma^{2}; \mathbb {A})\) were provided by O’Hagan (1973), who expressed the moments in terms of the non-truncated t-distribution. The multivariate case was studied in O’Hagan (1976), but still considering the central case only. Here we describe a generalization of the results in O’Hagan (1976) to the multivariate non-central case and express them in a form suitable for undertaking the E-step in the direct application of the EM algorithm to the fitting of mixtures of MST distributions.
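To make the univariate central case concrete, the following is a minimal sketch (not the formulas of O’Hagan 1973 as printed, which are not reproduced here) of the first moment of \(tt_{1,\nu}(0, \sigma^{2}; \mathbb{A})\) with \(\mathbb{A} = \{x \geq a\}\). It relies on the elementary identity \(\int_{a}^{\infty} x\, t_{1,\nu}(x; 0, \sigma^{2})\,dx = \frac{\nu\sigma^{2}}{\nu-1}\bigl(1 + \frac{a^{2}}{\nu\sigma^{2}}\bigr) t_{1,\nu}(a; 0, \sigma^{2})\) for ν>1, so the mean is that quantity divided by the tail probability of the non-truncated t-distribution. The function names are hypothetical, and the tail probability is obtained here by simple quadrature as a stand-in for a proper t-distribution function routine.

```python
import math

def t_pdf(x, nu, sigma=1.0):
    """Density of a central univariate t with scale sigma and nu degrees of freedom."""
    log_c = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
             - 0.5 * math.log(nu * math.pi) - math.log(sigma))
    return math.exp(log_c - (nu + 1) / 2 * math.log1p(x * x / (nu * sigma * sigma)))

def tail_integral(g, a, n=100_000):
    """Midpoint rule for the integral of g over [a, inf), using x = a + u / (1 - u)."""
    h = 1.0 / n
    total = 0.0
    for k in range(n):
        u = (k + 0.5) * h
        x = a + u / (1.0 - u)
        total += g(x) / (1.0 - u) ** 2  # Jacobian dx/du = 1 / (1 - u)^2
    return total * h

def truncated_mean_closed_form(a, nu, sigma=1.0):
    """E(X | X >= a) for nu > 1: closed-form numerator over the tail probability."""
    numer = nu * sigma ** 2 / (nu - 1) * (1 + a * a / (nu * sigma ** 2)) * t_pdf(a, nu, sigma)
    tail_prob = tail_integral(lambda x: t_pdf(x, nu, sigma), a)  # P(X >= a), by quadrature
    return numer / tail_prob

def truncated_mean_quadrature(a, nu, sigma=1.0):
    """The same mean computed entirely by numerical integration, as a cross-check."""
    numer = tail_integral(lambda x: x * t_pdf(x, nu, sigma), a)
    denom = tail_integral(lambda x: t_pdf(x, nu, sigma), a)
    return numer / denom
```

Calling `truncated_mean_closed_form(-0.5, 5.0, 2.0)` and `truncated_mean_quadrature(-0.5, 5.0, 2.0)` should give matching values. The point of the closed form is that only the density and the distribution function of the non-truncated t are needed, which is the same structure the multivariate results exploit.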

Before presenting the expressions, it will be convenient to introduce some notation. Let x be a vector; then

\(x_{i}\): denotes the ith element,

\(\boldsymbol{x}_{ij}\): is a two-dimensional vector with elements \(x_{i}\) and \(x_{j}\),

\(\boldsymbol{x}_{-i}\): represents the (p−1)-dimensional vector with the ith element removed, and

\(\boldsymbol{x}_{-ij}\): represents the (p−2)-dimensional vector with the ith and jth elements removed.

For a matrix X, let

\(x_{ij}\): denote the ijth element,

\(\boldsymbol{X}_{ij}\): define the 2×2 matrix consisting of the elements \(x_{ii}\), \(x_{ij}\), \(x_{ji}\) and \(x_{jj}\),

\(\boldsymbol{X}_{-i}\): be created by removing the ith row and column from X,

\(\boldsymbol{X}_{-ij}\): be the (p−2)-dimensional square matrix resulting from the removal of the ith and jth rows and columns from X, and

\(\boldsymbol{X}_{(ij)}\): be the ith and jth columns of X with the elements of \(\boldsymbol{X}_{ij}\) removed, yielding a (p−2)×2 matrix.

We now proceed to the expressions for the first two moments of X.

One can show that the first moment of (69) is

$$ E (\boldsymbol{X} ) = \boldsymbol{\mu}+ \boldsymbol {\epsilon}, $$
(70)

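where \(\boldsymbol{\epsilon} = c^{-1} \boldsymbol{\varSigma}\boldsymbol{\xi}\) and \(c = T_{p,\nu}(\boldsymbol{\mu}-\boldsymbol{a}; \boldsymbol{0}, \boldsymbol{\varSigma})\), and ξ is a p×1 vector with elements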

for i=1,…,p, and where

The second moment is given by

(71)

where H is a p×p matrix with off-diagonal elements

and diagonal elements,

It is worth noting that evaluation of the expressions (70) and (71) relies on algorithms for computing the multivariate central t-distribution function, for which highly efficient procedures are readily available in many statistical packages. For example, an implementation of Genz’s algorithm (Genz and Bretz 2002; Kotz and Nadarajah 2004) is provided by the mvtnorm package available from the R website.
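As a rough sanity check on the normalizing constant c appearing with (70), which is the probability that the non-truncated t-vector falls in the region x ≥ a, the sketch below estimates that probability by crude Monte Carlo for p=2. This is only an illustration under stated assumptions: it is not Genz’s algorithm, the function names are hypothetical, and unit scales with correlation ρ are assumed for simplicity. When a equals the location vector, the probability has the known closed form 1/4 + arcsin(ρ)/(2π), which holds for any elliptically symmetric bivariate distribution and gives a reference value.

```python
import math
import random

def orthant_prob_mc(mu, a, rho, nu, n=200_000, seed=1):
    """Monte Carlo estimate of P(X >= a) for a bivariate t with location mu,
    unit scales, correlation rho and nu degrees of freedom."""
    rng = random.Random(seed)
    s = math.sqrt(1.0 - rho * rho)
    hits = 0
    for _ in range(n):
        # Correlated standard normal pair via the 2x2 Cholesky factor.
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + s * rng.gauss(0.0, 1.0)
        # Gamma mixing variable with shape nu/2 and scale 2/nu, so E(w) = 1.
        w = rng.gammavariate(nu / 2.0, 2.0 / nu)
        r = 1.0 / math.sqrt(w)
        if mu[0] + r * z1 >= a[0] and mu[1] + r * z2 >= a[1]:
            hits += 1
    return hits / n

def orthant_prob_at_location(rho):
    """Closed form for P(X >= mu) in the bivariate elliptical case."""
    return 0.25 + math.asin(rho) / (2.0 * math.pi)
```

For example, `orthant_prob_mc([1.0, -2.0], [1.0, -2.0], 0.5, 4.0)` should be close to `orthant_prob_at_location(0.5)`, up to Monte Carlo error. In practice a deterministic or quasi-Monte Carlo routine such as the one in mvtnorm would replace the simulation.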

Appendix B: E-step for uMST

Derivations of \(e_{1,j}^{(k)}\), \(\boldsymbol{e}_{3,j}^{(k)}\) and \(\boldsymbol{e}_{4,j}^{(k)}\) are detailed as follows.

B.1 Calculation of \(e_{1,j}^{(k)}\)

Concerning the calculation of the expectation \(e_{1,j}^{(k)}\), the conditional density of \(W_{j}\) given \(\boldsymbol{y}_{j}\) is given by

(72)

where

and 0 is the zero vector of appropriate dimension.

The conditional expectation \(E_{\boldsymbol{\theta}^{(k)}}\{\log(W_{j}) \mid\boldsymbol {y}_{j}\}\) can be reduced to

(73)

where

$$\boldsymbol{y}_{2j}^{(k)} = \boldsymbol{q}_j^{(k)} \sqrt{\frac{\nu ^{(k)}+p+2}{\nu^{(k)}+d^{(k)}(\boldsymbol{y}_j)}}, $$

and where the last term S is given by

(74)

and \(S_{1,j}^{(k)}\) is an integral given by

(75)

Combining (73) and (74), \(e_{1,j}^{(k)}\) can be reduced to

(76)

We note that the term \(S_{1,j}^{(k)}\) will be very small in practice, since it would be zero if we adopted a one-step-late (OSL) EM algorithm, in which case there would be no need to calculate the multiple integral \(S_{1,j}^{(k)}\) in (74). Hence \(e_{1,j}^{(k)}\) can be reduced to

(77)

B.2 Calculation of \(\boldsymbol{e}_{3,j}^{(k)}\) and \(\boldsymbol{e}_{4,j}^{(k)}\)

To obtain \(\boldsymbol{e}_{3,j}^{(k)}\) and \(\boldsymbol{e}_{4,j}^{(k)}\), first note that the joint density of \(\boldsymbol{y}_{j}\), \(\boldsymbol{u}_{j}\), and \(w_{j}\) is given by

(78)

Using Bayes’ rule, the conditional density of \(\boldsymbol{u}_{j}\) and \(w_{j}\) given \(\boldsymbol{y}_{j}\) can be written as

(79)

From (79), standard conditional expectation calculations yield

(80)

where \(\boldsymbol{X}_{j}\) is a p-dimensional t-variate truncated to the positive hyperplane \(\mathbb{R}^{+}\), which is conditionally distributed as

$$ \boldsymbol{X}_j \mid\boldsymbol{y}_j \sim tt_{p,\nu^{(k)}+p+2} \biggl(\boldsymbol{q}_j^{(k)}, \biggl( \frac{\nu^{(k)}+d^{(k)}(\boldsymbol{y}_j)}{\nu^{(k)}+p+2} \biggr) \boldsymbol{\varLambda}^{(k)}; \mathbb{R}^+ \biggr). $$
(81)

Analogously, \(\boldsymbol{e}_{4,j}^{(k)}\) can be reduced to

$$ \boldsymbol{e}_{4,j}^{(k)} = e_{2,j}^{(k)} E \bigl(\boldsymbol{X}_j \boldsymbol{X}_j^T \mid \boldsymbol{y}_j\bigr). $$
(82)

The truncated moments \(E(\boldsymbol{X}_{j} \mid\boldsymbol{y}_{j})\) and \(E(\boldsymbol{X}_{j} \boldsymbol{X}_{j}^{T} \mid\boldsymbol{y}_{j})\) can be swiftly evaluated using the expressions (70) and (71) in Appendix A.
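The role of (81) and (82) can be illustrated with a brute-force sketch that draws from the truncated t-distribution in (81) by rejection sampling and averages. This is deliberately not the exact approach advocated here, which avoids Monte Carlo; it is only a check one might run against an exact implementation. Several things are assumptions: the function name and arguments are hypothetical, `chol_lambda` is taken to be the lower-triangular Cholesky factor of \(\boldsymbol{\varLambda}^{(k)}\), and \(\boldsymbol{e}_{3,j}^{(k)}\) is taken to equal \(e_{2,j}^{(k)} E(\boldsymbol{X}_{j} \mid\boldsymbol{y}_{j})\) by analogy with (82), since the display for (80) is not reproduced above.

```python
import math
import random

def e_step_truncated_moments(q, chol_lambda, nu, d, e2, n_keep=20_000, seed=0):
    """Monte Carlo stand-in for e3 = e2 * E(X | y) and e4 = e2 * E(X X^T | y),
    where X follows the t-distribution in (81) truncated to the positive orthant."""
    p = len(q)
    dof = nu + p + 2                       # degrees of freedom in (81)
    scale = math.sqrt((nu + d) / dof)      # square root of the scalar multiplying Lambda
    rng = random.Random(seed)
    m1 = [0.0] * p
    m2 = [[0.0] * p for _ in range(p)]
    kept = 0
    while kept < n_keep:
        z = [rng.gauss(0.0, 1.0) for _ in range(p)]
        w = rng.gammavariate(dof / 2.0, 2.0 / dof)   # shape dof/2, scale 2/dof
        x = [q[i] + scale * sum(chol_lambda[i][k] * z[k] for k in range(i + 1)) / math.sqrt(w)
             for i in range(p)]
        if min(x) < 0.0:
            continue                       # reject draws outside the positive orthant
        kept += 1
        for i in range(p):
            m1[i] += x[i]
            for j in range(p):
                m2[i][j] += x[i] * x[j]
    e3 = [e2 * v / n_keep for v in m1]
    e4 = [[e2 * v / n_keep for v in row] for row in m2]
    return e3, e4
```

A call such as `e_step_truncated_moments([0.5, -0.2], [[1.0, 0.0], [0.3, 0.9]], nu=4.0, d=1.5, e2=1.2)` uses made-up inputs purely for illustration. Rejection sampling becomes very slow when the location lies far outside the positive orthant, which is one reason the exact expressions are preferable.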

Appendix C: E-step for FM-uMST

The four conditional expectations \(e_{1,hj}^{(k)}\), \(e_{2,hj}^{(k)}\), \(\boldsymbol{e}_{3,hj}^{(k)}\), and \(\boldsymbol{e}_{4,hj}^{(k)}\) involved in the E-step are given by

(83)
(84)
(85)
(86)

where \(S_{1,hj}^{(k)}\) is a scalar defined by

(87)

and \(\boldsymbol{X}_{hj}\) is a truncated p-dimensional t-variate given by

$$\boldsymbol{X}_{hj} \mid\boldsymbol{y}_j \sim tt_{p,\nu_h^{(k)}+p+2} \biggl(\boldsymbol{q}_{hj}^{(k)}, \biggl( \frac{\nu_h^{(k)}+d_h^{(k)}(\boldsymbol{y}_j)}{ \nu_h^{(k)}+p+2} \biggr)\boldsymbol{\varLambda}_h^{(k)}; \mathbb{R}^+ \biggr). $$

The first two moments of \(\boldsymbol{X}_{hj}\) can be implicitly expressed in terms of the parameters \(\boldsymbol{q}_{hj}^{(k)}\), \(d_{h}^{(k)}(\boldsymbol{y}_{j})\), \(\boldsymbol{\varLambda}_{h}^{(k)}\), \(\nu_{h}^{(k)}\) using results (70) and (71). It is worth emphasizing that computation of \(\boldsymbol{e}_{3,hj}^{(k)}\) and \(\boldsymbol{e}_{4,hj}^{(k)}\) depends on algorithms for evaluating the multivariate t-distribution function, for which fast procedures are available.

About this article

Cite this article

Lee, S., McLachlan, G.J. Finite mixtures of multivariate skew t-distributions: some recent and new results. Stat Comput 24, 181–202 (2014). https://doi.org/10.1007/s11222-012-9362-4

  • DOI: https://doi.org/10.1007/s11222-012-9362-4
