Abstract
This paper introduces a finite mixture of canonical fundamental skew \(t\) (CFUST) distributions for a model-based approach to clustering in which the clusters are asymmetric and possibly long-tailed (Lee and McLachlan, arXiv:1401.8182 [statME], 2014b). The family of CFUST distributions includes the restricted multivariate skew \(t\) and unrestricted multivariate skew \(t\) distributions as special cases. In recent years, a few versions of the multivariate skew \(t\) (MST) mixture model have been put forward, together with various EM-type algorithms for parameter estimation. These formulations adopted either a restricted or an unrestricted characterization for their MST densities. In this paper, we examine a natural generalization of these developments, employing the CFUST distribution as the parametric family for the component distributions, and point out that the restricted and unrestricted characterizations can be unified under this general formulation. We show that an exact implementation of the EM algorithm can be achieved for the CFUST distribution and mixtures of this distribution, and present some new analytical results for a conditional expectation involved in the E-step.
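To make the unification concrete, the following is a minimal sketch of how one might draw samples from a single CFUST component. It assumes the convolution-type stochastic representation commonly used for this family, \(Y = \mu + \Delta |U_0| + U_1\), where \(U_0\) (dimension \(q\)) and \(U_1\) (dimension \(p\)) are jointly multivariate \(t\) with \(\nu\) degrees of freedom and block-diagonal scale matrix \(\mathrm{diag}(I_q, \Sigma)\). The abstract does not state this representation, and the function name `sample_cfust` and its arguments are hypothetical, chosen for illustration only.

```python
import numpy as np

def sample_cfust(n, mu, Sigma, Delta, nu, rng):
    """Draw n samples from one CFUST component (illustrative sketch).

    Assumed representation: Y = mu + Delta |U0| + U1, with (U0, U1)
    jointly t-distributed, generated as normals divided by sqrt(W),
    where W ~ Gamma(nu/2, rate nu/2) is shared by U0 and U1.
    """
    p, q = Delta.shape
    # Common scaling variable that produces the heavy tails.
    w = rng.gamma(shape=nu / 2.0, scale=2.0 / nu, size=n)
    scale = 1.0 / np.sqrt(w)[:, None]
    # Half-t latent skewing variables (q of them).
    u0 = np.abs(rng.standard_normal((n, q))) * scale
    # Symmetric t error term with scale matrix Sigma.
    u1 = rng.multivariate_normal(np.zeros(p), Sigma, size=n) * scale
    return mu + u0 @ Delta.T + u1

rng = np.random.default_rng(0)
mu = np.zeros(2)
Sigma = np.eye(2)

# Restricted form: a single latent skewing variable (Delta is p x 1).
Delta_restricted = np.array([[2.0], [0.0]])
# Unrestricted form: one skewing variable per dimension (Delta diagonal).
Delta_unrestricted = np.diag([2.0, -1.0])

y_r = sample_cfust(20000, mu, Sigma, Delta_restricted, nu=5.0, rng=rng)
y_u = sample_cfust(20000, mu, Sigma, Delta_unrestricted, nu=5.0, rng=rng)
print(y_r.mean(axis=0), y_u.mean(axis=0))
```

Under this assumption, the two special cases differ only in the shape of the skewness matrix `Delta`: a single column gives the restricted form, a diagonal matrix gives the unrestricted form, and a general \(p \times q\) matrix covers both.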
References
Aas, K., Haff, I.H.: The generalized hyperbolic skew Student’s \(t\)-distribution. J. Financ. Econom. 4, 275–309 (2005)
Aghaeepour, N., Finak, G., The FLOWCAP Consortium, The DREAM Consortium, Hoos, H., Mosmann, T., Gottardo, R., Brinkman, R.R., Scheuermann, R.H.: Critical assessment of automated flow cytometry analysis techniques. Nat. Methods 10, 228–238 (2013)
Anderson, E.: The irises of the Gaspé Peninsula. Bull. Am. Iris Soc. 59, 2–5 (1935)
Arellano-Valle, R.B., Azzalini, A.: On the unification of families of skew-normal distributions. Scand. J. Stat. 33, 561–574 (2006)
Arellano-Valle, R.B., Genton, M.G.: On fundamental skew distributions. J. Multivar. Anal. 96, 93–116 (2005)
Arellano-Valle, R.B., Branco, M.D., Genton, M.G.: A unified view on skewed distributions arising from selections. Can. J. Stat. 34, 581–601 (2006)
Asparouhov, T., Muthén, B.: Structural equation models and mixture models with continuous non-normal skewed distributions. Mplus Web Notes 19, 1–49 (2014)
Azzalini, A.: The skew-normal distribution and related multivariate families. Scand. J. Stat. 32, 159–188 (2005)
Azzalini, A.: The Skew-Normal and Related Families. Institute of Mathematical Statistics Monographs, Cambridge University Press, Cambridge (2014)
Banfield, J.D., Raftery, A.E.: Model-based Gaussian and non-Gaussian clustering. Biometrics 49, 803–821 (1993)
Bernardi, M.: Risk measures for skew normal mixtures. Stat. Probab. Lett. 83, 1819–1824 (2013)
Böhning, D.: Computer-Assisted Analysis of Mixtures and Applications: Meta-Analysis, Disease Mapping and Others. Chapman and Hall, London (1999)
Böhning, D., Dietz, E., Schaub, R., Schlattmann, P., Lindsay, B.: The distribution of the likelihood ratio for mixtures of densities from the one-parameter exponential family. Ann. Inst. Stat. Math. 46, 373–388 (1994)
Browne, R.P., McNicholas, P.D.: A mixture of generalized hyperbolic distributions. arXiv:1305.1036 [statME] (2013)
Cabral, C.S., Lachos, V.H., Prates, M.O.: Multivariate mixture modeling using skew-normal independent distributions. Comput. Stat. Data Anal. 56, 126–142 (2012)
Contreras-Reyes, J.E., Arellano-Valle, R.B.: Growth estimates of cardinalfish (Epigonus crassicaudus) based on scale mixtures of skew-normal distributions. Fish. Res. 147, 137–144 (2013)
Cook, R.D., Weisberg, S.: An Introduction to Regression Graphics. Wiley, New York (1994)
Everitt, B.S., Hand, D.J.: Finite Mixture Distributions. Chapman and Hall, London (1981)
Fisher, R.A.: The use of multiple measurements in taxonomic problems. Ann. Eugen. 7, 179–188 (1936)
Forbes, F., Wraith, D.: A new family of multivariate heavy-tailed distributions with variable marginal amounts of tailweight: application to robust clustering. Stat. Comput. (2013). doi:10.1007/s11222-013-9414-4
Fraley, C., Raftery, A.E.: How many clusters? Which clustering methods? Answers via model-based cluster analysis. Comput. J. 41, 578–588 (1999)
Franczak, B.C., Browne, R.P., McNicholas, P.D.: Mixtures of shifted asymmetric Laplace distributions. IEEE Trans. Pattern Anal. Mach. Intell. (2013). doi:10.1109/TPAMI.2013.216
Frühwirth-Schnatter, S.: Finite Mixture and Markov Switching Models. Springer, New York (2006)
Frühwirth-Schnatter, S., Pyne, S.: Bayesian inference for finite mixtures of univariate and multivariate skew-normal and skew-\(t\) distributions. Biostatistics 11, 317–336 (2010)
Genton, M.G. (ed.): Skew-Elliptical Distributions and Their Applications: A Journey Beyond Normality. Chapman and Hall, London (2004)
Ho, H.J., Lin, T.I., Chang, H.H., Haase, H.B., Huang, S., Pyne, S.: Parametric modeling of cellular state transitions as measured with flow cytometry. BMC Bioinform. 13(Suppl 5), S5 (2012a)
Ho, H.J., Lin, T.I., Chen, H.Y., Wang, W.L.: Some results on the truncated multivariate \(t\) distribution. J. Stat. Plan. Inference 142, 25–40 (2012b)
Hu, X., Kim, H., Brennan, P.J., Han, B., Baecher-Allan, C.M., De Jager, P.L., Brenner, M.B., Raychaudhuri, S.: Application of user-guided automated cytometric data analysis to large-scale immunoprofiling of invariant natural killer T cells. Proc. Natl. Acad. Sci. USA 110, 19030–19035 (2013). doi:10.1073/pnas.1318322110
Hubert, L., Arabie, P.: Comparing partitions. J. Classif. 2, 193–218 (1985)
Karlis, D., Santourian, A.: Model-based clustering with non-elliptically contoured distributions. Stat. Comput. 19, 73–83 (2009)
Lee, S., McLachlan, G.J.: On the fitting of mixtures of multivariate skew \(t\)-distributions via the EM algorithm. arXiv:1109.4706 [statME] (2011)
Lee, S., McLachlan, G.J.: Finite mixtures of multivariate skew \(t\)-distributions: some recent and new results. Stat. Comput. 24, 181–202 (2014a)
Lee, S.X., McLachlan, G.J.: Model-based clustering and classification with non-normal mixture distributions. Stat. Methods Appl. 22, 427–454 (2013a)
Lee, S.X., McLachlan, G.J.: Modelling asset return using multivariate asymmetric mixture models with applications to estimation of value-at-risk. In: Piantadosi, J., Anderssen, R.S., Boland, J. (eds.) MODSIM 2013 (20th International Congress on Modelling and Simulation), pp. 1228–1234. Adelaide (2013b)
Lee, S.X., McLachlan, G.J.: On mixtures of skew-normal and skew \(t\)-distributions. Adv. Data Anal. Classif. 7, 241–266 (2013c)
Lee, S.X., McLachlan, G.J.: Maximum likelihood estimation for finite mixtures of canonical fundamental skew \(t\)-distributions: the unification of the unrestricted and restricted skew \(t\)-mixture models. arXiv:1401.8182 [statME] (2014b)
Lee, Y.W., Poon, S.H.: Systemic and systematic factors for loan portfolio loss distribution. Econometrics and applied economics workshops pp. 1–61. School of Social Science, University of Manchester (2011)
Lin, T.I.: Robust mixture modeling using multivariate skew \(t\) distribution. Stat. Comput. 20, 343–356 (2010)
Lindsay, B.G.: Mixture Models: Theory, Geometry, and Applications. NSF-CBMS Regional Conference Series in Probability and Statistics, vol. 5. Institute of Mathematical Statistics and the American Statistical Association, Alexandria (1995)
McLachlan, G.J., Basford, K.E.: Mixture Models: Inference and Applications. Marcel Dekker, New York (1988)
McLachlan, G.J., Krishnan, T.: The EM Algorithm and Extensions. Wiley, New York (1997)
McLachlan, G.J., Peel, D.: Finite Mixture Models. Wiley Series in Probability and Statistics, New York (2000)
McNicholas, P.D., Murphy, T.B., McDaid, A.F., Frost, D.: Serial and parallel implementations of model-based clustering via parsimonious Gaussian mixture models. Comput. Stat. Data Anal. 54, 711–723 (2010)
Mengersen, K.L., Robert, C.P., Titterington, D.M.: Mixtures: Estimation and Applications. Wiley, New York (2011)
Murray, P.M., Browne, R.P., McNicholas, P.D.: Mixtures of skew-\(t\) factor analyzers. Comput. Stat. Data Anal. 77, 326–335 (2014)
Pyne, S., Hu, X., Wang, K., Rossin, E., Lin, T.I., Maier, L.M., Baecher-Allan, C., McLachlan, G.J., Tamayo, P., Hafler, D.A., De Jager, P.L., Mesirov, J.P.: Automated high-dimensional flow cytometric data analysis. Proc. Natl. Acad. Sci. USA 106, 8519–8524 (2009)
Pyne, S., Lee, S.X., Wang, K., Irish, J., Tamayo, P., Nazaire, M.D., Duong, T., Ng, S.K., Hafler, D., Levy, R., Nolan, G.P., Mesirov, J., McLachlan, G.: Joint modeling and registration of cell populations in cohorts of high-dimensional flow cytometric data. PLoS One 9, e100334 (2014). doi:10.1371/journal.pone.0100334
Riggi, S., Ingrassia, S.: A model-based clustering approach for mass composition analysis of high energy cosmic rays. Astropart. Phys. 48, 86–96 (2013)
Rossin, E., Lin, T.I., Ho, H.J., Mentzer, S.J., Pyne, S.: A framework for analytical characterization of monoclonal antibodies based on reactivity profiles in different tissues. Bioinformatics 27, 2746–2753 (2011)
Sahu, S.K., Dey, D.K., Branco, M.D.: A new class of multivariate skew distributions with applications to Bayesian regression models. Can. J. Stat. 31, 129–150 (2003)
Sahu, S.K., Dey, D.K., Branco, M.D.: Erratum: a new class of multivariate skew distributions with applications to Bayesian regression models. Can. J. Stat. 37, 301–302 (2009)
Schwarz, G.: Estimating the dimension of a model. Ann. Stat. 6, 461–464 (1978)
Soltyk, S., Gupta, R.: Application of the multivariate skew normal mixture model with the EM algorithm to value-at-risk. In: Chan, F., Marinova, D., Anderssen, R.S. (eds.) MODSIM 2011 (19th International Congress on Modelling and Simulation), pp. 1638–1644. Perth (2011)
Titterington, D.M., Smith, A.F.M., Makov, U.E.: Statistical Analysis of Finite Mixture Distributions. Wiley, New York (1985)
Tortora, C., Franczak, B.C., Browne, R.P., McNicholas, P.D.: Model-based clustering using mixtures of coalesced generalized hyperbolic distributions. Preprint arXiv:1403.2332 [statME] (2014)
Vrbik, I., McNicholas, P.D.: Analytic calculations for the EM algorithm for multivariate skew \(t\)-mixture models. Stat. Probab. Lett. 82, 1169–1174 (2012)
Wang, K., Ng, S.K., McLachlan, G.J.: Multivariate skew \(t\) mixture models: applications to fluorescence-activated cell sorting data. In: Shi, H., Zhang, Y., Bottema, M.J., Lovell, B.C., Maeder, A.J. (eds.) DICTA 2009 (Conference of Digital Image Computing: Techniques and Applications, Melbourne), pp. 526–531. IEEE Computer Society, Los Alamitos (2009)
Wendel, J.G.: Note on the gamma function. Am. Math. Mon. 55, 563–564 (1948)
Wraith, D., Forbes, F.: Clustering using skewed multivariate heavy tailed distributions with flexible tail behaviour. Preprint. arXiv:1408.0711 [statME] (2014)
Acknowledgments
We would like to thank Professor Seung-Gu Kim for helpful comments on this topic. The work of the authors was supported by an Australian Research Council Discovery Grant.
Cite this article
Lee, S.X., McLachlan, G.J. Finite mixtures of canonical fundamental skew \(t\)-distributions. Stat Comput 26, 573–589 (2016). https://doi.org/10.1007/s11222-015-9545-x
DOI: https://doi.org/10.1007/s11222-015-9545-x