A robust factor analysis model using the restricted skew-\(t\) distribution

  • Original Paper
  • Journal: TEST

Abstract

Factor analysis is a classical data-reduction technique that seeks a potentially smaller number of unobserved variables to account for the correlations among the observed variables. This paper presents an extension of the factor analysis model, called the skew-\(t\) factor analysis model, constructed by jointly assuming a restricted version of the multivariate skew-\(t\) distribution for the latent factors and a symmetric \(t\)-distribution for the unobservable errors. The proposed model is robust to violations of the normality assumption for the underlying latent factors and offers flexibility in capturing skewness as well as heavier tails in the observed data. A computationally feasible expectation conditional maximization (ECM) algorithm is developed for computing maximum likelihood estimates of the model parameters. The usefulness of the proposed methodology is illustrated using both simulated and real data.
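To make the model structure described in the abstract more concrete, the following is a minimal simulation sketch, not the authors' implementation. It assumes one common stochastic representation of a restricted skew-\(t\) vector: a single half-normal variable shared by all factors induces skewness, and a single gamma scale variable shared by factors and errors induces heavy tails in both jointly. The function name and the parameters `mu`, `B`, `D_diag`, `delta` and `nu` are hypothetical and need not match the paper's parameterization.

```python
import numpy as np

def simulate_skew_t_factor_model(n, mu, B, D_diag, delta, nu, seed=0):
    """Draw n observations y = mu + B u + eps (illustrative sketch only).

    Assumed representation (not taken from the paper):
      tau ~ Gamma(nu/2, rate=nu/2)            shared heavy-tail scale
      u   = (delta * |w0| + w1) / sqrt(tau)   restricted skew-t factors
      eps = e / sqrt(tau)                     symmetric t errors
    with w0 ~ N(0, 1), w1 ~ N_q(0, I), e ~ N_p(0, diag(D_diag)).
    """
    rng = np.random.default_rng(seed)
    mu = np.asarray(mu, dtype=float)          # location vector, shape (p,)
    B = np.asarray(B, dtype=float)            # factor loadings, shape (p, q)
    D_diag = np.asarray(D_diag, dtype=float)  # error variances, shape (p,)
    delta = np.asarray(delta, dtype=float)    # skewness vector, shape (q,)
    p, q = B.shape

    # One gamma draw per observation, shared by factors and errors.
    tau = rng.gamma(shape=nu / 2.0, scale=2.0 / nu, size=n)
    scale = 1.0 / np.sqrt(tau)[:, None]

    # A single half-normal variable per observation drives the skewness.
    w0 = np.abs(rng.standard_normal(n))
    w1 = rng.standard_normal((n, q))
    e = rng.standard_normal((n, p)) * np.sqrt(D_diag)

    U = (w0[:, None] * delta + w1) * scale    # latent factors, shape (n, q)
    eps = e * scale                           # errors, shape (n, p)
    Y = mu + U @ B.T + eps                    # observations, shape (n, p)
    return Y, U
```

Setting `delta` to zero in this sketch gives symmetric \(t\) factors, and letting `nu` grow large moves the draws toward the Gaussian factor analysis model, which is the sense in which the extra parameters add flexibility for skewness and tail weight.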

Acknowledgments

We are grateful to the Editor-in-Chief, the Associate Editor and two anonymous referees for their insightful comments and suggestions, which led to a much improved version of this article. This research was supported by MOST 103-2118-M-005-001-MY2 awarded by the Ministry of Science and Technology of Taiwan.

Author information

Corresponding author

Correspondence to Tsung-I Lin.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 75 KB)

About this article

Cite this article

Lin, TI., Wu, P.H., McLachlan, G.J. et al. A robust factor analysis model using the restricted skew-\(t\) distribution. TEST 24, 510–531 (2015). https://doi.org/10.1007/s11749-014-0422-2

  • DOI: https://doi.org/10.1007/s11749-014-0422-2
