Copula analysis of mixture models

  • Original Paper
Computational Statistics

Abstract

Contemporary computers collect databases that can be too large for classical methods to handle. The present work considers data whose observations are distribution functions (rather than the single numerical values of classical data) and presents a computational statistical approach, based on a new methodology, for grouping these distributions into classes. The clustering method links the sought partition to the decomposition of mixture densities through the notions of a function of distributions and of multidimensional copulas. The new clustering technique is illustrated by identifying distinct temperature and humidity regions in a global climate dataset, and its results compare favorably with those obtained from the standard EM algorithm.
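As a rough illustration of the building blocks named in the abstract, the sketch below evaluates a bivariate copula and a finite mixture of copulas. It is not the authors' implementation: the choice of the Clayton family, the function names `clayton_cdf` and `mixture_copula_cdf`, and the parameter values are all assumptions made here for illustration. The inputs `u` and `v` stand for the values of marginal distribution functions, so they lie in (0, 1].

```python
def clayton_cdf(u, v, theta):
    """Bivariate Clayton copula C(u, v), for theta > 0 and u, v in (0, 1].

    The Clayton family is an assumed, illustrative choice of copula.
    """
    if theta <= 0:
        raise ValueError("theta must be positive")
    return (u ** -theta + v ** -theta - 1.0) ** (-1.0 / theta)


def mixture_copula_cdf(u, v, weights, thetas):
    """Convex combination of Clayton copulas (hypothetical helper).

    weights: mixing proportions, non-negative and summing to 1.
    thetas:  one dependence parameter per mixture component.
    """
    if len(weights) != len(thetas):
        raise ValueError("weights and thetas must have the same length")
    if abs(sum(weights) - 1.0) > 1e-12:
        raise ValueError("weights must sum to 1")
    return sum(w * clayton_cdf(u, v, t) for w, t in zip(weights, thetas))


# Two-component mixture with illustrative (made-up) parameters.
value = mixture_copula_cdf(0.3, 0.7, weights=[0.4, 0.6], thetas=[0.5, 3.0])
```

Because each component is a copula, the mixture keeps uniform margins: setting `v = 1` returns `u` up to floating-point rounding, which gives a quick sanity check on any such implementation.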

Author information

Corresponding author

Correspondence to L. Billard.

About this article

Cite this article

Vrac, M., Billard, L., Diday, E. et al. Copula analysis of mixture models. Comput Stat 27, 427–457 (2012). https://doi.org/10.1007/s00180-011-0266-0

  • DOI: https://doi.org/10.1007/s00180-011-0266-0
