Copula analysis of mixture models

Vrac, M.; Billard, L.; Diday, E.; Chédin, A.

doi:10.1007/s00180-011-0266-0

Original Paper
Published: 28 June 2011

Volume 27, pages 427–457, (2012)

Computational Statistics

Abstract

Contemporary computers collect databases that can be too large for classical methods to handle. The present work takes data whose observations are distribution functions (rather than the single numerical point values of classical data) and presents a computational statistical approach to a new methodology for grouping the distributions into classes. The clustering method links the searched-for partition to the decomposition of mixture densities through the notions of a function of distributions and of multi-dimensional copulas. The new clustering technique is illustrated by ascertaining distinct temperature and humidity regions for a global climate dataset, and the results compare favorably with those obtained from the standard EM algorithm method.
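As a rough illustration of how a copula can enter a mixture model, the sketch below evaluates a two-component bivariate mixture distribution function in which each component couples its own marginals through a copula. It is not the authors' algorithm: the Clayton family, the normal marginals, the parameter values, and the names `clayton_cdf`, `normal_cdf`, and `mixture_cdf` are all assumptions chosen only to keep the example small.

```python
import math


def clayton_cdf(u, v, theta):
    """Clayton copula C(u, v) = (u^-theta + v^-theta - 1)^(-1/theta), theta > 0."""
    if u <= 0.0 or v <= 0.0:
        return 0.0  # a copula is zero when either argument is zero
    return (u ** -theta + v ** -theta - 1.0) ** (-1.0 / theta)


def normal_cdf(x, mean, sd):
    """Normal distribution function, used here as an assumed marginal."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def mixture_cdf(x1, x2, components):
    """Mixture CDF: sum over k of p_k * C_k(F_k1(x1), F_k2(x2)).

    components is a list of (weight, theta, (mean1, sd1), (mean2, sd2)).
    """
    total = 0.0
    for weight, theta, marginal1, marginal2 in components:
        u = normal_cdf(x1, *marginal1)
        v = normal_cdf(x2, *marginal2)
        total += weight * clayton_cdf(u, v, theta)
    return total


# Hypothetical parameters: the weights sum to one and each class has its own
# dependence parameter theta and its own marginals.
components = [
    (0.6, 2.0, (0.0, 1.0), (0.0, 1.0)),
    (0.4, 0.5, (3.0, 1.0), (3.0, 2.0)),
]
value = mixture_cdf(1.0, 1.0, components)
```

In a clustering setting, the weights, copula parameters, and marginal parameters would be estimated from data and each observation assigned to the class with the largest posterior weight; that estimation step is omitted here.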

Author information

Authors and Affiliations

Laboratoire des Sciences du Climat et de l’Environnement, IPSL-CNRS/CEA/UVSQ, Centre d’Etudes de Saclay, Orme des Merisiers, 91191, Gif-sur-Yvette, France
M. Vrac
Department of Statistics, University of Georgia, Athens, GA, 30602, USA
L. Billard
CEREMADE, University of Paris Dauphine, Place du Maréchal de Lattre-de-Tassigny, 75775, Paris, France
E. Diday
Laboratoire de Météorologie Dynamique/IPSL, Ecole Polytechnique, 91128, Palaiseau, France
A. Chédin

Corresponding author

Correspondence to L. Billard.

About this article

Cite this article

Vrac, M., Billard, L., Diday, E. et al. Copula analysis of mixture models. Comput Stat 27, 427–457 (2012). https://doi.org/10.1007/s00180-011-0266-0

Received: 10 February 2010
Accepted: 08 June 2011
Published: 28 June 2011
Issue Date: September 2012
DOI: https://doi.org/10.1007/s00180-011-0266-0
