Abstract
Contemporary computers collect databases that can be too large for classical methods to handle. The present work takes data whose observations are distribution functions (rather than the single numerical point values of classical data) and presents a new computational statistical methodology for grouping the distributions into classes. The clustering method links the sought partition to the decomposition of mixture densities through the notions of a function of distributions and of multidimensional copulas. The new clustering technique is illustrated by identifying distinct temperature and humidity regions in a global climate dataset, and the results compare favorably with those obtained from the standard EM algorithm.
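To make the link between the partition and the mixture decomposition concrete, the following is a minimal sketch of how such a model is commonly written. The notation is an assumption for illustration and is not taken from the abstract: F denotes a randomly drawn observed distribution function, z_1, ..., z_n are fixed evaluation points, G_j is the marginal distribution function of the value F(z_j), C_k is a copula for class k, and p_k is the mixing proportion of class k.

```latex
% Assumed notation (hypothetical, for illustration only):
%   F            a randomly drawn observed distribution function
%   z_1..z_n     fixed points at which each F is evaluated
%   G_j          marginal distribution function of F(z_j)
%   C_k, p_k     copula and mixing proportion of class k
\begin{align}
  % Joint distribution of the values taken by F at the chosen points
  H(x_1,\dots,x_n) &= P\bigl(F(z_1)\le x_1,\dots,F(z_n)\le x_n\bigr), \\
  % Sklar's theorem: a joint distribution is a copula of its marginals
  H(x_1,\dots,x_n) &= C\bigl(G_1(x_1),\dots,G_n(x_n)\bigr), \\
  % Mixture over K classes, each with its own copula and marginals
  H(x_1,\dots,x_n) &= \sum_{k=1}^{K} p_k\,
      C_k\bigl(G_1^{k}(x_1),\dots,G_n^{k}(x_n)\bigr),
  \qquad p_k \ge 0,\quad \sum_{k=1}^{K} p_k = 1 .
\end{align}
```

Under this reading, estimating the proportions, the class marginals, and the copula parameters yields a partition by assigning each observed distribution to the class under which it is most probable. How the estimation is actually carried out is described in the article itself, not in this sketch.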
Cite this article
Vrac, M., Billard, L., Diday, E. et al. Copula analysis of mixture models. Comput Stat 27, 427–457 (2012). https://doi.org/10.1007/s00180-011-0266-0
DOI: https://doi.org/10.1007/s00180-011-0266-0