Published in: Advances in Data Analysis and Classification 2/2014

1 June 2014 | Regular Article

Variational Bayes approximations for clustering via mixtures of normal inverse Gaussian distributions

Authors: Sanjeena Subedi, Paul D. McNicholas


Abstract

Parameter estimation for model-based clustering using a finite mixture of normal inverse Gaussian (NIG) distributions is achieved through variational Bayes approximations. Univariate NIG mixtures and multivariate NIG mixtures are considered. The use of variational Bayes approximations here is a substantial departure from the traditional EM approach and alleviates some of the associated computational complexities and uncertainties. Our variational algorithm is applied to simulated and real data. The paper concludes with discussion and suggestions for future work.
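To make the model concrete, the following is a minimal sketch of the density of a finite mixture of univariate NIG distributions, written in one common parameterization (alpha, beta, mu, delta) with alpha > |beta| and delta > 0. It is not the authors' variational algorithm, only an illustration of the kind of density whose parameters such an algorithm would estimate. The function names, the example weights and parameter values, and the pure-Python trapezoidal evaluation of the Bessel function K_1 are all assumptions made for illustration; in practice a library routine for K_1 would usually be preferable.

```python
import math


def bessel_k1(z, upper=10.0, steps=400):
    """Modified Bessel function of the second kind, order 1, for z > 0.

    Illustrative only: uses the integral representation
    K_1(z) = int_0^inf exp(-z cosh t) cosh t dt
    with the trapezoidal rule on [0, upper]. For the moderate z used in
    this sketch the integrand is numerically zero well before `upper`.
    """
    h = upper / steps
    total = 0.5 * math.exp(-z)  # t = 0 endpoint, half weight (cosh 0 = 1)
    for i in range(1, steps + 1):
        c = math.cosh(i * h)
        total += math.exp(-z * c) * c
    return total * h


def nig_pdf(x, alpha, beta, mu, delta):
    """Univariate NIG density; requires alpha > |beta| and delta > 0."""
    gamma = math.sqrt(alpha * alpha - beta * beta)
    r = math.sqrt(delta * delta + (x - mu) ** 2)
    return (alpha * delta * bessel_k1(alpha * r) / (math.pi * r)
            * math.exp(delta * gamma + beta * (x - mu)))


def nig_mixture_pdf(x, weights, params):
    """Finite mixture density: sum_g weights[g] * NIG(x; params[g])."""
    return sum(w * nig_pdf(x, *p) for w, p in zip(weights, params))


# Hypothetical two-component mixture; the numbers are arbitrary.
weights = [0.4, 0.6]
params = [
    (2.0, 1.0, -3.0, 1.0),   # (alpha, beta, mu, delta): right-skewed
    (3.0, -1.0, 4.0, 2.0),   # left-skewed
]
density_at_zero = nig_mixture_pdf(0.0, weights, params)
```

The sign of beta controls the direction of skewness and beta = 0 gives a symmetric component, which is what lets NIG mixtures capture clusters that a Gaussian mixture would need several components to describe.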


Metadata
Title
Variational Bayes approximations for clustering via mixtures of normal inverse Gaussian distributions
Authors
Sanjeena Subedi
Paul D. McNicholas
Publication date
1 June 2014
Publisher
Springer Berlin Heidelberg
Published in
Advances in Data Analysis and Classification / Issue 2/2014
Print ISSN: 1862-5347
Electronic ISSN: 1862-5355
DOI
https://doi.org/10.1007/s11634-014-0165-7
