Elsevier

Pattern Recognition

Volume 47, Issue 9, September 2014, Pages 3143-3157

Bayesian estimation of Dirichlet mixture model with variational inference

https://doi.org/10.1016/j.patcog.2014.04.002

Highlights

  • An analytically tractable solution for Bayesian estimation of the Dirichlet mixture model.

  • Relative convexity of the multivariate log-inverse-beta function is proved and utilized.

  • The free energy function is approximated by a single lower-bound to guarantee convergence.

  • The method outperforms the ML-based method and the previous VI-based method, and is comparable to the sampling-based method.

  • The performances are demonstrated with important multimedia signal processing applications.

Abstract

In statistical modeling, parameter estimation is an essential and challenging task. Estimation of the parameters in the Dirichlet mixture model (DMM) is analytically intractable, due to the integral expressions of the gamma function and its corresponding derivatives. We introduce a Bayesian estimation strategy to estimate the posterior distribution of the parameters in the DMM. By assuming a gamma prior for each parameter, we approximate both the prior and the posterior distribution of the parameters with a product of several mutually independent gamma distributions. The extended factorized approximation method is applied to introduce a single lower-bound to the variational objective function, and an analytically tractable estimation solution is derived. Moreover, only one function is maximized during the iterations, and therefore the convergence of the proposed algorithm is theoretically guaranteed. With synthesized data, the proposed method shows advantages over the EM-based method and a previously proposed Bayesian estimation method. With two important multimedia signal processing applications, the good performance of the proposed Bayesian estimation method is demonstrated.

Introduction

Statistical modeling plays an important role in various research areas [1], [2], [3]. It provides a way to connect observed data with statistical descriptions. An essential part of statistical modeling is to estimate the values of the parameters in a distribution or, if we consider the parameters as random variables, to estimate their distribution. The maximum likelihood (ML) estimation method gives point estimates of the parameters and disregards the remaining uncertainty in the estimation. Rather than taking point estimates, Bayesian estimation gives posterior probability distributions over all model parameters, using the observed data together with the prior distributions [3]. In general, compared to ML estimation, Bayesian estimation of the parameters in a statistical model can yield a more robust and stable estimate, by including the resulting uncertainty in the estimation, especially when the amount of observed data is small [4].
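The contrast above can be made concrete with a minimal, self-contained illustration (not from the paper): for a Bernoulli parameter observed through a handful of trials, the ML estimate is a single number, while a conjugate Beta prior yields a full posterior whose variance quantifies the remaining uncertainty.

```python
# Hedged illustration (generic conjugate example, not the paper's DMM method):
# ML point estimate vs. Bayesian posterior for a Bernoulli success probability.
heads, tails = 3, 1            # a deliberately small observed sample

# ML: a point estimate; no uncertainty is retained
p_ml = heads / (heads + tails)

# Bayesian: Beta(1, 1) prior -> Beta(1 + heads, 1 + tails) posterior
a, b = 1 + heads, 1 + tails
p_mean = a / (a + b)                           # posterior mean
p_var = a * b / ((a + b) ** 2 * (a + b + 1))   # posterior variance

print(p_ml)       # 0.75
print(p_mean)     # ~0.667, pulled toward the prior by the small sample
print(p_var > 0)  # True: the estimate carries an uncertainty measure
```

With only four observations, the posterior mean is noticeably shrunk toward the prior, which is exactly the stabilizing effect described above for small data sets.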

The Gaussian distribution and the corresponding Gaussian mixture model (GMM) are widely used to model the underlying distribution of the data. However, not all data we would like to model can be safely assumed to be Gaussian distributed [5]. Recently, studies of non-Gaussian statistical models have become popular for the purpose of modeling bounded or semi-bounded data (see e.g., [6], [7], [8], [9]). The non-Gaussian statistical models include, among others, the beta distribution, the gamma distribution, and the Dirichlet distribution.

The Dirichlet distribution and the corresponding Dirichlet mixture model (DMM) have frequently been applied to model proportional data, for example, in image processing [10], in text analysis [11], and in data mining [12]. For speech processing, applications of the Dirichlet distribution to line spectral frequency (LSF) parameter quantization [13] were shown to be superior to conventional GMM-based methods. Another usage of the Dirichlet distribution is to model the probabilities of the weighting factors in a mixture model [14], [15]. In non-parametric Bayesian modeling, the Dirichlet process is an infinite-dimensional generalization of the Dirichlet distribution, from which an infinite mixture model can be obtained [15], [16], [17]. Here, we study only the finite DMM; the work conducted can also be extended to the infinite mixture modeling case.
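The proportional-data setting above can be sketched with a short example (names are illustrative, not from the paper): a Dirichlet sample can be drawn via the standard gamma-normalization construction, and the draw is valid proportional data, i.e., positive components that sum to one.

```python
# Hedged sketch: one draw from Dirichlet(alpha) using the standard
# gamma-normalization construction (sample K+1 gamma variates, normalize).
import random

def sample_dirichlet(alpha, rng=random.Random(0)):
    """Draw one sample from Dirichlet(alpha) by normalizing gamma draws."""
    g = [rng.gammavariate(a, 1.0) for a in alpha]
    s = sum(g)
    return [v / s for v in g]

x = sample_dirichlet([2.0, 5.0, 3.0])
print(all(v > 0 for v in x))       # True: every component is positive
print(abs(sum(x) - 1.0) < 1e-12)   # True: components sum to one
```

Note that the paper parameterizes the Dirichlet over the K free elements whose sum is smaller than one; the sketch samples the full (K+1)-element simplex, of which any K elements satisfy that constraint.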

In this paper, we build on our previous study of the Bayesian analysis of the beta mixture model (BMM) [8] and extend it to the Bayesian analysis of the DMM. The parameters in a Dirichlet distribution are assumed mutually independent, and each of them is assigned a gamma prior. Although this assumption ignores the correlation among the parameters, it captures their non-negativity. Under this assumption, we can apply the factorized approximation (FA) method to carry out the Bayesian estimation. However, as the expectation of the multivariate log-inverse-beta (MLIB) function cannot be calculated explicitly, an analytically tractable solution to the posterior distribution is not feasible. To overcome this problem, we study some relative convexity properties of the MLIB function. Using these convexity properties, we approximate the expectation of the MLIB function by a single lower-bound (SLB). With this derived SLB, following the principles of the VI framework and the extended factorized approximation (EFA) method [8], [18], [19], [20], [21], [22], [23], [24], we approximate the posterior distributions of the parameters in a Dirichlet distribution with a product of several mutually independent gamma distributions, which satisfies the conjugate match between the prior and posterior distributions. Finally, an analytically tractable solution for calculating the posterior distribution is obtained. This analytically tractable solution avoids the numerical calculations required by the EM algorithm [10], [25].
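The factorization described above can be written compactly. This is a sketch of the stated assumption only; the shape and rate symbols $\mu_k$, $\nu_k$ are illustrative names, not necessarily the paper's notation:

```latex
% Mutually independent gamma priors on the Dirichlet parameters
% a_1, ..., a_{K+1} (shape \mu_k and rate \nu_k are illustrative):
p(\mathbf{a}) \approx \prod_{k=1}^{K+1} \mathrm{Gam}(a_k;\, \mu_k, \nu_k)
  = \prod_{k=1}^{K+1} \frac{\nu_k^{\mu_k}}{\Gamma(\mu_k)}\,
    a_k^{\mu_k - 1}\, e^{-\nu_k a_k}
```

Because the gamma density places all its mass on $a_k > 0$, this product matches the non-negativity of the Dirichlet parameters while keeping the factors mutually independent, which is what makes the factorized approximation applicable.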

The proposed method, which is a full Bayesian framework, can automatically determine the model complexity (in terms of the number of necessary mixture components) based on the data. This task is also challenging in model estimation, and ML estimation by itself cannot handle it. Moreover, the overfitting problem in ML estimation can also be prevented, due to the Occam's razor effect in Bayesian estimation. With synthesized data, the effectiveness and accuracy of the proposed Bayesian estimation method over the ML estimation method [10], [25] and the recently proposed Bayesian estimation method [12] are demonstrated. For real-life applications, we evaluate the proposed Bayesian estimation method with two important multimedia signal processing applications, namely (1) LSF parameter quantization in speech coding [13] and (2) multiview depth image enhancement in free-viewpoint television (FTV) [26]. For both applications, the proposed Bayesian method works well and shows improvement over the conventional methods.

The remaining parts of this paper are organized as follows: the DMM and the Bayesian analysis of a DMM are introduced in Sections 2 and 3, respectively. In Section 4, we show the efficiency and good performance of the proposed method with synthesized data and real-life data. Conclusions are drawn in Section 5.

Section snippets

Dirichlet mixture model

If a K-dimensional vector x = [x_1, …, x_K]^T contains only positive values and the sum of all K elements is smaller than one, the underlying distribution of x can be modeled by a Dirichlet distribution. The probability density function (PDF) of a Dirichlet distribution is given in Eq. (1).
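The snippet truncates before the equation itself; for reference, the standard Dirichlet PDF over the K free elements, with notation assumed to match the paper's, is:

```latex
% Standard Dirichlet PDF for x = [x_1, ..., x_K]^T with x_k > 0 and
% \sum_{k=1}^{K} x_k < 1; parameters a_1, ..., a_{K+1} > 0 (notation assumed):
\mathrm{Dir}(\mathbf{x};\,\mathbf{a}) =
  \frac{\Gamma\!\left(\sum_{k=1}^{K+1} a_k\right)}
       {\prod_{k=1}^{K+1} \Gamma(a_k)}\,
  \prod_{k=1}^{K} x_k^{a_k - 1}\,
  \Bigl(1 - \sum_{j=1}^{K} x_j\Bigr)^{a_{K+1} - 1}
```

The gamma functions in the normalizing constant are the integral expressions that, as the abstract notes, make direct parameter estimation analytically intractable.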

Bayesian estimation with variational inference framework

For a distribution belonging to the exponential family, a conjugate prior and the corresponding posterior distribution always exist [3]. Similar to the beta distribution [8], the Dirichlet distribution has its conjugate prior and the corresponding posterior distribution. However, they are not tractable in practical use. Thus we follow the principles of the VI framework [3], [18] to approximate the prior and posterior distributions. With the proposed approximation, the obtained prior and posterior

Experimental results and discussion

We evaluate the proposed method with both synthesized data and real data. With the synthesized data, the efficiency and accuracy of the proposed Bayesian estimation method are demonstrated. In the real data evaluation, the proposed Bayesian DMM method is applied to LSF parameter quantization in speech transmission and compared to recently proposed statistical-model-based methods [13]. Meanwhile, we also apply the proposed Bayesian DMM to enhance the multiview depth image,

Conclusion

The Bayesian estimation of a statistical model is, in general, preferable to the maximum likelihood (ML) estimation. To avoid the numerical calculation in the maximum likelihood estimation of the parameters in a Dirichlet mixture model (DMM), we proposed a novel Bayesian estimation method based on the variational inference framework. The main contribution of this paper is to derive an analytically tractable solution for approximating the posterior distribution of the parameters, by utilizing

Conflict of interest statement

None declared.


References (58)

  • K. Fukunaga

    Introduction to Statistical Pattern Recognition

    (1990)
  • A.K. Jain et al.

    Statistical pattern recognition: a review

    IEEE Trans. Pattern Anal. Mach. Intell.

    (2000)
  • C.M. Bishop

    Pattern Recognition and Machine Learning

    (2006)
  • J.M. Bernardo et al.

    Bayesian Theory, Wiley Series in Probability and Statistics

    (2000)
  • J.D. Banfield et al.

    Model-based Gaussian and non-Gaussian clustering

    Biometrics

    (1993)
  • Y. Ji et al.

    Application of beta-mixture models in bioinformatics

    Bioinform. Appl. Note

    (2005)
  • Z. Ma, Non-Gaussian statistical models and their applications (Ph.D. thesis), KTH - Royal Institute of Technology,...
  • Z. Ma et al.

    Bayesian estimation of beta mixture models with variational inference

    IEEE Trans. Pattern Anal. Mach. Intell.

    (2011)
  • Z. Ma, A.E. Teschendorff, A variational Bayes beta mixture model for feature selection in DNA methylation studies, J....
  • N. Bouguila et al.

    Unsupervised learning of a finite mixture model based on the Dirichlet distribution and its application

    IEEE Trans. Image Process.

    (2004)
  • D. Blei, Probabilistic models of text and images (Ph.D. thesis), Univ. of California, Berkeley,...
  • W. Fan et al.

    Variational learning for finite Dirichlet mixture models and applications

    IEEE Trans. Neural Netw. Learn. Syst.

    (2012)
  • Z. Ma et al.

    Vector quantization of LSF parameters with a mixture of Dirichlet distributions

    IEEE Trans. Audio Speech Lang. Process.

    (2013)
  • G.J. McLachlan et al.

    Finite Mixture Models

    (2000)
  • D.M. Blei et al.

    Variational inference for Dirichlet process mixtures

    Bayesian Anal.

    (2005)
  • P. Orbanz, Y.W. Teh, Bayesian nonparametric models, Encyclopedia of Machine Learning, 2010, pp...
  • Z. Ghahramani, Bayesian non-parametrics and the probabilistic approach to modelling. Philos. Trans. R. Soc. A: Math....
  • M.I. Jordan et al.

    An introduction to variational methods for graphical models

    Mach. Learn.

    (1999)
  • T.S. Jaakkola, Tutorial on variational approximation methods, in: Advanced Mean Field Methods: Theory and Practice, MIT...
  • D.M. Blei, J.D. Lafferty, Correlated topic models, in: Advances in Neural Information Processing Systems, MIT Press,...
  • D.M. Blei et al.

    A correlated topic model of Science

    Ann. Appl. Stat.

    (2007)
  • M. Braun et al.

    Variational inference for large-scale models of discrete choice

    J. Am. Stat. Assoc.

    (2010)
  • M. Hoffman, D. Blei, P. Cook, Bayesian nonparametric matrix factorization for recorded music, in: Proceedings of...
  • Z. Ma, A. Leijon, Modeling speech line spectral frequencies with Dirichlet mixture models, in: Proceedings of...
  • M. Tanimoto et al.

    Free-viewpoint TV

    IEEE Signal Process. Mag.

    (2011)
  • R. Ksantini et al.

    Weighted pseudometric discriminatory power improvement using a Bayesian logistic regression model based on a variational method

    IEEE Trans. Pattern Anal. Mach. Intell.

    (2008)
  • P.J. Bickel et al.

    Mathematical Statistics: Basic Ideas and Selected Topics

    (2007)
  • J.A. Palmer, Relative convexity, Technical Report, UCSD,...

    Zhanyu Ma has been an assistant Professor at Beijing University of Posts and Telecommunications, Beijing, China, since 2013. He received his M.Eng. degree in Signal and Information Processing from BUPT (Beijing University of Posts and Telecommunications), China, and his Ph.D. degree in Electrical Engineering from KTH (Royal Institute of Technology), Sweden, in 2007 and 2011, respectively. From 2012 to 2013, he has been a Postdoctoral research fellow in the School of Electrical Engineering, KTH, Sweden. His research interests include pattern recognition and machine learning fundamentals with a focus on applications in multimedia signal processing, data mining, biomedical signal processing, and bioinformatics.

    Pravin Kumar Rana received the M.Sc. degree in Physics with specialization in Electronics and Communication from Ranchi University, Ranchi, India, in 2004, and the M.Tech. degree in Earth System Science and Technology from Indian Institute of Technology (IIT) Kharagpur, India, in 2008. In 2008, he joined the School of Electrical Engineering at KTH Royal Institute of Technology, Stockholm, Sweden, where he is currently working towards the Ph.D. degree. His research area includes multiview video processing, 3D and free-viewpoint TV, and computer vision. He is a student member of the IEEE.

    Jalil Taghia is currently pursuing a Ph.D. degree at the Communication Theory laboratory, KTH Royal Institute of Technology, Stockholm, Sweden. His research interest includes statistical signal processing and machine learning including Bayesian inference, variational approximations, latent variable models in particular with audio applications.

    Markus Flierl received the Ph.D. degree in electrical engineering from Friedrich Alexander University, Erlangen, Germany, in 2003. He is an Associate Professor at the Autonomic Complex Communication Networks, Signals and Systems (ACCESS) Linnaeus Center, School of Electrical Engineering, KTH - Royal Institute of Technology, Stockholm, Sweden. From 2005 to 2008, he was a Visiting Assistant Professor at the Max Planck Center for Visual Computing and Communication at Stanford University, Stanford, CA. He is the author of the book Video Coding with Superimposed Motion-Compensated Signals: Applications to H.264 and Beyond and of more than 50 other scientific publications.

    He is the recipient of the 2007 Visual Communications and Image Processing Young Investigator Award.

    Arne Leijon is a Professor in Hearing Technology at the KTH (Royal Institute of Technology) Sound and Image Processing Lab, Stockholm, Sweden, since 1994. His main research interest concerns applied signal processing in aids for people with hearing impairment, and methods for individual fitting of these aids, based on psychoacoustic modelling of sensory information transmission and subjective sound quality. He received the M.S. degree in Engineering Physics in 1971, and a Ph.D. degree in Information Theory in 1989, both from Chalmers University of Technology, Gothenburg, Sweden.
