Information measures for generalized gamma family

https://doi.org/10.1016/j.jeconom.2006.05.010

Abstract

The objective of this paper is to integrate the generalized gamma (GG) distribution into the information theoretic literature. We study information properties of the GG distribution and provide an assortment of information measures for the GG family, which includes the exponential, gamma, Weibull, and generalized normal distributions as subfamilies. The measures include entropy representations of the log-likelihood ratio, AIC, and BIC; discrimination information between GG and its subfamilies; a minimum discrimination information function; power transformation information; and a maximum entropy index of fit to a histogram. We provide full parametric Bayesian inference for the discrimination information measures. We also provide Bayesian inference for the fit of the GG model to a histogram, using a semi-parametric Bayesian procedure referred to as the maximum entropy Dirichlet (MED). The GG information measures are computed for data on the duration of unemployment and the duration of CEO tenure.

Introduction

The generalized gamma (GG) distribution offers a family that is flexible in the variety of its shapes and hazard functions for modeling duration. It was introduced by Stacy (1962). Difficulties with the convergence of algorithms for maximum likelihood estimation (Hager and Bain, 1970) inhibited applications of the GG model. Prentice (1974) resolved the convergence problem using a nonlinear transformation of the GG model. However, despite its long history and growing use in various applications, the GG family has been remarkably absent from the information theoretic literature. Thus far, a maximum entropy (ME) derivation of GG has been given in Kapur (1989), where it is referred to as the generalized Weibull distribution, and only recently has the entropy of GG appeared in the context of flexible families of distributions (Nadarajah and Zografos, 2003). The GG family has not been included in information studies such as the existing ME distributional fitting of parametric families (see, e.g., Soofi and Retzer, 2002, and references therein), the analysis of discrimination information statistics for parametric families (Alwan et al., 1998), and the entropy orderings of parametric families (Ebrahimi et al., 1999). The main objective of this paper is to fill this void and integrate the GG family into the information theoretic literature. For this purpose, we develop information criteria for discriminating between the GG and its subfamilies and for assessing the fit of GG to the data. We also present Bayesian inference about the discrimination and the fit.

Analysis of duration data is increasingly used in economics and related fields (Kiefer, 1988). In labor economics, examples include studies of the duration of unemployment (Lancaster, 1979, Kiefer, 1984, McDonald and Butler, 1987, Yamaguchi, 1992), turnover in the labor market (Kiefer et al., 1985), length of contract (Gronberg, 1994), and duration of strikes (Jaggia, 1991). Examples in other areas include studies of firm survival (Audretsch and Mahmoud, 1995), the time that firms spend under Chapter 11 (Orbe et al., 2002), the time that a property is on the market (Genesove and Mayer, 1997), the duration of schooling in higher education (Diaz, 1999), the duration of stages of oilfield exploration (Favero et al., 1994), household interpurchase time (Vakratsas and Bass, 2002), interpurchase time in financial markets (Allenby et al., 1999), and the length of time that new movies stay on screens (Blumenthal, 1988).

Distributions used in duration analysis in economics include the exponential (Kiefer, 1984, Diebold and Rudebusch, 1990), lognormal (Eckstein and Wolpin, 1995), gamma (Lancaster, 1979), and Weibull (Favero et al., 1994). The GG family, which encompasses the exponential, gamma, and Weibull as subfamilies and the lognormal as a limiting distribution, has been used in economics by Jaggia (1991), Yamaguchi (1992), and Allenby et al. (1999). Some authors (e.g., Jaggia, 1991, Allenby et al., 1999) have argued that the flexibility of GG makes it suitable for duration analysis, while others have used simpler models and thereby avoided the estimation difficulties caused by the complexity of the GG parameter structure. There is no need to bear the costs of applying the more complex GG model if the data do not discriminate between the GG and members of its subfamilies, or if a simpler model fits the data as well as the GG does. The question therefore is: do the data necessitate the use of a GG model? From the information theoretic perspective, this question is addressed by deriving probability models from partial information in the form of a set of constraints, measuring the incremental information content of additional constraints, and thereby assessing the compatibility of models with the data. The GG information measures presented in this paper offer tools, with an axiomatic basis and intuitive appeal, for analyzing GG as a general class of duration models.

The paper is organized as follows. Section 2 discusses information properties of the GG family and presents several discrimination information measures for the GG and its subfamilies. Section 3 gives entropy representations of the log-likelihood ratio statistic, AIC, and BIC. Section 4 discusses Bayesian inference about the GG parameters and the discrimination information measures. Section 5 presents an information index of fit of the GG model to the histogram and Bayesian inference about the fit. Section 6 illustrates the application of the GG information criteria to data on the duration of unemployment and the duration of CEO tenure. Section 7 gives brief concluding remarks.

Section snippets

Information properties of GG family

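The probability density function of the GG distribution, GG(α,τ,λ), is

$$f_{GG}(y\mid\alpha,\tau,\lambda)=\frac{\tau}{\lambda^{\alpha\tau}\,\Gamma(\alpha)}\,y^{\alpha\tau-1}e^{-(y/\lambda)^{\tau}},\qquad y\ge 0,\quad \alpha,\tau,\lambda>0,$$

where Γ(·) is the gamma function, α and τ are shape parameters, and λ is the scale parameter.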
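As a quick numerical sanity check of the density, here is a minimal Python sketch (standard library only) that evaluates $f_{GG}$ in log space and integrates it over $y>0$ with the trapezoid rule after the substitution $y=e^t$. The helper names `gg_logpdf`, `gg_pdf`, and `integrate_log_scale`, the integration limits, and the parameter values are our own illustrative assumptions, not part of the paper.

```python
import math


def gg_logpdf(y, alpha, tau, lam):
    """Log-density of GG(alpha, tau, lam) at y > 0 (Stacy's parameterization)."""
    return (math.log(tau) - alpha * tau * math.log(lam) - math.lgamma(alpha)
            + (alpha * tau - 1.0) * math.log(y) - (y / lam) ** tau)


def gg_pdf(y, alpha, tau, lam):
    return math.exp(gg_logpdf(y, alpha, tau, lam))


def integrate_log_scale(func, lo=-60.0, hi=10.0, n=40000):
    """Trapezoid rule for the integral of func(y) over y > 0, using y = exp(t), dy = y dt."""
    h = (hi - lo) / n
    total = 0.0
    for i in range(n + 1):
        y = math.exp(lo + i * h)
        weight = 0.5 if i in (0, n) else 1.0
        total += weight * func(y) * y
    return total * h


if __name__ == "__main__":
    for alpha, tau, lam in [(2.0, 1.5, 3.0), (0.7, 2.0, 1.0), (1.0, 1.0, 2.5)]:
        mass = integrate_log_scale(lambda y: gg_pdf(y, alpha, tau, lam))
        print(f"GG({alpha}, {tau}, {lam}): total probability ~ {mass:.8f}")
```

Working on the log scale keeps both the spike near zero (when ατ < 1) and the long right tail inside a fixed grid of t values.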

The GG family is flexible in that it includes several well-known models as subfamilies (see Johnson et al., 1994). The subfamilies of GG considered thus far in the literature are the exponential (α = τ = 1), gamma (τ = 1), and Weibull (α = 1) distributions. The lognormal distribution is also obtained as a limiting distribution …
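The subfamily reductions can be checked pointwise. This Python sketch (standard library only; the function names, grid, and parameter values are illustrative assumptions) compares the GG density with the usual exponential, gamma, and Weibull densities written in the same scale parameterization.

```python
import math


def gg_pdf(y, alpha, tau, lam):
    """GG(alpha, tau, lam) density in Stacy's parameterization."""
    return (tau / (lam ** (alpha * tau) * math.gamma(alpha))
            * y ** (alpha * tau - 1.0) * math.exp(-(y / lam) ** tau))


def exponential_pdf(y, lam):
    return math.exp(-y / lam) / lam


def gamma_pdf(y, alpha, lam):
    return y ** (alpha - 1.0) * math.exp(-y / lam) / (lam ** alpha * math.gamma(alpha))


def weibull_pdf(y, tau, lam):
    return (tau / lam) * (y / lam) ** (tau - 1.0) * math.exp(-(y / lam) ** tau)


def max_gap(f, g, grid):
    """Largest absolute difference between two densities on a grid of y values."""
    return max(abs(f(y) - g(y)) for y in grid)


GRID = [0.05 * k for k in range(1, 300)]  # y in (0, 15)

GAPS = {
    "exponential (alpha = tau = 1)": max_gap(
        lambda y: gg_pdf(y, 1.0, 1.0, 2.5), lambda y: exponential_pdf(y, 2.5), GRID),
    "gamma (tau = 1)": max_gap(
        lambda y: gg_pdf(y, 3.2, 1.0, 1.7), lambda y: gamma_pdf(y, 3.2, 1.7), GRID),
    "Weibull (alpha = 1)": max_gap(
        lambda y: gg_pdf(y, 1.0, 2.4, 1.5), lambda y: weibull_pdf(y, 2.4, 1.5), GRID),
}

if __name__ == "__main__":
    for name, gap in GAPS.items():
        print(f"{name}: max |GG - subfamily| = {gap:.2e}")
```

The gaps are at the level of floating-point rounding, which is what one expects when the parameter restrictions make the two formulas algebraically identical.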

Likelihood-based measures

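The likelihood function based on a set of observations $\mathbf{y}=(y_1,\ldots,y_n)$ from $y\sim f(y\mid\alpha,\tau,\lambda)=GG(\alpha,\tau,\lambda)$ is

$$f(\mathbf{y}\mid\alpha,\tau,\lambda)=\left[\frac{\tau}{\lambda^{\alpha\tau}\,\Gamma(\alpha)}\right]^{n}\exp\left\{n\left[(\alpha\tau-1)\,\overline{\log y}-\frac{\overline{y^{\tau}}}{\lambda^{\tau}}\right]\right\},$$

where $\overline{y^{\tau}}=\frac{1}{n}\sum_{i=1}^{n}y_i^{\tau}$ and $\overline{\log y}=\frac{1}{n}\sum_{i=1}^{n}\log y_i$.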
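For a given τ, the likelihood depends on the data only through the two averages $\overline{y^\tau}$ and $\overline{\log y}$. A minimal Python sketch (standard library only; the durations and helper names are hypothetical) computes the log-likelihood from these two averages and checks it against the direct sum of log-densities.

```python
import math


def gg_logpdf(y, alpha, tau, lam):
    return (math.log(tau) - alpha * tau * math.log(lam) - math.lgamma(alpha)
            + (alpha * tau - 1.0) * math.log(y) - (y / lam) ** tau)


def gg_loglik(ys, alpha, tau, lam):
    """GG log-likelihood computed only from the sample means of y**tau and log y."""
    n = len(ys)
    mean_y_tau = sum(y ** tau for y in ys) / n
    mean_log_y = sum(math.log(y) for y in ys) / n
    return n * (math.log(tau) - alpha * tau * math.log(lam) - math.lgamma(alpha)
                + (alpha * tau - 1.0) * mean_log_y - mean_y_tau / lam ** tau)


# Hypothetical durations, for illustration only (not the paper's data).
YS = [0.8, 1.7, 2.2, 3.9, 0.4, 5.1, 2.8, 1.1, 6.3, 2.5]

if __name__ == "__main__":
    direct = sum(gg_logpdf(y, 2.0, 1.3, 1.5) for y in YS)
    via_means = gg_loglik(YS, 2.0, 1.3, 1.5)
    print(direct, via_means)  # the two values agree up to rounding
```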

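The likelihood equations, obtained by setting to zero the derivatives of the log-likelihood function $L(\alpha,\tau,\lambda)=\log f(\mathbf{y}\mid\alpha,\tau,\lambda)$ with respect to α and λ, are the two moment equations (6) and (7) with $\theta_1=\overline{y^{\tau}}$ and $\theta_2=\overline{\log y}$. These equations give $L(\hat\alpha,\hat\tau,\hat\lambda)=-n\hat H_{GG}$, where $\hat H_{GG}$ is given by (8) with the maximum likelihood estimates $\hat\alpha$, $\hat\tau$, and $\hat\lambda$ of the …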
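To see the identity $L(\hat\alpha,\hat\tau,\hat\lambda)=-n\hat H_{GG}$ numerically, the sketch below solves the α and λ likelihood equations exactly for a fixed τ, namely $\lambda^\tau=\overline{y^\tau}/\alpha$ and $\psi(\alpha)-\log\alpha=\tau\,\overline{\log y}-\log\overline{y^\tau}$, and then maximizes the resulting profile log-likelihood over τ by golden-section search (our choice of optimizer, not necessarily the paper's). For the entropy we use the closed form $H_{GG}=\log(\lambda\Gamma(\alpha)/\tau)+\alpha+(1/\tau-\alpha)\psi(\alpha)$ and assume this is what (8) evaluates at the MLEs. The code is standard-library Python; `digamma` is a hand-rolled series approximation, and the data are hypothetical. The identity holds for the profile solution at any τ, because it uses only the two equations; if the τ search stops at an end of its interval, that signals the estimation difficulties mentioned in the Introduction rather than an interior MLE.

```python
import math


def digamma(x):
    """psi(x) for x > 0: upward recurrence to x >= 10, then the asymptotic series."""
    shift = 0.0
    while x < 10.0:
        shift -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    series = inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))
    return shift + math.log(x) - 0.5 / x - series


def gg_entropy(alpha, tau, lam):
    """Shannon entropy: log(lam * Gamma(alpha) / tau) + alpha + (1/tau - alpha) * psi(alpha)."""
    return (math.log(lam) + math.lgamma(alpha) - math.log(tau) + alpha
            + (1.0 / tau - alpha) * digamma(alpha))


def gg_loglik(ys, alpha, tau, lam):
    n = len(ys)
    m_tau = sum(y ** tau for y in ys) / n
    m_log = sum(math.log(y) for y in ys) / n
    return n * (math.log(tau) - alpha * tau * math.log(lam) - math.lgamma(alpha)
                + (alpha * tau - 1.0) * m_log - m_tau / lam ** tau)


def profile_mle(ys, tau):
    """Solve the alpha- and lam-likelihood equations exactly for a fixed tau."""
    n = len(ys)
    m_tau = sum(y ** tau for y in ys) / n
    m_log = sum(math.log(y) for y in ys) / n
    target = tau * m_log - math.log(m_tau)  # negative by Jensen unless all y are equal
    lo, hi = 1e-8, 1e8  # psi(a) - log(a) increases from -inf toward 0 on this range
    for _ in range(200):
        mid = math.sqrt(lo * hi)  # bisection on the log scale
        if digamma(mid) - math.log(mid) < target:
            lo = mid
        else:
            hi = mid
    alpha = math.sqrt(lo * hi)
    return alpha, (m_tau / alpha) ** (1.0 / tau)


def profile_loglik(ys, tau):
    alpha, lam = profile_mle(ys, tau)
    return gg_loglik(ys, alpha, tau, lam)


def golden_max(f, a, b, iters=80):
    """Golden-section search for a maximizer of a unimodal f on [a, b]."""
    r = (math.sqrt(5.0) - 1.0) / 2.0
    c, d = b - r * (b - a), a + r * (b - a)
    fc, fd = f(c), f(d)
    for _ in range(iters):
        if fc > fd:
            b, d, fd = d, c, fc
            c = b - r * (b - a)
            fc = f(c)
        else:
            a, c, fc = c, d, fd
            d = a + r * (b - a)
            fd = f(d)
    return 0.5 * (a + b)


# Hypothetical durations, for illustration only (not the paper's data).
YS = [0.8, 1.7, 2.2, 3.9, 0.4, 5.1, 2.8, 1.1, 6.3, 2.5]

if __name__ == "__main__":
    tau_hat = golden_max(lambda t: profile_loglik(YS, t), 0.2, 5.0)
    alpha_hat, lam_hat = profile_mle(YS, tau_hat)
    n = len(YS)
    print("profile maximizer on [0.2, 5]:", alpha_hat, tau_hat, lam_hat)
    print("L at estimates:      ", gg_loglik(YS, alpha_hat, tau_hat, lam_hat))
    print("-n * H_GG at estimates:", -n * gg_entropy(alpha_hat, tau_hat, lam_hat))
```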

Bayesian inference for discrimination information

Given data $y_1,\ldots,y_n$, discrimination information statistics for the GG family are obtained by estimating the Kullback–Leibler functions presented in the preceding section. We may estimate the discrimination information measures by estimating the parameters using maximum likelihood, the method of moments, the generalized method of moments, or Bayesian procedures. These estimates of information provide descriptive statistics, which are useful diagnostic measures for quantifying data information …
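As one concrete instance of such a plug-in estimate (our own illustration, not a formula quoted from the paper), take the Weibull member GG(1, τ, μ) that shares τ with GG(α, τ, λ). Using $E[\log Y]=\log\lambda+\psi(\alpha)/\tau$ and $E[Y^\tau]=\alpha\lambda^\tau$ under GG, the discrimination information is minimized at $\mu^\tau=\alpha\lambda^\tau$, where it equals $K=\log\alpha-\log\Gamma(\alpha)+(\alpha-1)\psi(\alpha)-\alpha+1$, a function of α alone that vanishes at α = 1. The Python sketch below (standard library only; `digamma` is a hand-rolled series approximation, and the parameter "estimates" are hypothetical) checks this closed form against direct numerical integration. Applied to each posterior draw of α, the same function would give draws of this particular K.

```python
import math


def digamma(x):
    """psi(x) for x > 0: upward recurrence to x >= 10, then the asymptotic series."""
    shift = 0.0
    while x < 10.0:
        shift -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    series = inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))
    return shift + math.log(x) - 0.5 / x - series


def gg_logpdf(y, alpha, tau, lam):
    return (math.log(tau) - alpha * tau * math.log(lam) - math.lgamma(alpha)
            + (alpha * tau - 1.0) * math.log(y) - (y / lam) ** tau)


def kl_closed_form(alpha):
    """min over mu of K(GG(alpha, tau, lam) : GG(1, tau, mu)), attained at mu**tau = alpha * lam**tau."""
    return math.log(alpha) - math.lgamma(alpha) + (alpha - 1.0) * digamma(alpha) - alpha + 1.0


def kl_numeric(alpha, tau, lam, mu, lo=-60.0, hi=10.0, n=40000):
    """K(f:g) = integral of f log(f/g) dy by the trapezoid rule on t = log y."""
    h = (hi - lo) / n
    total = 0.0
    for i in range(n + 1):
        t = lo + i * h
        y = math.exp(t)
        log_f = gg_logpdf(y, alpha, tau, lam)
        log_g = gg_logpdf(y, 1.0, tau, mu)  # Weibull member sharing tau
        weight = 0.5 if i in (0, n) else 1.0
        total += weight * math.exp(log_f + t) * (log_f - log_g)
    return total * h


if __name__ == "__main__":
    # Hypothetical plug-in estimates (e.g., from ML or a posterior mean); not from the paper.
    alpha_hat, tau_hat, lam_hat = 2.3, 1.4, 3.0
    mu_star = (alpha_hat * lam_hat ** tau_hat) ** (1.0 / tau_hat)
    print("closed form:", kl_closed_form(alpha_hat))
    print("numerical:  ", kl_numeric(alpha_hat, tau_hat, lam_hat, mu_star))
```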

Bayesian inference for maximum entropy index

Maximum entropy fit indices and tests are constructed based on properties of the parametric family of the model. Consider the distributions in the moment class (5). If $GG^*\in\Omega_\theta$ is the ME model, then for any $F\in\Omega_\theta$, by the information distinguishability (ID) relation (Soofi et al., 1995), we have

$$K(F:GG^*\mid\theta)=H(GG^*)-H(F).$$

That is, the discrepancy between the ME distribution GG* and any other distribution in $\Omega_\theta$ is given by the difference between the entropies of the two models.
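The ID relation can be checked numerically. The sketch below assumes, as our reading of the moment equations (6) and (7), that the class (5) fixes $E[Y^\tau]=\theta_1$ and $E[\log Y]=\theta_2$ for a given τ; maximizing entropy under these two constraints gives a density proportional to $y^{b}e^{-cy^\tau}$, that is, a GG with that τ. Because $\log f_{GG^*}$ is linear in $\log y$ and $y^\tau$, $E_F[\log f_{GG^*}]=-H(GG^*)$ for every $F\in\Omega_\theta$, which is exactly the ID relation. We take F to be a GG with a different shape $\tau_F$, compute θ from its closed-form moments, fit GG* by bisection, and compare K(F:GG*) from numerical integration with H(GG*) − H(F). All helper names and parameter values are hypothetical, and the code uses only the Python standard library.

```python
import math


def digamma(x):
    """psi(x) for x > 0: upward recurrence to x >= 10, then the asymptotic series."""
    shift = 0.0
    while x < 10.0:
        shift -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    series = inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))
    return shift + math.log(x) - 0.5 / x - series


def gg_logpdf(y, alpha, tau, lam):
    return (math.log(tau) - alpha * tau * math.log(lam) - math.lgamma(alpha)
            + (alpha * tau - 1.0) * math.log(y) - (y / lam) ** tau)


def gg_entropy(alpha, tau, lam):
    return (math.log(lam) + math.lgamma(alpha) - math.log(tau) + alpha
            + (1.0 / tau - alpha) * digamma(alpha))


def me_member(theta1, theta2, tau):
    """GG(alpha*, tau, lam*) with E[Y**tau] = theta1 and E[log Y] = theta2 (bisection on alpha*)."""
    target = tau * theta2 - math.log(theta1)
    lo, hi = 1e-8, 1e8
    for _ in range(200):
        mid = math.sqrt(lo * hi)
        if digamma(mid) - math.log(mid) < target:
            lo = mid
        else:
            hi = mid
    alpha = math.sqrt(lo * hi)
    return alpha, (theta1 / alpha) ** (1.0 / tau)


def kl_numeric(logf, logg, lo=-60.0, hi=10.0, n=40000):
    """K(f:g) by the trapezoid rule on t = log y."""
    h = (hi - lo) / n
    total = 0.0
    for i in range(n + 1):
        t = lo + i * h
        y = math.exp(t)
        lf, lg = logf(y), logg(y)
        weight = 0.5 if i in (0, n) else 1.0
        total += weight * math.exp(lf + t) * (lf - lg)
    return total * h


def id_relation_check(tau, a_f, t_f, l_f):
    """Return (K(F:GG*), H(GG*) - H(F)) for F = GG(a_f, t_f, l_f) and the class fixed by tau."""
    theta1 = l_f ** tau * math.exp(math.lgamma(a_f + tau / t_f) - math.lgamma(a_f))  # E_F[Y**tau]
    theta2 = math.log(l_f) + digamma(a_f) / t_f  # E_F[log Y]
    a_s, l_s = me_member(theta1, theta2, tau)
    k = kl_numeric(lambda y: gg_logpdf(y, a_f, t_f, l_f),
                   lambda y: gg_logpdf(y, a_s, tau, l_s))
    return k, gg_entropy(a_s, tau, l_s) - gg_entropy(a_f, t_f, l_f)


if __name__ == "__main__":
    k, gap = id_relation_check(tau=1.0, a_f=1.5, t_f=2.0, l_f=2.0)
    print("K(F:GG*)      =", k)
    print("H(GG*) - H(F) =", gap)
```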

Given observations $y_1,\ldots,y_n$ …

Examples

We illustrate applications of the discrimination information measures and ME fit indices using two data sets. The first data set pertains to unemployment duration, drawn from the Bureau of Labor Statistics 2001. We studied unemployment data for females and males in rural and urban areas and report the results for female workers in urban areas. The results of the information analyses for the other categories were all remarkably similar to those reported here. The second data set pertains to …

Concluding remarks

This paper took the first major step toward closing the gap between the growing presence of the GG model in the duration analysis literature and its remarkable absence from information studies. We presented some information properties of the GG distribution and showed that its flexibility leads to an assortment of information measures for the family. These information functions provide insights and can serve various data analysis purposes, such as minimum discrimination information (MDI) modeling and data transformation. We gave entropy …

References (39)

  • M.D.M. Diaz (1999). Extended stay at university: an application of multinomial logit and duration models. Applied Economics.
  • F.X. Diebold et al. (1990). A nonparametric investigation of duration dependence in the American business cycle. Journal of Political Economy.
  • Z. Eckstein et al. (1995). Duration to first job and the return to schooling: estimates from a search-matching model. Review of Economic Studies.
  • C.A. Favero et al. (1994). A duration model of irreversible oil investment: theory and empirical evidence. Journal of Applied Econometrics.
  • D. Genesove et al. (1997). Equity and time to sale in the real estate market. American Economic Review.
  • W.R. Gilks et al. (1992). Adaptive rejection sampling for Gibbs sampling. Applied Statistics.
  • T.J. Gronberg (1994). Estimating workers’ marginal willingness to pay for job attributes using duration data. Journal of Human Resources.
  • H.W. Hager et al. (1970). Inferential procedures for the generalized gamma distribution. Journal of the American Statistical Association.
  • P. Hall et al. (1993). On the estimation of entropy. Annals of the Institute of Statistical Mathematics.
  • Cited by (24)

    • Using a generalized model for air traffic delay: An application of information based duration analysis

      2018, Journal of Air Transport Management
      Citation excerpt:

      The GG family includes many duration distributions such as exponential, gamma and Weibull as subfamilies. This essay illustrates the applications of information functions developed for GG family in Dadpay et al. (2007), using data on a sample of flight delays data. Air traffic delays are both a major source of passengers' complaints and a topic of discussion for authors of different disciplines studying the aviation industry.

    • Mixed Poisson process with Stacy mixing variable

      2024, Stochastic Analysis and Applications
1 Currently at Rapp Collins Worldwide, 1660 North Westridge Circle, Irving, TX 75062.
