2013 | OriginalPaper | Chapter

8. Point Estimation Methods

Author : Ron C. Mittelhammer

Published in: Mathematical Statistics for Economics and Business

Publisher: Springer New York

Abstract

In this chapter we examine point estimation methods that lead to specific functional forms for estimators of q(Θ) that can be relied upon to define estimators that often have good estimator properties. Thus far, the only result in Chapter 7 that could be used directly to define the functional form of an estimator is the theorem on the attainment of the CRLB, which is useful only if the probability model {f(x;Θ), Θ∈Ω} and the estimand q(Θ) are such that the CRLB is actually attainable. We did examine a number of important results that could be used to narrow the search for a good estimator of q(Θ), to potentially improve upon an unbiased estimator that was already available, or that could verify when an unbiased estimator was actually the best in the sense of minimizing variance or in having the smallest covariance matrix. However, since the functional form of an estimator of q(Θ) having good estimator properties is often not apparent even with the aid of the results assembled in Chapter 7, we now examine procedures that suggest functional forms of estimators.

We will concentrate on models for which the mean of the random variable exists. When it does not, this characterization can be used in other ways, for example by using the median as the measure of the central tendency of the outcomes, as in $ Y_i = \eta_i + \varepsilon_i $, where $ \eta_i = \operatorname{median}(Y_i) $ and $ \operatorname{median}(\varepsilon_i) = 0 $.

We note, however, that for highly nonlinear functions, the degree of polynomial required to provide an adequate approximation to $ \mu(z_{i1},\ldots,z_{im}) $ may be so high that there will not be enough sample observations to estimate the unknown $ \beta_j $'s adequately, or at all. This is related to a requirement that x have full column rank, which will be discussed shortly.

Note that by tradition, the previous case where x is designed is also referred to as a regression of Y on x, where one can think of the conditional expectation in the degenerate sense, i.e., where x takes its observed value with probability one.

Recall that $ \arg {{\min}_w}\left\{ {f(w)} \right\} $ denotes the argument value w that minimizes f(w).

This follows because z ^1/2 is a monotonic transformation of z for z ≥ 0, so that the minimum (and maximum) of z ^1/2 and z occur at the same values of z, z ∈ D, D being some set of nonnegative numbers.
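The point can be checked numerically with a minimal sketch. The objective `z` and the search grid below are hypothetical choices made only for illustration; any nonnegative function would serve.

```python
import math

# Hypothetical nonnegative objective z(w); not taken from the text.
def z(w):
    return (w - 1.5) ** 2 + 0.25

# Grid of candidate argument values w in [-3, 3].
grid = [i / 100 for i in range(-300, 301)]

# Minimizer of z and minimizer of its square root over the same grid.
argmin_z = min(grid, key=z)
argmin_sqrt_z = min(grid, key=lambda w: math.sqrt(z(w)))

print(argmin_z, argmin_sqrt_z)
```

Because the square root is strictly increasing on nonnegative numbers, both searches select the same grid point.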

A matrix x has full column rank iff x′x has full rank. See Rao, C., op. cit., p. 30.

The inverse of a symmetric positive definite matrix is necessarily a symmetric positive definite matrix. This can be shown by noting that a symmetric matrix is positive definite iff all of its characteristic roots are positive, and the characteristic roots of A ⁻¹ are the reciprocals of the characteristic roots of A.
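As an illustrative sketch, the reciprocal relationship between the characteristic roots of A and A⁻¹ can be verified for a 2×2 case. The matrix entries and the helper `eig_sym_2x2` are assumptions introduced here, not part of the text.

```python
import math

def eig_sym_2x2(a, b, d):
    """Characteristic roots of the symmetric matrix [[a, b], [b, d]]."""
    mean = (a + d) / 2.0
    radius = math.sqrt(((a - d) / 2.0) ** 2 + b ** 2)
    return (mean - radius, mean + radius)

# Hypothetical symmetric positive definite matrix A = [[2, 1], [1, 3]].
a, b, d = 2.0, 1.0, 3.0
det = a * d - b * b

# Inverse of a 2x2 matrix: (1/det) * [[d, -b], [-b, a]], which is again symmetric.
ia, ib, id_ = d / det, -b / det, a / det

roots_A = eig_sym_2x2(a, b, d)
roots_Ainv = eig_sym_2x2(ia, ib, id_)

# Reciprocals of the roots of A, sorted ascending to match roots_Ainv.
reciprocals = tuple(sorted(1.0 / r for r in roots_A))

print(roots_A, roots_Ainv, reciprocals)
```

All roots of A are positive, so their reciprocals are positive as well, which is the positive definiteness of A⁻¹.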

Suppose B is a $ (k\times k) $ symmetric positive definite matrix. Let C be an $ (r\times k) $ matrix such that each row consists of all zeros except for a 1 in one position, and let the r rows of C be linearly independent. Then CBC′ defines an $ (r\times r) $ principal submatrix of B. Now note that $ \boldsymbol{\ell}'\mathbf{C}\mathbf{B}\mathbf{C}'\boldsymbol{\ell} = \boldsymbol{\ell}_*'\mathbf{B}\boldsymbol{\ell}_* > 0 \ \forall\, \boldsymbol{\ell} \ne \mathbf{0} $, since $ \boldsymbol{\ell}_* = \mathbf{C}'\boldsymbol{\ell} \ne \mathbf{0} \ \forall\, \boldsymbol{\ell} \ne \mathbf{0} $ by the definition of C, and B is positive definite. Therefore, the principal submatrix CBC′ is positive definite.

Consistency of $ {\hat{\mathbf{\upbeta} }} $ can be proven under alternative conditions on x. Judge et al. (1982), Introduction to the Theory and Practice of Econometrics, John Wiley, pp. 269–269, prove the result using the stronger condition that $ {{\lim}_{{n\to \infty }}} $ n⁻¹ x′x = Q, where Q is a finite, positive definite matrix. Halbert White (1984), Asymptotic Theory for Econometricians, Academic Press, p. 20, assumes that n⁻¹ x′x is bounded and uniformly positive definite, which is also a stronger condition than the one we use here.

We should note, however, that in the specification of some linear models, certain proxy variables might be used to explain y that literally violate the boundedness assumption. For example, a linear “time trend” t = (1,2,3,4,…) is sometimes used to explain an apparent upward or downward trend in E(Y_t), and the trend clearly violates the boundedness constraint. In such cases, one may wonder whether it is really to be believed that t → ∞ is relevant in explaining y, or whether the time trend is just an artifice relevant for a certain range of observations, but for which extrapolation ad infinitum is not appropriate.

Regarding the symmetric square root matrix A ^1/2, note that P′AP = Λ, where P is the orthogonal matrix of characteristic vectors of the symmetric positive semidefinite matrix A, and Λ is the diagonal matrix of characteristic roots. Then A ^1/2 = PΛ ^1/2 P′, where Λ ^1/2 is the diagonal matrix formed from Λ by taking the square root of each diagonal element. Note that since P′P = I, A ^1/2 A ^1/2 = PΛ ^1/2 P′PΛ ^1/2 P′ = PΛ ^1/2 Λ ^1/2 P′ = PΛP′ = A. The matrix square root is a continuous function of the elements of A. Therefore, $ {{\lim}_{{n\to \infty }}} $(A _n)^1/2 = $ {\mathop{{\left( {{{{\lim }}_{{n\to \infty }}}{{{\bf A}}_n}} \right)}}\nolimits^{{1/2}} } $.
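The construction A^1/2 = PΛ^1/2 P′ can be sketched for a 2×2 symmetric matrix using only the standard library. The function name `sym_sqrt_2x2` and the example matrix are assumptions made for illustration, and the sketch assumes a nonzero off-diagonal element so that the characteristic vectors take the simple form used below.

```python
import math

def sym_sqrt_2x2(a, b, d):
    """Symmetric square root of [[a, b], [b, d]] via P Lambda^(1/2) P'.

    Assumes the matrix is positive semidefinite and b != 0.
    """
    mean = (a + d) / 2.0
    radius = math.sqrt(((a - d) / 2.0) ** 2 + b ** 2)
    roots = (mean - radius, mean + radius)
    # A unit characteristic vector for root lam is proportional to (b, lam - a).
    vecs = []
    for lam in roots:
        norm = math.hypot(b, lam - a)
        vecs.append((b / norm, (lam - a) / norm))
    # P Lambda^(1/2) P' written as the sum over j of sqrt(lam_j) * p_j p_j'.
    s = [[0.0, 0.0], [0.0, 0.0]]
    for lam, p in zip(roots, vecs):
        for i in range(2):
            for j in range(2):
                s[i][j] += math.sqrt(lam) * p[i] * p[j]
    return s

# Hypothetical symmetric positive definite matrix A = [[2, 1], [1, 3]].
A = [[2.0, 1.0], [1.0, 3.0]]
S = sym_sqrt_2x2(2.0, 1.0, 3.0)

# Multiply S by itself; the product should reproduce A up to rounding.
SS = [[sum(S[i][k] * S[k][j] for k in range(2)) for j in range(2)]
      for i in range(2)]
print(SS)
```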

Positive semidefiniteness can be deduced from the fact that the characteristic roots of a symmetric idempotent matrix are all nonnegative, being a collection of 0’s and 1’s.

We are making the tacit assumption that x contains no lagged values of the dependent variable, which is as it must be if it is presumed that x can be held fixed. In the event that x contains lagged values of the dependent variable and the error terms are autocorrelated, then in general E((X′X)⁻¹ X′ $ \mathbf{\upvarepsilon} $) ≠ 0, and $ {\hat{\mathbf{\upbeta} }} $ is biased. Issues related to this case are discussed in subsection 8.2.3.

The characteristic roots of τA are equal to the characteristic roots of A times the scalar τ.

Note tr(Ωx(x′x)⁻¹ x′) = tr(Ω^1/2 x(x′x)⁻¹ x′Ω ^1/2), and since (x′x)⁻¹ is positive definite, Ω ^1/2 x(x′x)⁻¹ x′ Ω ^1/2 is at least positive semidefinite and its trace must be nonnegative.

Recall that $ \mathrm{E}\left( \boldsymbol{\varepsilon}\boldsymbol{\varepsilon}' \right) = \mathbf{Cov}\left( \boldsymbol{\varepsilon} \right) + \mathrm{E}\left( \boldsymbol{\varepsilon} \right)\mathrm{E}\left( \boldsymbol{\varepsilon}' \right) $.

Note that β_i ∈ (−∞,∞), ∀i, and σ² > 0 are the admissible parameter values for Y ~ N(xβ, σ²I). It may be the case that only a subset of these values for the parameters is deemed to be relevant in a given estimation problem (e.g., a price effect may be restricted to be of one sign, or the realistic magnitude of the effect of an explanatory variable may be bounded). Restricting the parameter space comes under the realm of prior information models, which we do not pursue here. So long as the admissible values of β and σ² form an open rectangle themselves, the range of c(β, σ²) will contain an open rectangle.

Recall that $ \arg {{\max}_w} $ {f(w)} denotes the argument value that maximizes f(w), where argument value means the value of w. Also, $ \arg {{\max}_{{w\in \Omega }}} $ {f(w)} denotes the value of $ w\in \Omega $ that maximizes f(w).

Recall that $ {{\arg}_{{{\mathbf{\Theta}} \in \Omega }}}\left\{ {{\bf g}\left( {\mathbf{\Theta}} \right) = {\bf c}} \right\} $ represents the value of Θ ∈ Ω that satisfies or solves g(Θ) = c.

We are suppressing the fact that a maximum of L(Θ;x) may not be attainable. For example, if the parameter space is an open interval and if the likelihood function is strictly monotonically increasing, then no maximum can be stated. If a maximum of L(Θ;x) for Θ ∈ Ω does not exist, then the MLE of Θ does not exist.

In the case where the classical first order conditions are applicable, note that if L(Θ;x) > 0 (which will necessarily be true at the maximum value), then

$$ \frac{\partial \ln \left( L(\boldsymbol{\varTheta}; \mathbf{x}) \right)}{\partial \boldsymbol{\varTheta}} = \frac{1}{L(\boldsymbol{\varTheta}; \mathbf{x})}\frac{\partial L(\boldsymbol{\varTheta}; \mathbf{x})}{\partial \boldsymbol{\varTheta}}, $$

and thus any Θ for which ∂L(Θ;x)/∂Θ = 0 also satisfies ∂ ln(L(Θ;x))/∂Θ = 0. Regarding second order conditions, note that if Θ satisfies the first order conditions, then

$$ \frac{\partial^2 \ln \left( L(\boldsymbol{\varTheta}; \mathbf{x}) \right)}{\partial \boldsymbol{\varTheta} \partial \boldsymbol{\varTheta}'} = \frac{1}{L(\boldsymbol{\varTheta}; \mathbf{x})} \frac{\partial^2 L(\boldsymbol{\varTheta}; \mathbf{x})}{\partial \boldsymbol{\varTheta} \partial \boldsymbol{\varTheta}'} - \frac{1}{L(\boldsymbol{\varTheta}; \mathbf{x})^2}\frac{\partial L(\boldsymbol{\varTheta}; \mathbf{x})}{\partial \boldsymbol{\varTheta}} \frac{\partial L(\boldsymbol{\varTheta}; \mathbf{x})}{\partial \boldsymbol{\varTheta}'} = \frac{1}{L(\boldsymbol{\varTheta}; \mathbf{x})} \frac{\partial^2 L(\boldsymbol{\varTheta}; \mathbf{x})}{\partial \boldsymbol{\varTheta} \partial \boldsymbol{\varTheta}'} $$

since ∂L(Θ;x)/∂Θ = 0. Then since L(Θ;x) > 0 at the maximum, $ \partial^2 \ln \left( L\left( \boldsymbol{\varTheta}; \mathbf{x} \right) \right)/\partial \boldsymbol{\varTheta} \partial \boldsymbol{\varTheta}' $ is negative definite iff $ \partial^2 L\left( \boldsymbol{\varTheta}; \mathbf{x} \right)/\partial \boldsymbol{\varTheta} \partial \boldsymbol{\varTheta}' $ is negative definite.

Economic theory may suggest constraints on the signs of some of the entries in β (e.g., the effect of income on durables consumption will be positive), in which case β ∈ Ω_β ⊂ ℝ^k may be more appropriate.

In the event that an MLE is not unique, the set of MLEs is a function of any set of sufficient statistics. However, a particular MLE within the set of MLEs need not necessarily be a function of (s₁,…,s_r), although it is always possible to choose an MLE that is a function of (s₁,…,s_r). See Moore, D.S. (1971), “Maximum Likelihood and Sufficient Statistics”, American Mathematical Monthly, January, pp. 50–52.

Alternatively, the solution for α can be determined by consulting tables generated by Chapman which were constructed specifically for this purpose (Chapman, D.G. “Estimating Parameters of a Truncated Gamma Distribution,” Ann. Math. Stat., 27, 1956, pp. 498–506).

Bowman, K.O. and L.R. Shenton. Properties of Estimators for the Gamma Distribution, Report CTC–1, Union Carbide Corp., Oak Ridge, Tennessee.

Norden, R.H. (1972; 1973) “A Survey of Maximum Likelihood Estimation,” International Statistical Review, (40): 329–354 and (41): 39–58.

See Lehmann, E. (1983) Theory of Point Estimation, John Wiley and Sons, pp. 420–427.

We remind the reader of our tacit assumption that Θ is identified (Definition 7.2).

By true value of Θ, we again mean that Θ_o is the value of Θ ∈ Ω for which f(x;Θ_o) ≡ L(Θ_o;x) is the actual joint density function of the random sample X. The value of Θ_o is generally unknown, and in the current context is the objective of point estimation.

If X_i is a continuous random variable, then since Θ_o is the true value of Θ,

$$ \mathrm{E}\left[ f\left( X_i;\varTheta \right)/f\left( X_i;\varTheta_{\mathrm{o}} \right) \right] = \int_{-\infty}^{\infty} \frac{f\left( x_i;\varTheta \right)}{f\left( x_i;\varTheta_{\mathrm{o}} \right)} f\left( x_i;\varTheta_{\mathrm{o}} \right)\,dx_i = \int_{-\infty}^{\infty} f\left( x_i;\varTheta \right)\,dx_i = 1 $$

because f(x _i;Θ) is a probability density function. The discrete case is analogous.
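A numerical sketch of this identity follows. It assumes, purely for illustration, an exponential density with mean Θ and the hypothetical values Θ_o = 2 and Θ = 3; the integral is approximated with a midpoint rule over a truncated range.

```python
import math

def f(x, theta):
    """Hypothetical density: exponential with mean theta, for x >= 0."""
    return math.exp(-x / theta) / theta

theta_o, theta = 2.0, 3.0   # assumed true value and an alternative value

# Midpoint rule for the integral of [f(x;theta)/f(x;theta_o)] * f(x;theta_o)
# over [0, upper]; the probability mass beyond `upper` is negligible here.
upper, steps = 200.0, 200_000
h = upper / steps
expectation = 0.0
for k in range(steps):
    x = (k + 0.5) * h
    expectation += (f(x, theta) / f(x, theta_o)) * f(x, theta_o) * h

print(expectation)
```

The density in the denominator cancels inside the integral, so the computed value is just the total mass of f(x;Θ), which is 1 up to the discretization error.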

That this unique solution is a maximum can be demonstrated by noting that $ \partial^2 \ln \left( L(\Theta; \mathbf{x}) \right)/\partial \Theta^2 = n/\Theta^2 - 2\sum\nolimits_{i=1}^n x_i/\Theta^3 $, which, when evaluated at the maximum likelihood estimate $ \hat{\Theta} = \sum\nolimits_{i=1}^n x_i/n $, yields $ \partial^2 \ln \left( L(\hat{\Theta}; \mathbf{x}) \right)/\partial \Theta^2 = -n^3/\left( \sum\nolimits_{i=1}^n x_i \right)^2 < 0 $.
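The evaluation at the estimate can be checked with a short sketch. It assumes the log-likelihood ln L(Θ;x) = −n ln(Θ) − Σx_i/Θ, which is consistent with the second derivative stated above, and uses a hypothetical sample.

```python
import math

x = [0.8, 2.3, 1.1, 4.0, 0.6]   # hypothetical sample outcomes
n, total = len(x), sum(x)

def log_lik(theta):
    # Assumed form of the log-likelihood: -n*ln(theta) - sum(x)/theta.
    return -n * math.log(theta) - total / theta

theta_hat = total / n   # maximum likelihood estimate (the sample mean)

# Second derivative from the stated formula, evaluated at theta_hat.
analytic = n / theta_hat ** 2 - 2 * total / theta_hat ** 3

# Closed form claimed in the text: -n^3 / (sum of x_i)^2.
closed_form = -n ** 3 / total ** 2

# Independent check by a central second difference of the log-likelihood.
h = 1e-3
numeric = (log_lik(theta_hat + h) - 2 * log_lik(theta_hat)
           + log_lik(theta_hat - h)) / h ** 2

print(analytic, closed_form, numeric)
```

All three values agree up to rounding and discretization error, and they are negative, as the second order condition requires.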

N(ε) is an open interval, the interior of a circle, the interior of a sphere, and the interior of a hypersphere in 1, 2, 3, and ≥ 4 dimensions, respectively.

It is allowable that conditions (2) and (3) be violated on a set of x-values having probability zero.

Change max to sup if max does not exist.

This follows from Khinchin’s WLLN upon recognizing that the right hand side expression represents E(ln(X_i)).

Recall that the matrix square root is a continuous function of its arguments, so that $ \operatorname{plim}\left( \mathbf{A}_n^{1/2} \right) = \left( \operatorname{plim} \mathbf{A}_n \right)^{1/2} $. Letting $ \mathbf{A}_n = n^{-1}\mathrm{E}\left( \frac{\partial \ln L\left( \boldsymbol{\varTheta}_{\mathrm{o}};\mathbf{X} \right)}{\partial \boldsymbol{\varTheta}} \frac{\partial \ln L\left( \boldsymbol{\varTheta}_{\mathrm{o}};\mathbf{X} \right)}{\partial \boldsymbol{\varTheta}'} \right) $ leads to $ \operatorname{plim}\left( \mathbf{A}_n^{1/2} \right) = \mathbf{M}\left( \boldsymbol{\varTheta}_{\mathrm{o}} \right)^{1/2} $.

The gamma function, Γ(α), is continuous in α, and its first two derivatives are continuous in α, for α > 0. Γ(α) is in fact strictly convex, with its second order derivative strictly positive for α > 0. See Bartle, op. cit., p. 282.

A way of deriving the expectations involving ln(X_i) that is conceptually straightforward, albeit somewhat tedious algebraically, is first to derive the MGF of ln(X_i), which is given by β^t Γ(α + t)/Γ(α). Then using the MGF in the usual way establishes the mean and variance of ln(X_i). The covariance between $ X_i/\beta_{\mathrm{o}}^2 $ and ln(X_i) can be established by noting that

$$ \beta_{\mathrm{o}}^{-2}\mathrm{E}\left( X_i \ln (X_i) \right) = \beta_{\mathrm{o}}^{-2}\left[ \frac{1}{\beta_{\mathrm{o}}^{\alpha_{\mathrm{o}}}\varGamma \left( \alpha_{\mathrm{o}} \right)} \right]\int_0^{\infty} \ln (x_i)\, x_i^{\alpha_{\mathrm{o}}} e^{-x_i/\beta_{\mathrm{o}}}\,dx_i = \alpha_{\mathrm{o}}\beta_{\mathrm{o}}^{-1}\mathrm{E}_{*}\left( \ln (X_i) \right), $$

where E_* denotes an expectation of ln(X_i) using a gamma density having parameter values α_o + 1 and β_o. Then $ \mathrm{cov}\left( X_i/\beta_{\mathrm{o}}^2, \ln (X_i) \right) $ is equal to

$$ \alpha_{\mathrm{o}}\beta_{\mathrm{o}}^{-1}\mathrm{E}_{*}\left( \ln (X_i) \right) - \mathrm{E}\left( \ln (X_i) \right)\mathrm{E}\left( X_i/\beta_{\mathrm{o}}^2 \right) = \alpha_{\mathrm{o}}\beta_{\mathrm{o}}^{-1}\left( \mathrm{E}_{*}\left( \ln (X_i) \right) - \mathrm{E}\left( \ln (X_i) \right) \right) = \beta_{\mathrm{o}}^{-1}. $$
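The MGF route and the covariance result can be checked numerically. The sketch below differentiates the logarithm of the stated MGF by finite differences, which is one convenient way to obtain the mean and variance (the text itself only says to use the MGF "in the usual way"). The parameter values α_o = 2 and β_o = 3 and all function names are assumptions chosen for illustration.

```python
import math

def log_mgf(t, alpha, beta):
    """ln of the MGF of ln(X): t*ln(beta) + ln Gamma(alpha + t) - ln Gamma(alpha)."""
    return t * math.log(beta) + math.lgamma(alpha + t) - math.lgamma(alpha)

def mean_log_x(alpha, beta, h=1e-5):
    # First derivative of ln MGF at t = 0 gives E(ln X) (central difference).
    return (log_mgf(h, alpha, beta) - log_mgf(-h, alpha, beta)) / (2 * h)

def var_log_x(alpha, beta, h=1e-3):
    # Second derivative of ln MGF at t = 0 gives var(ln X) (second difference).
    return (log_mgf(h, alpha, beta) - 2 * log_mgf(0.0, alpha, beta)
            + log_mgf(-h, alpha, beta)) / h ** 2

alpha_o, beta_o = 2.0, 3.0   # hypothetical parameter values

e_log = mean_log_x(alpha_o, beta_o)               # E(ln X) under (alpha_o, beta_o)
e_star_log = mean_log_x(alpha_o + 1.0, beta_o)    # E_*(ln X) under (alpha_o + 1, beta_o)
v_log = var_log_x(alpha_o, beta_o)

# Covariance expression from the text: (alpha_o / beta_o) * (E_* - E).
cov = (alpha_o / beta_o) * (e_star_log - e_log)

print(e_log, v_log, cov, 1.0 / beta_o)
```

The difference E_*(ln(X_i)) − E(ln(X_i)) comes out as 1/α_o, so the covariance collapses to 1/β_o as stated.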

Pearson, K.P. (1894), Contributions to the Mathematical Theory of Evolution, Phil. Trans. R. Soc. London A, 185: pp. 71–110.

Under the assumed conditions, for each density (conditional or not),

$$ \mathrm{E}_{\boldsymbol{\Theta}_{\mathrm{o}}}\left( \frac{\partial \ln f\left( \mathbf{Z};\boldsymbol{\Theta}_{\mathrm{o}} \right)}{\partial \boldsymbol{\Theta}} \right) = \int_{-\infty}^{\infty} \frac{\partial f\left( \mathbf{z};\boldsymbol{\Theta}_{\mathrm{o}} \right)}{\partial \boldsymbol{\Theta}}\,d\mathbf{z} = \frac{\partial \int_{-\infty}^{\infty} f\left( \mathbf{z};\boldsymbol{\Theta}_{\mathrm{o}} \right)\,d\mathbf{z}}{\partial \boldsymbol{\Theta}} = \mathbf{0} $$

in the continuous case, and likewise in the discrete case.
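A minimal numerical sketch of this zero-expectation property follows, assuming for illustration a scalar exponential density f(z;Θ) = (1/Θ)e^{−z/Θ} with hypothetical true value Θ_o = 2. The names `density` and `score` are introduced here and are not from the text.

```python
import math

theta_o = 2.0   # assumed true parameter value

def density(z):
    # Hypothetical density f(z; theta_o) = (1/theta_o) * exp(-z/theta_o), z >= 0.
    return math.exp(-z / theta_o) / theta_o

def score(z):
    # d ln f(z; theta) / d theta evaluated at theta_o for the density above.
    return -1.0 / theta_o + z / theta_o ** 2

# Midpoint rule for the expectation of the score over [0, upper];
# the contribution beyond `upper` is negligible for this density.
upper, steps = 200.0, 200_000
h = upper / steps
expected_score = sum(score((k + 0.5) * h) * density((k + 0.5) * h) * h
                     for k in range(steps))

print(expected_score)
```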

See T. Amemiya, op. cit., pp. 111–112 for a closely related theorem relating to extremum estimators. The assumptions of Theorem 8.26 can be weakened to assuming uniqueness and consistency of $ {{ \hat{{\mathbf{\Theta}}}}} $ for use in this theorem.

We are suppressing the fact that each row of the matrix $ {{\partial}^2}{\rm Q_n} ({\rm Y},{{\mathbf{\Theta}}_{*}})/\partial \mathbf{\Theta} \partial \mathbf{\Theta} ^{\prime} $ will generally require a different value of Θ _* defined by a different value of λ for the representation to hold. The conclusions of the argument will remain the same.

This proof is related to a proof by T. Amemiya (1985), Advanced Econometrics, Harvard University Press, p. 107, dealing with the consistency of extremum estimators.

See Bartle, R.G. (1976), The Elements of Real Analysis, 2nd Edition, John Wiley, p. 371.

Title: Point Estimation Methods
Author: Ron C. Mittelhammer
Publisher: Springer New York
Book: Mathematical Statistics for Economics and Business
Print ISBN: 978-1-4614-5021-4

Electronic ISBN: 978-1-4614-5022-1

Copyright Year: 2013
DOI: https://doi.org/10.1007/978-1-4614-5022-1_8
