Chapter (2013)
Published in:
Mathematical Statistics for Economics and Business
In this chapter we examine point estimation methods that lead to specific functional forms for estimators of q(Θ), functional forms that can often be relied upon to define estimators with good properties. Thus far, the only result in Chapter 7 that could be used directly to define the functional form of an estimator is the theorem on the attainment of the CRLB, which is useful only if the probability model {f(x;Θ), Θ∈Ω} and the estimand q(Θ) are such that the CRLB is actually attainable. We did examine a number of important results that could be used to narrow the search for a good estimator of q(Θ), to potentially improve upon an unbiased estimator that was already available, or to verify when an unbiased estimator was actually the best in the sense of minimizing variance or of having the smallest covariance matrix. However, since the functional form of an estimator of q(Θ) having good properties is often not apparent even with the aid of the results assembled in Chapter 7, we now examine procedures that suggest functional forms of estimators.
We will concentrate on models for which the mean of the random variable exists. When it does not, this characterization can be adapted in other ways, for example by using the median as the measure of the central tendency of the outcomes, as in \( Y_i = \eta_i + \varepsilon_i \), where \( \eta_i = \mathrm{median}(Y_i) \) and \( \mathrm{median}(\varepsilon_i) = 0 \).
We note, however, that for highly nonlinear functions, the degree of polynomial required to provide an adequate approximation to \( \mu(z_{i1},\ldots,z_{im}) \) may be so high that there will not be enough sample observations to estimate the unknown \( \beta_j \)'s adequately, or at all. This is related to the requirement that x have full column rank, which will be discussed shortly.
Note that by tradition, the previous case where x is designed is also referred to as a regression of Y on x, where one can think of the conditional expectation in the degenerate sense, i.e., where x takes its observed value with probability one.
Recall that \( \arg\min_{w}\left\{ f(w) \right\} \) denotes the argument value w that minimizes f(w).
This follows because z^{1/2} is a monotonic transformation of z for z ≥ 0, so that the minimum (and maximum) of z^{1/2} and z occur at the same values of z, z ∈ D, D being some set of nonnegative numbers.
A matrix x has full column rank iff x′x has full rank. See Rao, C., op. cit., p. 30.
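The equivalence in this footnote can be spot-checked numerically. The sketch below (hypothetical example matrices, not from the text) computes ranks via a plain Gaussian-elimination routine and confirms that rank(x) and rank(x′x) agree both when x has full column rank and when it does not.

```python
# Numeric illustration (hypothetical matrices) that rank(x) = rank(x'x).

def matmul_t(x):
    """Return x'x for a matrix x given as a list of rows."""
    k = len(x[0])
    return [[sum(row[i] * row[j] for row in x) for j in range(k)] for i in range(k)]

def rank(a, tol=1e-10):
    """Rank via Gaussian elimination with partial pivoting."""
    m = [row[:] for row in a]
    rows, cols, r = len(m), len(m[0]), 0
    for c in range(cols):
        candidates = range(r, rows)
        pivot = max(candidates, key=lambda i: abs(m[i][c]), default=None)
        if pivot is None or abs(m[pivot][c]) < tol:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, rows):
            f = m[i][c] / m[r][c]
            m[i] = [m[i][j] - f * m[r][j] for j in range(cols)]
        r += 1
    return r

x_full = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]  # full column rank (k = 2)
x_def = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]   # second column = 2 * first

print(rank(x_full), rank(matmul_t(x_full)))  # both 2
print(rank(x_def), rank(matmul_t(x_def)))    # both 1
```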
The inverse of a symmetric positive definite matrix is necessarily a symmetric positive definite matrix. This can be shown by noting that a symmetric matrix is positive definite iff all of its characteristic roots are positive, and the characteristic roots of A^{−1} are the reciprocals of the characteristic roots of A.
Suppose B is a \( (k\times k) \) symmetric positive definite matrix. Let C be an \( (r\times k) \) matrix such that each row consists of all zeros except for a 1 in one position, and let the r rows of C be linearly independent. Then CBC′ defines an \( (r\times r) \) principal submatrix of B. Now note that \( \boldsymbol{\ell}^{\prime}\mathbf{CBC}^{\prime}\boldsymbol{\ell} = \boldsymbol{\ell}_{*}^{\prime}\mathbf{B}\boldsymbol{\ell}_{*} > 0\ \forall\, \boldsymbol{\ell} \neq \mathbf{0} \), since \( \boldsymbol{\ell}_{*} = \mathbf{C}^{\prime}\boldsymbol{\ell} \neq \mathbf{0}\ \forall\, \boldsymbol{\ell} \neq \mathbf{0} \) by the definition of C, and B is positive definite. Therefore, the principal submatrix CBC′ is positive definite.
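A small numeric sketch of this argument (the 3×3 matrix B and the row selection are hypothetical, not from the text): selecting rows and columns of a symmetric positive definite B yields a principal submatrix CBC′ whose quadratic form is positive on every nonzero test vector.

```python
# Check positive definiteness of a principal submatrix of a PD matrix
# by evaluating the quadratic form l'(CBC')l on a grid of nonzero vectors.

B = [[4.0, 1.0, 0.5],
     [1.0, 3.0, 0.2],
     [0.5, 0.2, 2.0]]  # symmetric positive definite (hypothetical)

keep = [0, 2]  # rows of C: select the 1st and 3rd coordinates
sub = [[B[i][j] for j in keep] for i in keep]  # this is CBC'

def quad_form(a, v):
    n = len(v)
    return sum(v[i] * a[i][j] * v[j] for i in range(n) for j in range(n))

grid = [(u / 4.0, w / 4.0) for u in range(-4, 5) for w in range(-4, 5)
        if (u, w) != (0, 0)]
print(all(quad_form(sub, v) > 0 for v in grid))  # True
```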
Consistency of \( \hat{\boldsymbol{\upbeta}} \) can be proven under alternative conditions on x. Judge, et al., (1982) Introduction to the Theory and Practice of Econometrics, John Wiley, p. 269, prove the result using the stronger condition that \( \lim_{n\to\infty} n^{-1}\mathbf{x}^{\prime}\mathbf{x} = \mathbf{Q} \), where Q is a finite, positive definite matrix. Halbert White, (1984) Asymptotic Theory for Econometricians, Academic Press, p. 20, assumes that n^{−1}x′x is bounded and uniformly positive definite, which is also a stronger condition than the one we use here.
We should note, however, that in the specification of some linear models, certain proxy variables might be used to explain y that literally violate the boundedness assumption. For example, a linear “time trend” t = (1,2,3,4,…) is sometimes used to explain an apparent upward or downward trend in E(Y_t), and the trend clearly violates the boundedness constraint. In such cases, one may wonder whether it is really to be believed that t → ∞ is relevant in explaining y, or whether the time trend is just an artifice relevant for a certain range of observations, but for which extrapolation ad infinitum is not appropriate.
Regarding the symmetric square root matrix A^{1/2}, note that P′AP = Λ, where P is the orthogonal matrix of characteristic vectors of the symmetric positive semidefinite matrix A, and Λ is the diagonal matrix of characteristic roots. Then A^{1/2} = PΛ^{1/2}P′, where Λ^{1/2} is the diagonal matrix formed from Λ by taking the square root of each diagonal element. Note that since P′P = I, A^{1/2}A^{1/2} = PΛ^{1/2}P′PΛ^{1/2}P′ = PΛ^{1/2}Λ^{1/2}P′ = PΛP′ = A. The matrix square root is a continuous function of the elements of A. Therefore, \( \lim_{n\to\infty} (\mathbf{A}_n)^{1/2} = \left( \lim_{n\to\infty} \mathbf{A}_n \right)^{1/2} \).
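The property A^{1/2}A^{1/2} = A can be verified numerically. The sketch below (hypothetical 2×2 example, not from the text) uses, in place of the spectral construction PΛ^{1/2}P′, the equivalent 2×2 closed form A^{1/2} = (A + sI)/t with s = √(det A) and t = √(tr A + 2s), which follows from the Cayley–Hamilton theorem for symmetric positive definite A.

```python
import math

# 2x2 symmetric square root via the closed form (A + sI)/t, then
# verification that the square root multiplied by itself recovers A.

A = [[5.0, 2.0],
     [2.0, 3.0]]  # symmetric positive definite (hypothetical)

s = math.sqrt(A[0][0] * A[1][1] - A[0][1] * A[1][0])  # sqrt(det A)
t = math.sqrt(A[0][0] + A[1][1] + 2.0 * s)            # sqrt(tr A + 2s)

root = [[(A[0][0] + s) / t, A[0][1] / t],
        [A[1][0] / t, (A[1][1] + s) / t]]

prod = [[sum(root[i][k] * root[k][j] for k in range(2)) for j in range(2)]
        for i in range(2)]
print(all(abs(prod[i][j] - A[i][j]) < 1e-12 for i in range(2) for j in range(2)))  # True
```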
Positive semidefiniteness can be deduced from the fact that the characteristic roots of a symmetric idempotent matrix are all nonnegative, being a collection of 0’s and 1’s.
We are making the tacit assumption that x contains no lagged values of the dependent variable, which is as it must be if it is presumed that x can be held fixed. In the event that x contains lagged values of the dependent variable and the error terms are autocorrelated, then in general E((X′X)^{−1}X′\( \boldsymbol{\upvarepsilon} \)) ≠ 0, and \( \hat{\boldsymbol{\upbeta}} \) is biased. Issues related to this case are discussed in subsection 8.2.3.
The characteristic roots of τA are equal to the characteristic roots of A times the scalar τ.
Note tr(Ωx(x′x)^{−1}x′) = tr(Ω^{1/2}x(x′x)^{−1}x′Ω^{1/2}), and since (x′x)^{−1} is positive definite, Ω^{1/2}x(x′x)^{−1}x′Ω^{1/2} is at least positive semidefinite and its trace must be nonnegative.
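As a numeric aside (hypothetical data, not from the text): with the special choice Ω = σ²I, the trace in question reduces to σ² tr(x(x′x)^{−1}x′), and the trace of the projection matrix x(x′x)^{−1}x′ equals the column rank of x, hence is nonnegative.

```python
# Trace of H = x (x'x)^{-1} x' for a small design matrix, computed from
# the diagonal H[i][i] = x_i' (x'x)^{-1} x_i without forming all of H.

x = [[1.0, 0.0],
     [1.0, 1.0],
     [1.0, 2.0]]  # n = 3 observations, k = 2 regressors (hypothetical)

# x'x and its 2x2 inverse
xtx = [[sum(r[i] * r[j] for r in x) for j in range(2)] for i in range(2)]
det = xtx[0][0] * xtx[1][1] - xtx[0][1] * xtx[1][0]
inv = [[xtx[1][1] / det, -xtx[0][1] / det],
       [-xtx[1][0] / det, xtx[0][0] / det]]

trace_H = sum(sum(x[i][a] * inv[a][b] * x[i][b]
                  for a in range(2) for b in range(2))
              for i in range(3))
print(round(trace_H, 10))  # 2.0 = k, and in particular nonnegative
```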
Recall that \( \mathrm{E}\left( \boldsymbol{\upvarepsilon}\boldsymbol{\upvarepsilon}^{\prime} \right) = \mathbf{Cov}\left( \boldsymbol{\upvarepsilon} \right) + \mathrm{E}\left( \boldsymbol{\upvarepsilon} \right)\mathrm{E}\left( \boldsymbol{\upvarepsilon}^{\prime} \right) \).
Note that β_i \( \in \) (−∞,∞), ∀i, and σ² > 0 are the admissible parameter values for Y ~ N(xβ, σ²I). It may be the case that only a subset of these values for the parameters are deemed to be relevant in a given estimation problem (e.g., a price effect may be restricted to be of one sign, or the realistic magnitude of the effect of an explanatory variable may be bounded). Restricting the parameter space comes under the realm of prior information models, which we do not pursue here. So long as the admissible values of β and σ² form an open rectangle themselves, the range of c(β, σ²) will contain an open rectangle.
Recall that \( \arg\max_{w} \){f(w)} denotes the argument value of f(w) that maximizes f(w), where argument value means the value of w. Also, \( \arg\max_{w\in\Omega} \){f(w)} denotes the value of \( w\in\Omega \) that maximizes f(w).
Recall that \( \arg_{\boldsymbol{\Theta}\in\Omega}\left\{ \mathbf{g}\left( \boldsymbol{\Theta} \right) = \mathbf{c} \right\} \) represents the value of Θ ∈ Ω that satisfies or solves g(Θ) = c.
We are suppressing the fact that a maximum of L(Θ; x) may not be attainable. For example, if the parameter space is an open interval and the likelihood function is strictly monotonically increasing, then no maximum exists. If a maximum of L(Θ; x) for Θ ∈ Ω does not exist, then the MLE of Θ does not exist.
In the case where the classical first order conditions are applicable, note that if L(Θ; x) > 0 (which will necessarily be true at the maximum value), then
$$ \frac{\partial \ln \left( L(\boldsymbol{\varTheta}; \mathbf{x}) \right)}{\partial \boldsymbol{\varTheta}} = \frac{1}{L(\boldsymbol{\varTheta}; \mathbf{x})} \frac{\partial L(\boldsymbol{\varTheta}; \mathbf{x})}{\partial \boldsymbol{\varTheta}}, $$
and thus any Θ for which ∂L(Θ; x)/∂Θ = 0 also satisfies ∂ln(L(Θ; x))/∂Θ = 0. Regarding second order conditions, note that if Θ satisfies the first order conditions, then
$$ \frac{\partial^2 \ln \left( L(\boldsymbol{\varTheta}; \mathbf{x}) \right)}{\partial \boldsymbol{\varTheta}\, \partial \boldsymbol{\varTheta}^{\prime}} = \frac{1}{L(\boldsymbol{\varTheta}; \mathbf{x})} \frac{\partial^2 L(\boldsymbol{\varTheta}; \mathbf{x})}{\partial \boldsymbol{\varTheta}\, \partial \boldsymbol{\varTheta}^{\prime}} - \frac{1}{L(\boldsymbol{\varTheta}; \mathbf{x})^2} \frac{\partial L(\boldsymbol{\varTheta}; \mathbf{x})}{\partial \boldsymbol{\varTheta}} \frac{\partial L(\boldsymbol{\varTheta}; \mathbf{x})}{\partial \boldsymbol{\varTheta}^{\prime}} = \frac{1}{L(\boldsymbol{\varTheta}; \mathbf{x})} \frac{\partial^2 L(\boldsymbol{\varTheta}; \mathbf{x})}{\partial \boldsymbol{\varTheta}\, \partial \boldsymbol{\varTheta}^{\prime}} $$
since ∂L(Θ; x)/∂Θ = 0. Then since L(Θ; x) > 0 at the maximum, \( \partial^2 \ln \left( L(\boldsymbol{\varTheta}; \mathbf{x}) \right)/\partial \boldsymbol{\varTheta}\, \partial \boldsymbol{\varTheta}^{\prime} \) is negative definite iff \( \partial^2 L(\boldsymbol{\varTheta}; \mathbf{x})/\partial \boldsymbol{\varTheta}\, \partial \boldsymbol{\varTheta}^{\prime} \) is negative definite.
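The practical upshot of the chain-rule argument above is that L and ln L share the same maximizer. A grid-search sketch (assumed exponential model with a hypothetical sample, not from the text) makes this concrete:

```python
import math

# Grid search showing that L(theta; x) and ln L(theta; x) attain their
# maxima at the same theta, for f(x; theta) = (1/theta) exp(-x/theta).

data = [0.8, 1.5, 2.2, 0.5, 1.0]  # hypothetical sample

def log_like(theta):
    n = len(data)
    return -n * math.log(theta) - sum(data) / theta

def like(theta):
    return math.exp(log_like(theta))

grid = [0.05 * j for j in range(1, 201)]  # theta in (0, 10]
argmax_L = max(grid, key=like)
argmax_lnL = max(grid, key=log_like)
print(argmax_L == argmax_lnL)  # True: same maximizer
print(round(argmax_L, 2))      # 1.2 = sample mean, the analytical MLE
```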
Economic theory may suggest constraints on the signs of some of the entries in β (e.g., the effect of income on durables consumption will be positive), in which case β ∈ Ω_β ⊂ ℝ^k may be more appropriate.
In the event that a MLE is not unique, then the set of MLE's is a function of any set of sufficient statistics. However, a particular MLE within the set of MLE's need not necessarily be a function of (s_1,…,s_r), although it is always possible to choose an MLE that is a function of (s_1,…,s_r). See Moore, D.S., (1971), “Maximum Likelihood and Sufficient Statistics,” American Mathematical Monthly, January, pp. 50–52.
Alternatively, the solution for α can be determined by consulting tables generated by Chapman which were constructed specifically for this purpose (Chapman, D.G., “Estimating Parameters of a Truncated Gamma Distribution,” Ann. Math. Stat., 27, 1956, pp. 498–506).
Bowman, K.O. and L.R. Shenton, Properties of Estimators for the Gamma Distribution, Report CTC–1, Union Carbide Corp., Oak Ridge, Tennessee.
Norden, R.H. (1972; 1973) “A Survey of Maximum Likelihood Estimation,” International Statistical Revue, (40): 329–354 and (41): 39–58.
See Lehmann, E. (1983) Theory of Point Estimation, John Wiley and Sons, pp. 420–427.
We remind the reader of our tacit assumption that Θ is identified (Definition 7.2).
By true value of Θ, we again mean that Θ_o is the value of Θ ∈ Ω for which f(x; Θ_o) ≡ L(Θ_o; x) is the actual joint density function of the random sample X. The value of Θ_o is generally unknown, and in the current context is the objective of point estimation.
If X_i is a continuous random variable, then since Θ_o is the true value of Θ,
$$ \mathrm{E}\left[ f\left( X_i;\varTheta \right)/f\left( X_i;\varTheta_{\mathrm{o}} \right) \right] = \int_{-\infty}^{\infty} \frac{f\left( x_i;\varTheta \right)}{f\left( x_i;\varTheta_{\mathrm{o}} \right)} f\left( x_i;\varTheta_{\mathrm{o}} \right) dx_i = \int_{-\infty}^{\infty} f\left( x_i;\varTheta \right) dx_i = 1 $$
because f(x_i; Θ) is a probability density function. The discrete case is analogous.
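The identity E[f(X_i; Θ)/f(X_i; Θ_o)] = 1 can be illustrated by simulation. The sketch below (assumed exponential model with hypothetical parameter values, not from the text) draws X under the true value Θ_o and averages the density ratio:

```python
import math
import random

# Monte Carlo check that the density ratio has expectation 1 under the
# true parameter value, for an exponential model.

random.seed(0)
theta_o, theta = 2.0, 3.0  # hypothetical true and alternative values

def dens(x, th):
    return math.exp(-x / th) / th

n = 200_000
draws = [random.expovariate(1.0 / theta_o) for _ in range(n)]  # X ~ f(.; theta_o)
avg_ratio = sum(dens(x, theta) / dens(x, theta_o) for x in draws) / n
print(abs(avg_ratio - 1.0) < 0.02)  # True: sample mean of the ratio is near 1
```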
That this unique solution is a maximum can be demonstrated by noting that \( \partial^2 \ln\left( L(\varTheta;\mathbf{x}) \right)/\partial\varTheta^2 = n/\varTheta^2 - 2\sum\nolimits_{i=1}^n x_i/\varTheta^3 \), which when evaluated at the maximum likelihood estimate \( \hat{\varTheta} = \sum\nolimits_{i=1}^n x_i/n \) yields \( \partial^2 \ln\left( L(\hat{\varTheta};\mathbf{x}) \right)/\partial\varTheta^2 = -n^3/\left( \sum\nolimits_{i=1}^n x_i \right)^2 < 0 \).
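The algebra in this footnote is easy to spot-check numerically (hypothetical sample, not from the text): evaluating n/Θ² − 2Σx_i/Θ³ at the MLE Θ̂ = Σx_i/n should reproduce −n³/(Σx_i)².

```python
# Numeric check of the second-derivative formula at the exponential MLE.

data = [2.0, 3.5, 1.5, 4.0]  # hypothetical sample
n, s = len(data), sum(data)
theta_hat = s / n  # the maximum likelihood estimate

second_deriv = n / theta_hat**2 - 2.0 * s / theta_hat**3
closed_form = -n**3 / s**2

print(abs(second_deriv - closed_form) < 1e-12, second_deriv < 0)
```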
N(ε) is an open interval, the interior of a circle, the interior of a sphere, and the interior of a hypersphere in 1, 2, 3, and ≥ 4 dimensions, respectively.
It is allowable that conditions (2) and (3) be violated on a set of x values having probability zero.
Change max to sup if max does not exist.
This follows from Khinchin’s WLLN upon recognizing that the right hand side expression represents E(ln(X_i)).
Recall that the matrix square root is a continuous function of its arguments, so that \( \mathrm{plim}\left( \mathbf{A}_n^{1/2} \right) = \left( \mathrm{plim}\, \mathbf{A}_n \right)^{1/2} \). Letting
$$ \mathbf{A}_n = n^{-1}\mathrm{E}\left( \frac{\partial \ln L\left( \boldsymbol{\varTheta}_{\mathrm{o}};\mathbf{X} \right)}{\partial \boldsymbol{\varTheta}} \frac{\partial \ln L\left( \boldsymbol{\varTheta}_{\mathrm{o}};\mathbf{X} \right)}{\partial \boldsymbol{\varTheta}^{\prime}} \right) $$
leads to \( \mathrm{plim}\left( \mathbf{A}_n^{1/2} \right) = \mathbf{M}\left( \boldsymbol{\varTheta}_{\mathrm{o}} \right)^{1/2} \).
The gamma function, Γ(α), is continuous in α, and its first two derivatives are continuous in α, for α > 0. Γ(α) is in fact strictly convex, with its second order derivative strictly positive for α > 0. See Bartle, op. cit., p. 282.
A way of deriving the expectations involving ln(X_i) that is conceptually straightforward, albeit somewhat tedious algebraically, is first to derive the MGF of ln(X_i), which is given by β^t Γ(α + t)/Γ(α). Then using the MGF in the usual way establishes the mean and variance of ln(X_i). The covariance between \( X_i/\beta_{\mathrm{o}}^2 \) and ln(X_i) can be established by noting that
$$ \beta_{\mathrm{o}}^{-2}\mathrm{E}\left( X_i \ln (X_i) \right) = \beta_{\mathrm{o}}^{-2}\left[ \frac{1}{\beta_{\mathrm{o}}^{\alpha_{\mathrm{o}}}\varGamma\left( \alpha_{\mathrm{o}} \right)} \right]\int_0^{\infty} \left( \ln (x_i) \right) x_i^{\alpha_{\mathrm{o}}} e^{-x_i/\beta_{\mathrm{o}}}\, dx_i = \alpha_{\mathrm{o}}\beta_{\mathrm{o}}^{-1}\mathrm{E}_{*}\left( \ln (X_i) \right), $$
where E_* denotes an expectation of ln(X_i) using a gamma density having parameter values α_o + 1 and β_o. Then \( \mathrm{cov}\left( X_i/\beta_{\mathrm{o}}^2, \ln (X_i) \right) \) is equal to
$$ \alpha_{\mathrm{o}}\beta_{\mathrm{o}}^{-1}\mathrm{E}_{*}(\ln (X_i)) - \mathrm{E}(\ln (X_i))\mathrm{E}\left( X_i/\beta_{\mathrm{o}}^2 \right) = \alpha_{\mathrm{o}}\beta_{\mathrm{o}}^{-1}\left( \mathrm{E}_{*}(\ln (X_i)) - \mathrm{E}(\ln (X_i)) \right) = \beta_{\mathrm{o}}^{-1}. $$
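The conclusion cov(X_i/β_o², ln(X_i)) = 1/β_o can be checked by simulation (hypothetical parameter values, not from the text). Python's `random.gammavariate(alpha, beta)` uses the same shape/scale parameterization as the text's gamma density, with mean αβ:

```python
import math
import random

# Monte Carlo check that cov(X/beta_o^2, ln X) is close to 1/beta_o
# when X ~ Gamma(alpha_o, beta_o).

random.seed(1)
alpha_o, beta_o = 2.0, 1.5  # hypothetical parameter values
n = 200_000

xs = [random.gammavariate(alpha_o, beta_o) for _ in range(n)]
u = [x / beta_o**2 for x in xs]
v = [math.log(x) for x in xs]
mu_u, mu_v = sum(u) / n, sum(v) / n
cov = sum((a - mu_u) * (b - mu_v) for a, b in zip(u, v)) / n

print(abs(cov - 1.0 / beta_o) < 0.05)  # True: estimate is near 1/beta_o
```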
Pearson, K.P. (1894), “Contributions to the Mathematical Theory of Evolution,” Phil. Trans. R. Soc. London A, 185: pp. 71–110.
Under the assumed conditions, for each density (conditional or not),
$$ \mathrm{E}_{\boldsymbol{\varTheta}_{\mathrm{o}}}\left( \frac{\partial \ln f\left( \mathbf{Z};\boldsymbol{\varTheta}_{\mathrm{o}} \right)}{\partial \boldsymbol{\varTheta}} \right) = \int_{-\infty}^{\infty} \frac{\partial f\left( \mathbf{z};\boldsymbol{\varTheta}_{\mathrm{o}} \right)}{\partial \boldsymbol{\varTheta}}\, d\mathbf{z} = \frac{\partial \int_{-\infty}^{\infty} f\left( \mathbf{z};\boldsymbol{\varTheta}_{\mathrm{o}} \right) d\mathbf{z}}{\partial \boldsymbol{\varTheta}} = \mathbf{0} $$
in the continuous case, and likewise in the discrete case.
See T. Amemiya, op. cit., pp. 111–112 for a closely related theorem relating to extremum estimators. The assumptions of Theorem 8.26 can be weakened to assuming uniqueness and consistency of \( \hat{\boldsymbol{\Theta}} \) for use in this theorem.
We are suppressing the fact that each row of the matrix \( \partial^2 Q_n\left( \mathbf{Y},\boldsymbol{\varTheta}_{*} \right)/\partial \boldsymbol{\varTheta}\, \partial \boldsymbol{\varTheta}^{\prime} \) will generally require a different value of Θ_*, defined by a different value of λ, for the representation to hold. The conclusions of the argument will remain the same.
This proof is related to a proof by T. Amemiya, (1985), Advanced Econometrics, Harvard University Press, p. 107, dealing with the consistency of extremum estimators.
See Bartle, R.G., (1976) The Elements of Real Analysis, 2nd Edition, John Wiley, p. 371.
Title: Point Estimation Methods
DOI: https://doi.org/10.1007/978-1-4614-5022-1_8
Author: Ron C. Mittelhammer
Publisher: Springer New York
Chapter number: 8