
## About this book

Generalized method of moments (GMM) estimation of nonlinear systems has two important advantages over conventional maximum likelihood (ML) estimation: GMM estimation usually requires less restrictive distributional assumptions and remains computationally attractive when ML estimation becomes burdensome or even impossible. This book presents an in-depth treatment of the conditional moment approach to GMM estimation of models frequently encountered in applied microeconometrics. It covers both large sample and small sample properties of conditional moment estimators and provides an application to empirical industrial organization. With its comprehensive and up-to-date coverage of the subject, including topics such as bootstrapping and empirical likelihood techniques, the book addresses researchers, graduate students, and professionals in applied econometrics.

## Table of contents

### 1. Introduction

Abstract
The generalized method of moments (GMM) estimation principle compares favorably to alternative methods in numerous estimation problems frequently encountered in applied econometric work. Compared to full information maximum likelihood (ML) estimation, the GMM approach requires less restrictive distributional assumptions to obtain a consistent and asymptotically normally distributed estimator of the unknown parameters of interest, as shown in the seminal paper by Hansen (1982). In the simplest case only the population mean of some data dependent function has to be specified, while the ML principle requires a specification of the complete distribution function. Therefore GMM estimators are usually more robust against distributional misspecification than ML estimators. In addition, GMM estimation of complicated econometric models usually remains computationally attractive when ML estimation by means of conventional numerical computation algorithms becomes burdensome or even impossible. Both properties of the GMM estimator are of major importance when the econometric model consists of multiple estimating equations which are nonlinear in the parameters to be estimated. GMM estimation of such nonlinear equation systems is the main topic of this monograph.
Joachim Inkmann
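The contrast drawn above between GMM and ML can be made concrete with a minimal sketch (my own illustration, not taken from the monograph): the mean of an exponentially distributed variable is estimated from the single moment condition E[Z − θ] = 0 alone, without the full density specification that ML would require.

```python
import numpy as np

rng = np.random.default_rng(0)
z = rng.exponential(scale=2.0, size=10_000)  # draws with population mean 2.0

# GMM/method-of-moments principle: only the population mean of a
# data-dependent function is specified, here E[Z - theta] = 0.
# The estimator solves the sample analogue (1/n) * sum(z_i - theta) = 0,
# which requires no assumption about the full distribution of Z.
theta_hat = z.mean()
```

Under distributional misspecification (e.g. if the data were not exponential after all) this estimator remains consistent for the population mean, which is the robustness property referred to above.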

### 2. The Conditional Moment Approach to GMM Estimation

Abstract
Let Z be a random vector which includes both endogenous and explanatory variables. Suppose the data {Zi : i = 1,···, n } consists of n independent draws from the probability distribution of Z. Assume the equation system of interest can be represented by a s × 1 residual vector ρ(Z,θ)= (ρ1(Z,θ),ρ2(Z,θ),···,ρs(Z,θ))′ whose elements are possibly nonlinear functions of an unknown q×1 parameter vector θ. In the following ρ(Z,θ) will be referred to as the vector of conditional moment functions. The conditional moment estimation principle rests on the assumption that the probability distribution of Z satisfies the conditional moment restrictions
$$E\left[ {\rho \left( {Z,{\theta _0}} \right)|X} \right] = 0,$$
(2.1.1)
where θ0 denotes the population parameter vector to be estimated and X a vector of conditioning variables or, equivalently, instruments. This assumption states that each residual is orthogonal to all instruments in the conditional mean sense. Eventually, the following set of weaker conditional moment restrictions will be imposed
$$E\left[ {\rho _\ell \left( {Z,{\theta _0}} \right)|{X_\ell }} \right] = 0,\quad \ell = 1, \cdots ,s,$$
(2.1.2)
where Xℓ is a subvector of X containing instruments for equation ℓ which may be correlated with the other equations’ residuals. Whenever (2.1.2) is assumed to hold in the following, it is implicitly assumed that Xℓ is a proper subvector of X for at least one equation, because otherwise (2.1.1) and (2.1.2) are completely equivalent.
Joachim Inkmann
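The content of restriction (2.1.1) can be illustrated with a small simulation (my own sketch, not from the book): for a single-equation residual ρ(Z, θ) = Y − θX with E[ρ(Z, θ0)|X] = 0, any function of the instruments must be orthogonal to the residual at θ0, while a wrong parameter value violates the orthogonality.

```python
import numpy as np

rng = np.random.default_rng(1)
n, theta0 = 50_000, 1.5
x = rng.normal(size=n)
u = rng.normal(size=n)        # E[u | x] = 0 by construction
y = theta0 * x + u

def rho(theta):
    """Single-equation conditional moment function rho(Z, theta)."""
    return y - theta * x

# E[rho(Z, theta0) | X] = 0 implies E[a(X) rho(Z, theta0)] = 0 for any
# instrument function a(X); check with a(X) = (1, X, X^2)'.
moments_at_theta0 = np.array([rho(theta0).mean(),
                              (x * rho(theta0)).mean(),
                              (x**2 * rho(theta0)).mean()])

# At a wrong parameter value the orthogonality fails:
# E[X * rho(Z, 1.0)] = (theta0 - 1.0) * E[X^2] = 0.5.
moment_at_wrong_theta = (x * rho(1.0)).mean()
```

The sample moments at θ0 are close to zero, while the moment evaluated at the wrong value is bounded away from zero, which is what identifies θ0.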

### 3. Asymptotic Properties of GMM Estimators

Abstract
For the discussion of consistency of the GMM estimator it is convenient to start from a basic consistency theorem for the large class of M-estimators defined by either maximizing or minimizing a certain objective function subject to the constraint given by the parameter space (this is the definition used by Gouriéroux and Monfort, 1995a, p. 209). The term ‘M-estimator’ was introduced by Huber (1964) as an abbreviation for minimization estimators. The class of M-estimators also includes maximization approaches like ML and pseudo ML. Accordingly, Amemiya (1985, p. 105) introduces M-estimators as ‘maximum-likelihood-like’ estimators, although this label seems rather misleading given the substantially different approaches summarized under the name M-estimation. Amemiya also uses the terms M-estimator and extremum estimator interchangeably, while other authors, e.g. Newey and McFadden (1994), restrict the latter designation to a subgroup of M-estimators with a quadratic form objective function. These authors only consider estimators optimizing a sample average as belonging to the class of M-estimators.
Joachim Inkmann

### 4. Computation of GMM Estimators

Abstract
For both consistency and asymptotic normality of the GMM estimator it is not necessary to assume that $$\hat \theta$$ precisely minimizes the GMM objective function (2.1.6). Andrews (1997) points out that for Theorem 2 (consistency) $$\hat \theta$$ is required to be within o_p(1) of the global minimum and for Theorem 3 (asymptotic normality) $$\hat \theta$$ is required to be within o_p(n^{-1/2}), where X_n = o_p(a_n) conveniently abbreviates plim X_n/a_n = 0 (cf. Amemiya, 1985, p. 89). The estimator $$\hat \theta$$ is usually obtained by iterative numerical optimization methods like the Newton-Raphson algorithm (cf. Amemiya, 1985, ch. 4.4). Starting from any value of the parameter space this procedure produces a sequence of estimates $$\tilde \theta _j$$ (j = 0,1,2,…) which hopefully converges to the global minimum of the objective function. A typical Newton-Raphson iteration to the solution of the minimization problem (2.1.6) has the form
$$\tilde \theta _{j + 1} = \tilde \theta _j - \left[ {\left( {\tfrac{1}{n}\sum\limits_{i = 1}^n {G\left( {Z_i ,\tilde \theta _j } \right)} } \right)^\prime \hat W\left( {\tfrac{1}{n}\sum\limits_{i = 1}^n {G\left( {Z_i ,\tilde \theta _j } \right)} } \right)} \right]^{ - 1} \times \left( {\tfrac{1}{n}\sum\limits_{i = 1}^n {G\left( {Z_i ,\tilde \theta _j } \right)} } \right)^\prime \hat W\left( {\tfrac{1}{n}\sum\limits_{i = 1}^n {\psi \left( {Z_i ,\tilde \theta _j } \right)} } \right)$$
(4.1.1)
Convergence to a global minimum is ensured by this algorithm if the objective function is convex which, however, is the exception rather than the rule for the nonlinear models encountered in microeconometric applications, as discussed in the previous chapter. Otherwise the iteration routine may converge to a local minimum, which renders the parameter estimators inconsistent and alters their asymptotic distribution. To circumvent this problem Andrews (1997) proposes an optimization algorithm which guarantees consistency and asymptotic normality of the resulting GMM estimators provided that r > q holds. Andrews’ method is described in detail in the next section.
Joachim Inkmann
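The iteration (4.1.1) can be illustrated for a deliberately simple scalar case (my own sketch; the moment function is an assumption for the example, not taken from the book): a single moment ψ(Z, θ) = Z − exp(θ) with derivative G(Z, θ) = −exp(θ) and weight Ŵ = 1, so that each step has exactly the Gauss–Newton form above.

```python
import numpy as np

rng = np.random.default_rng(2)
z = rng.exponential(scale=np.e, size=20_000)   # E[Z] = e, so theta_0 = 1

# Sample moment psi_bar(theta) = (1/n) sum(z_i - exp(theta)) and its
# derivative g_bar(theta) = -exp(theta); scalar weight W_hat = 1.
def psi_bar(theta):
    return np.mean(z - np.exp(theta))

def g_bar(theta):
    return -np.exp(theta)

theta = 0.0                        # arbitrary starting value
for _ in range(25):                # Gauss-Newton steps as in (4.1.1)
    g, p = g_bar(theta), psi_bar(theta)
    theta -= (g * g) ** -1 * g * p

# With a single just-identifying moment the iteration converges to the
# root of psi_bar, i.e. theta = log(mean(z)).
```

Because this toy objective has a unique minimum, the local-minimum problem discussed above does not arise here; it is exactly the multi-modal nonlinear case where Andrews’ procedure becomes relevant.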

### 5. Asymptotic Efficiency Bounds

Abstract
Any consistent and asymptotically normal estimator with a variance-covariance matrix of the stabilizing transformation attaining the Cramér-Rao efficiency bound is said to be asymptotically efficient (cf. Amemiya, 1985, p. 124). It is well known that the Cramér-Rao bound is given by the inverse of the information matrix. Throughout this chapter, let J(θ0) denote the information matrix for a single observation, evaluated at the true parameter vector, defined as
$$J\left( {\theta _0 } \right) \equiv - E\left[ {\frac{{\partial ^2 \ln f\left( {Z|\theta _0 } \right)}} {{\partial \theta \partial \theta '}}} \right],$$
(5.1.1)
where ∂²ln f(z | θ)/∂θ∂θ′ is the Hessian matrix for a single observation containing the second derivatives of its loglikelihood contribution ln f(z | θ). Let S(θ) ≡ ∂ln f(z | θ)/∂θ denote the vector of first derivatives of the loglikelihood contribution of a single observation, henceforth referred to as the score. Using the information matrix equality at the individual level, (5.1.1) can be rewritten as
$$J\left( {\theta _0 } \right) = E\left[ {S(\theta _0 )S(\theta _0 )'} \right] = V\left[ {S(\theta _0 )} \right],$$
(5.1.2)
which will be more convenient for the results stated in the following two sections.
Joachim Inkmann
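The information matrix equality behind (5.1.2) can be verified by simulation for the textbook case Z ~ N(θ0, 1), where the score is S(θ0) = Z − θ0 and the Hessian of the loglikelihood contribution is the constant −1 (my own sketch, not from the book):

```python
import numpy as np

rng = np.random.default_rng(3)
theta0 = 0.5
z = rng.normal(loc=theta0, size=100_000)

# For Z ~ N(theta, 1): ln f(z|theta) = -0.5*ln(2*pi) - 0.5*(z - theta)^2,
# so the score is S(theta) = z - theta and the Hessian is -1.
score = z - theta0
info_from_hessian = 1.0                        # J = -E[Hessian] = 1 exactly
info_from_outer_product = np.mean(score**2)    # estimates E[S(theta0)^2]
info_from_variance = np.var(score)             # estimates V[S(theta0)]
```

All three quantities estimate the same J(θ0) = 1, which is the content of the equality between (5.1.1) and (5.1.2).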

### 6. Overidentifying Restrictions

Abstract
In this section it is shown that the asymptotic efficiency of GMM estimators may increase with an increasing number of overidentifying restrictions. According to the two components of the unconditional moment functions defined in (2.1.4), there are generally two approaches to gain overidentifying restrictions, which may be combined. The first approach takes the vector of conditional moment functions as given and enlarges the set of instruments with additional instruments which do not depend on additional unknown parameters. Using an argument of Davidson and MacKinnon (1993, p. 603), it can readily be seen that the resulting GMM estimators are asymptotically at least as efficient as the ones obtained using the original set of instruments. Taking into account the findings of Section 5.2, this result is shown here for the optimal choice of the weight matrix. Let ψ(Z, θ) = A(X)ρ(Z, θ) denote the unconditional moment functions of the model as introduced in Section 2.1 using the enlarged set of instruments. As seen above, the asymptotic variance-covariance matrix of $$\hat \theta$$ equals $$\left( {G_0 'V_0^{ - 1} G_0 } \right)^{ - 1}$$. For some r′ ≤ r consider an r×r′ transformation matrix S which selects some instruments, or linear combinations of some instruments, contained in A(X) such that S′A(X) is a new matrix of instruments with dimension r′×s. The asymptotic variance-covariance matrix of $$\hat \theta$$ associated with this transformed model can be derived as $$\left( {G_0 'S\left( {S'V_0 S} \right)^{ - 1} S'G_0 } \right)^{ - 1}$$. In the following it is proven that this matrix exceeds the original variance-covariance matrix, in the sense that the difference $$\left( {G_0 'S\left( {S'V_0 S} \right)^{ - 1} S'G_0 } \right)^{ - 1} - \left( {G_0 'V_0^{ - 1} G_0 } \right)^{ - 1}$$ is positive semidefinite.
Joachim Inkmann
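The positive semidefiniteness claim can be checked numerically; the sketch below (my own illustration, with arbitrary randomly generated G0 and V0, not from the book) compares the variance matrix using all r instruments with the one obtained from a selection matrix S keeping only the first r′ instruments.

```python
import numpy as np

rng = np.random.default_rng(4)
r, r_sub, q = 5, 3, 2
G0 = rng.normal(size=(r, q))             # r x q Jacobian of the moments
A = rng.normal(size=(r, r))
V0 = A @ A.T + r * np.eye(r)             # symmetric positive definite

S = np.eye(r)[:, :r_sub]                 # r x r' selection of instruments

# Variance-covariance matrices with all r instruments and with the
# r' selected instruments, both under the optimal weight matrix.
V_full = np.linalg.inv(G0.T @ np.linalg.inv(V0) @ G0)
V_sub = np.linalg.inv(G0.T @ S @ np.linalg.inv(S.T @ V0 @ S) @ S.T @ G0)

# Dropping instruments can only increase the variance: all eigenvalues
# of the difference V_sub - V_full are (numerically) nonnegative.
eigs = np.linalg.eigvalsh(V_sub - V_full)
```

Repeating this for other random draws never produces a negative eigenvalue beyond rounding error, in line with the proof sketched in the chapter.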

### 7. GMM Estimation with Optimal Weights

Abstract
Having established in Section 5.2 the lower bound Λu of the GMM variance-covariance matrix for given unconditional moment functions, which is attained by an optimal choice of the weight matrix $${\rm{\hat W}}$$ such that $$W = V_0^{ - 1}$$, a consistent estimator $${{\rm{\hat V}}^{{\rm{ - 1}}}}$$ of $$V_0^{ - 1}$$ remains to be derived in order to obtain a feasible GMM estimator. A simple estimator for V0 has already been introduced at the end of Section 3.2. By continuity of matrix inversion, a consistent estimator of $$V_0^{ - 1}$$ results from
$$\hat V^{{\text{ - 1}}} = \left[ {\tfrac{1} {n}\sum\limits_{i = 1}^n \psi \left( {Z_i ,\hat \theta _1 } \right)\psi \left( {Z_i ,\hat \theta _1 } \right)^\prime } \right]^{ - 1}$$
(7.1.1)
with $${{\rm{\hat \theta }}_{\rm{1}}}$$ being some consistent first step estimator. The usual procedure in applied work consists of computing $${{\rm{\hat \theta }}_{\rm{1}}}$$ in a first step by minimizing the GMM objective function (2.1.6) for a weight matrix which is independent of $$\hat \theta$$, e.g. the identity matrix, and obtaining the final GMM estimator $${{\rm{\hat \theta }}_{\rm{2}}}$$, which attains the lower bound of the asymptotic variance-covariance matrix, in a second step using the weight matrix $${\rm{\hat W}}$$ = $${{\rm{\hat V}}^{{\rm{ - 1}}}}$$. A consistent estimator $${\hat \Lambda _{\rm{u}}}$$ of the asymptotic variance-covariance matrix of the stabilizing transformation of $${{\rm{\hat \theta }}_{\rm{2}}}$$ is obtained afterwards by substituting the elements of $$\Lambda _u = \left( {G_0 'V_0^{ - 1} G_0 } \right)^{ - 1}$$ with consistent plug-in estimators. The matrix $$V_0^{ - 1}$$ can be estimated using either (7.1.1) or a corresponding expression evaluated at the final estimator $${{\rm{\hat \theta }}_{\rm{2}}}$$. Newey and McFadden (1994, p. 2161) point out that there seems to be no evidence as to whether either of the two methods yields efficiency advantages in small samples. A consistent estimator of G0 was introduced in Section 3.2 and replaces the population moment by a sample moment.
Joachim Inkmann
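The two-step procedure can be sketched for a scalar parameter with two instruments (my own illustration; the heteroskedastic design and the instruments x and x³ are assumptions for the example, not taken from the book). Because the moment functions are linear in θ here, both steps have closed forms:

```python
import numpy as np

rng = np.random.default_rng(5)
n, theta0 = 20_000, 2.0
x = rng.normal(size=n)
u = rng.normal(size=n) * (1.0 + 0.5 * np.abs(x))   # heteroskedastic errors
y = theta0 * x + u

# Moments psi(Z, theta) = (x*u, x^3*u)' with u = y - theta*x, so that
# psi_bar(theta) = m - theta*d is linear in scalar theta (r = 2 > q = 1).
m = np.array([np.mean(x * y), np.mean(x**3 * y)])
d = np.array([np.mean(x**2), np.mean(x**4)])

# Step 1: weight matrix independent of theta (here the identity).
theta1 = (d @ m) / (d @ d)

# Estimate V0 by the sample outer product of psi at theta1, as in (7.1.1).
u1 = y - theta1 * x
psi1 = np.column_stack([x * u1, x**3 * u1])
V_hat = psi1.T @ psi1 / n

# Step 2: W_hat = V_hat^{-1} yields the efficient two-step estimator.
W = np.linalg.inv(V_hat)
theta2 = (d @ W @ m) / (d @ W @ d)
```

Both steps are consistent; only the second attains the bound Λu for these moments, which is why the weight matrix is re-estimated rather than kept at the identity.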

### 8. GMM Estimation with Optimal Instruments

Abstract
It has been shown in Section 6.3 that a GMM estimator attains the semiparametric efficiency bound (5.3.11) for given conditional moment functions if the instruments are chosen optimally. For the case of strict exogeneity, these optimal instruments were given in (5.3.12) as $$B(X) = D_0 '\Omega _0^{ - 1}$$, ignoring the transformation matrix F, with D0 = E[∂ρ(Z,θ0)/∂θ′|X] and Ω0 = E[ρ(Z,θ0)ρ(Z,θ0)′|X]. For the derivation of the lower efficiency bound it has been assumed that the conditional probability density function of Y depends on the parameters of interest, θ, and possibly on additional parameters, η. In the current section this dependence is explicitly taken into account by writing D(X,τ) and Ω(X,τ) with τ = (θ′,η′)′, hence D0 = D(X,τ0) and Ω0 = Ω(X,τ0). Note that the conditional expectations are usually functions of X, which justifies these expressions. Obviously, the optimal instruments are not available and have to be estimated in order to obtain a feasible GMM estimator. Two estimation strategies can be distinguished and will be discussed throughout this chapter. The first strategy, presented in this section, rests on substituting the unknown τ0 with some consistent first step estimator $$\hat \tau$$. Assuming that the functional form of D0 and Ω0 is known, estimators of the unknown conditional expectations follow from $$\hat D\left( X \right) = D\left( {X,\hat \tau } \right)$$ and $$\hat \Omega \left( X \right) = \Omega \left( {X,\hat \tau } \right)$$. The second estimation strategy, which will be discussed throughout Sections 8.2–8.5, rests on an application of nonparametric estimation techniques to obtain the estimators $$\hat D\left( X \right)$$ and $$\hat \Omega \left( X \right)$$ of D0 and Ω0.
Joachim Inkmann

### 9. Monte Carlo Investigation

Abstract
The first Monte Carlo experiment (extracted from Inkmann, 2000) attempts to provide evidence on the small sample performance of three estimators which are efficient in three different classes of estimators using an increasing amount of distributional information. The first estimator is the conventional two-step GMM estimator, labeled GMM2 from now on, using the estimator (7.1.1) of the optimal weight matrix. It has been shown in Section 5.2 that this estimator reaches the efficiency bound Λu for a given set of unconditional moment functions. The second estimator under consideration results from using the GMM2 estimator as an initial estimator for the estimation of the unknown optimal instruments. The three-step estimator, GMM3, which is based on these optimal instruments, attains the efficiency bound Λc for a given set of conditional moment functions. Because conditional moment restrictions imply an infinite set of orthogonality conditions, the asymptotic efficiency advantage of GMM3 is achieved by imposing a stronger distributional assumption. For the estimation of the optimal instruments the K-nearest neighbor approach presented in Section 8.3 is chosen, which is particularly simple to implement. The third estimator is a maximum likelihood estimator which requires a specification of the complete conditional distribution and achieves the efficiency bound in the class of parametric estimators. Therefore the ML estimator can be regarded as a benchmark for the two GMM estimators.
Joachim Inkmann
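The K-nearest neighbor idea mentioned above can be sketched in isolation (my own illustration, not the experiment's actual design): a conditional second moment of the kind entering the optimal instruments, Ω0(x) = E[ρ²|X = x], is estimated by averaging squared residuals over the K observations whose instrument value lies closest to the evaluation point.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 5_000
x = rng.uniform(-2.0, 2.0, size=n)
omega0 = 0.5 + x**2                       # true conditional variance E[u^2|x]
u = rng.normal(size=n) * np.sqrt(omega0)  # heteroskedastic residuals

def knn_cond_expectation(target, xs, vals, k=200):
    """K-nearest neighbor estimate of E[vals | xs = target]."""
    idx = np.argsort(np.abs(xs - target))[:k]   # indices of the k closest xs
    return vals[idx].mean()

# Estimate Omega0 at x = 0, where the true value is 0.5.
omega_hat = knn_cond_expectation(0.0, x, u**2)
```

The choice of K trades off bias (neighbors far from the evaluation point) against variance (too few neighbors), which is part of what the Monte Carlo experiment examines in small samples.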

### 10. Theory of Cooperative R&D

Abstract
It is generally accepted that the incentives of firms to invest in research and development (R&D) are distorted because of the public good characteristic of new information. In particular, the appropriability problem has been widely discussed in the literature (cf. Spence, 1984, Cohen and Levinthal, 1989), which causes firms to underinvest in R&D because they cannot completely internalize the social returns of their private efforts in the presence of R&D spillovers. Three instruments are usually considered to restore the firms’ incentives to engage in R&D: tax policies and direct subsidies, ex-post R&D cooperation through patents and licensing, and ex-ante R&D cooperation (cf. Katz and Ordover, 1990). While the first two instruments require government intervention to determine taxes and subsidies or to strengthen property rights, the third instrument is assumed to work through private incentives because of the possibility to internalize R&D spillovers between cooperating firms. Other advantages of R&D cooperation include the elimination of wasteful duplication of R&D efforts and the distribution of risk and fixed costs among participants (cf. Jacquemin, 1988).
Joachim Inkmann

### 11. Empirical Evidence on Cooperative R&D

Abstract
The scope of this chapter is limited to the empirical content of the qualitative theoretical results derived in Section 10.5. It is not intended to estimate structural form equations of the theoretical model, which would be too ambitious given the simplifying assumptions of the oligopoly game (e.g. symmetry of firms). In accordance with Slade (1995), the static theoretical model is used as a tool to ‘provide useful summary statistics concerning the outcomes of oligopolistic interactions’ (p. 369). Basically, reduced form R&D intensity equations are estimated to gain some insight into whether these outcomes are reflected in real data.
Joachim Inkmann

### 12. Conclusion

Abstract
This monograph presents a comprehensive treatment of the conditional moment approach to GMM estimation of nonlinear equation systems. Particular attention is paid to the analysis of the large sample efficiency properties of GMM estimators. The semiparametric efficiency bounds for given orthogonality conditions and given conditional moment restrictions are derived and feasible GMM estimators attaining these bounds are presented. Conditions for asymptotic efficiency gains through the use of additional moment functions providing overidentifying restrictions are derived and different strategies are proposed for obtaining these additional moment functions in applied work. While most of the large sample properties of the GMM estimators are well understood, some open questions remain. For example, a procedure for obtaining the optimal instruments under the assumption that admissible instruments for one equation are correlated with another equation’s residual is still missing.
Joachim Inkmann

### Backmatter
