Elsevier

Journal of Macroeconomics

Volume 51, March 2017, Pages 143-161
Endogeneity bias and growth regressions

https://doi.org/10.1016/j.jmacro.2017.01.001

Abstract

The problem of regressor endogeneity stemming from reverse causality has plagued economists working in the field of empirical economic growth for some time. This paper attempts to assess the magnitude of this problem in the context of growth regressions based on the Solow growth model. The paper develops a method of running Monte Carlo simulations that generates simulated data matching the moments of the observed real-world data typically used in such regressions, while simultaneously allowing us to impose arbitrarily high correlations between the steady-state determinants of the Solow model and the unobserved residual term of the data-generating process. After running simulations that represent a wide sample of the mathematically possible correlations, we conclude that a between estimator or a random effects estimator delivers a lower average absolute bias across all coefficients than alternative estimators in almost all of our simulations. Conversely, estimators that use within-country variation generate lower biases when looking solely at rates of convergence. Furthermore, we conclude that these results are robust to restricting our sample of simulations to several subsets of the assumed parameters and to changing our assumptions about country fixed-effects terms.

Introduction

The problem of endogeneity bias is a central concern in any comparative study of economic growth. Reverse causality and omitted variables bias are the two most common sources of endogeneity bias to which critics of growth regressions refer.¹ These sources of bias are surely present in most, if not all, least-squares estimates of the determinants of growth, yet the quantitative magnitude of this bias has not been assessed. In most instances we care about the sign, rough magnitude and statistical significance of the estimated coefficient on a growth determinant, rather than its precise value. If the quantitative magnitude of endogeneity bias is small, then the endless (and probably futile) quest for the perfect instrument may be misplaced – particularly because instrumental variables estimation is fraught with problems of its own, such as weak instruments and doubts about the validity of the instruments. In small-sample studies of the type typically found in growth regressions, we cannot ignore the efficiency of our estimators either. Using Monte Carlo simulations, this paper takes a first step toward evaluating the magnitude of endogeneity bias under various assumptions about the extent of endogeneity itself.

A very simple univariate example may serve to illustrate our goal. In a simple cross-sectional regression equation of the form

$$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i,$$

the expected value of the OLS estimate of the main coefficient of interest, $\beta_1$, can be written as:

$$E[\hat{\beta}_1] = \beta_1 + \frac{\mathrm{Cov}(x_i, \varepsilon_i)}{\mathrm{Var}(x_i)}$$

This equation shows that a necessary condition for an unbiased estimator is zero covariance between the regressor and the residual term. If this condition does not hold, the extent of the resulting endogeneity bias depends on the size of the regressor's covariance with the residual term relative to the regressor's variance. The variance of the regressor is observable from the data. The question we ask is: by varying the covariance between the residual term and the regressor, for a given variance of the regressor, how much bias do we generate in $\hat{\beta}_1$?
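The bias formula above can be checked with a small simulation. This is a minimal sketch of our own, not the paper's code: it assumes $x_i$ and $\varepsilon_i$ are jointly normal with mean zero and $\mathrm{Var}(\varepsilon_i) = 1$, and the function names and parameter values (`simulate_bias`, $\beta_1 = 0.5$, $\mathrm{Var}(x_i) = 1$, $\mathrm{Cov}(x_i, \varepsilon_i) = 0.2$) are hypothetical. Under joint normality the formula holds exactly in expectation, so the average simulated bias should sit close to $0.2 / 1 = 0.2$.

```python
import numpy as np

def ols_slope(x, y):
    """OLS slope in y = b0 + b1*x + e, i.e. sample Cov(x, y) / sample Var(x)."""
    xc = x - x.mean()
    return np.dot(xc, y - y.mean()) / np.dot(xc, xc)

def simulate_bias(beta1=0.5, var_x=1.0, cov_xe=0.2, n=100, reps=2000, seed=0):
    """Average bias of the OLS slope over `reps` samples when Cov(x, e) = cov_xe.

    Assumes (x, e) are jointly normal with mean zero and Var(e) = 1, so the
    covariance matrix is valid only if var_x >= cov_xe**2.
    """
    rng = np.random.default_rng(seed)
    cov = np.array([[var_x, cov_xe], [cov_xe, 1.0]])
    estimates = np.empty(reps)
    for r in range(reps):
        x, e = rng.multivariate_normal([0.0, 0.0], cov, size=n).T
        y = 1.0 + beta1 * x + e          # intercept beta0 = 1 is arbitrary
        estimates[r] = ols_slope(x, y)
    simulated_bias = estimates.mean() - beta1
    theoretical_bias = cov_xe / var_x    # Cov(x, e) / Var(x)
    return simulated_bias, theoretical_bias

bias, theory = simulate_bias()
```

Raising `cov_xe` while holding `var_x` fixed raises the bias one-for-one with the ratio, which is exactly the experiment described in the question above.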

The specific application that we consider is growth regressions. We start from a data-generating process based on the augmented Solow (1956) model. We chose this model largely because of its empirical tractability, and also because it serves as the theoretical basis for a wide body of empirical work on the determinants of growth (see, for instance, Barro and Sala-i-Martin, 2003). The empirical literature that has attempted to estimate the parameters of the Solow model has used a wide variety of panel data estimators.² These estimators take different approaches to the empirical issues raised by estimating the model; in particular, some use a GMM approach with instrumental variables designed to address the potential problem of endogeneity bias in the regressions. As we note below, all of these estimators have disadvantages, and the relative magnitude of the errors they generate is not easily assessed using econometric theory. Consequently, we use Monte Carlo techniques to assess the average absolute bias of several of the estimators from this literature in the presence of endogeneity bias.³ Our method is similar to that used in Hauk and Wacziarg (2009), but this paper extends their analysis in several dimensions. Whereas that paper focused on measurement error in the regressors, this study focuses on the properties of the residual term.⁴ We generate simulated data based on moments observed in real data and fix the parameters of the Solow model to values commonly used in the theoretical literature. Also, using the latest data from version 8.1 of the Penn World Tables,⁵ we are able to extend our dataset by one extra decade. We generate an error term that is allowed to covary to a specified degree with the steady-state determinants of income. We then run regressions using various panel data estimators on these simulated data, and compare the average of the absolute biases of the estimates obtained over many runs to the known, “true” parameters of the model.
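One common way to generate an error term that covaries to a specified degree with the steady-state determinants, while keeping each series at target means and standard deviations, is to draw independent standard normals and apply a Cholesky factor of the target correlation matrix. This is a minimal sketch under that assumption (joint normality), not the authors' procedure; `draw_correlated` is a hypothetical helper, and the means, standard deviations and correlations in the example are placeholders, not moments estimated from the Penn World Tables.

```python
import numpy as np

def draw_correlated(means, sds, corr, n, rng):
    """Draw an (n, k) array whose columns have the target means, standard
    deviations and correlation matrix, assuming joint normality."""
    means = np.asarray(means, dtype=float)
    sds = np.asarray(sds, dtype=float)
    # Lower-triangular L with L @ L.T == corr; raises LinAlgError if corr
    # is not positive definite.
    chol = np.linalg.cholesky(np.asarray(corr, dtype=float))
    z = rng.standard_normal((n, len(means)))  # iid N(0, 1) draws
    # Rows of z @ L.T have covariance corr; scaling by sds and shifting by
    # means gives covariance diag(sds) @ corr @ diag(sds) and the target means.
    return means + (z @ chol.T) * sds

# Hypothetical placeholder moments for (log s_k, log(n+g+delta), residual).
means = [-1.8, -2.6, 0.0]
sds = [0.5, 0.2, 0.1]
corr = [[1.0, -0.4, 0.3],
        [-0.4, 1.0, -0.2],
        [0.3, -0.2, 1.0]]   # residual correlated 0.3 and -0.2 with the regressors
sample = draw_correlated(means, sds, corr, 200_000, np.random.default_rng(1))
```

Varying only the last row and column of `corr` changes how endogenous the regressors are while leaving their own moments untouched, which is the kind of control the simulation design described above requires.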

The panel data estimators that we focus on in this paper are the fixed effects estimator (henceforth, FE), the between estimator (BE), the “Mankiw et al. (1992)” estimator (MRW),⁶ the random effects estimator (RE), the Arellano and Bond GMM estimator (AB), and the Blundell and Bond system GMM estimator (BB). We also consider the effects of differing assumptions about country-level heterogeneity across our simulations.
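These estimators differ mainly in which variation in the data they use. As an illustration only, the sketch below implements the two simplest cases for a balanced panel stored as NumPy arrays (`y` of shape (N, T), `X` of shape (N, T, K)): the within (FE) estimator, which removes each country's time mean, and the between (BE) estimator, which regresses country means on country means. The helper names and the static data-generating process, in which an unobserved country effect is correlated with the regressor, are hypothetical. The sketch does not implement RE, MRW, AB or BB, and it ignores the extra bias FE suffers in dynamic panels when lagged income is a regressor.

```python
import numpy as np

def within_estimator(y, X):
    """Fixed-effects (within) slopes: remove each country's time mean, then pooled OLS."""
    N, T, K = X.shape
    y_dm = y - y.mean(axis=1, keepdims=True)
    X_dm = X - X.mean(axis=1, keepdims=True)
    coef, *_ = np.linalg.lstsq(X_dm.reshape(N * T, K), y_dm.reshape(N * T), rcond=None)
    return coef

def between_estimator(y, X):
    """Between slopes: OLS of country means of y on an intercept and country means of X."""
    y_bar = y.mean(axis=1)
    X_bar = X.mean(axis=1)                      # shape (N, K)
    Z = np.column_stack([np.ones(len(y_bar)), X_bar])
    coef, *_ = np.linalg.lstsq(Z, y_bar, rcond=None)
    return coef[1:]                             # drop the intercept

# Hypothetical static DGP: the country effect a_i enters y and is correlated with x.
rng = np.random.default_rng(0)
N, T, beta = 500, 5, 0.5
a = rng.standard_normal((N, 1))
x = a + rng.standard_normal((N, T))
y = beta * x + a + rng.standard_normal((N, T))
X = x[:, :, None]                               # shape (N, T, 1)
fe = within_estimator(y, X)[0]                  # should land near 0.5
be = between_estimator(y, X)[0]                 # biased upward by the country effect
```

In this particular setup the endogeneity sits entirely in the time-invariant country effect, so demeaning removes it; if the regressor were instead correlated with the time-varying residual, FE would be biased as well.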

Our findings suggest that the BE and RE estimators (i.e., the “cross-sectional” estimators that use cross-country variation to identify their parameters) perform better than the other estimators if our goal is to minimize the average absolute bias across all of the estimated coefficients. We reach this conclusion after examining the performance of the estimators listed above across a wide sample of mathematically possible assumptions about the extent of the correlation between the steady-state determinants of the Solow growth model and the unobserved residual terms from growth regressions, and within several sub-samples of the Monte Carlo simulations that we conduct. However, because much of this literature has focused primarily on estimating the rate at which countries converge to their steady state, we also look at how well the various estimators estimate just the coefficient on lagged income, which determines the rate of convergence. Here we find that the “within-country” estimators (FE, AB and BB) perform better than their cross-sectional counterparts. Even here, though, the two GMM estimators generally do not perform noticeably better than FE. Because no single estimator clearly dominates the others across all simulations, we conclude that the “best” estimator to use in growth regressions will depend on the question being asked and, to some extent, on the assumed level of regressor endogeneity.
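To see why the coefficient on lagged income matters so much, recall that in the levels form of the convergence regression used by Islam (1995), log income at $t$ is regressed on log income at $t-\tau$, and the coefficient on lagged income is $\rho = e^{-\lambda\tau}$, so the implied annual convergence rate is $\lambda = -\ln(\rho)/\tau$. The sketch below assumes this parameterization (we have not confirmed the exact specification used in the paper); the helper names, the 2% rate and the five-year period length are illustrative assumptions.

```python
import math

def rho_from_rate(lam, tau):
    """Coefficient on lagged log income implied by annual convergence rate lam over tau years."""
    return math.exp(-lam * tau)

def rate_from_rho(rho, tau):
    """Annual convergence rate implied by the lagged-income coefficient rho (0 < rho < 1)."""
    if not 0.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between 0 and 1 to imply convergence")
    return -math.log(rho) / tau

tau = 5                                  # illustrative period length in years
true_rho = rho_from_rate(0.02, tau)      # about 0.905
biased_lam = rate_from_rho(0.95, tau)    # about 0.0103
```

Because $\rho$ sits close to one, an upward bias of under 0.05 in $\rho$ roughly halves the implied convergence rate, which is why small absolute biases in this one coefficient can matter a great deal.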

This paper is structured as follows: Section 2 briefly discusses theoretical considerations related to the methodology of growth regressions. Section 3 presents our basic simulation methodology. Section 4 discusses our results, Section 5 checks the robustness of these results to differing assumptions about country-level heterogeneity, Section 6 checks their robustness to variation in sample size, and Section 7 concludes.

Growth regressions and the Solow model

The theoretical basis for our data-generating process is the Solow (1956) model. We choose it not because we believe it is a particularly compelling model of growth, but because it is tractable and arguably constitutes the only strict theoretical basis for the specific functional forms often estimated in the vast cross-country growth literature. This model is also well suited to generating simulated data. Mankiw et al. (1992) and Islam (1995) showed that the Solow growth model can

Simulation methodology

We take as our starting point the Solow model of economic growth, as transformed into a linear regression model by Mankiw et al. (1992) and Islam (1995). We collect the data typically used in such regressions – PPP-adjusted per-capita GDP ($\log y_{it}$ in the regressions), investment as a share of total GDP ($\log s^{\tau}_{k,it}$) and population growth ($\log(n+g+\delta)^{\tau}_{it}$)¹³

Table 1, Row 1 – no correlation between steady-state determinants and residuals

We begin our analysis of the results by looking at Table 1, in which the correlations between the residuals and all three steady-state determinants are fixed at specified levels.²⁰ The results of our baseline case, with no correlation between the residuals

Extensions: changing assumptions about country-level heterogeneity

As noted above, we have assumed thus far that the correlations between the country fixed-effects term and the regressors used in our data-generating process are 50% as large as those predicted by a fixed-effects regression on the real-world data. Similarly, we assume that the variance of the fixed-effects term is 50% as large as that predicted by the same regression. The assumption that the “real” correlations and variance are smaller than those predicted in a fixed-effects regression is

Extensions: changing the sample size

One of the more difficult problems to address in the growth regressions literature is that researchers have only a limited number of observations to use in their empirical analyses. As of this writing, there are 195 independent countries in the world, and not all of them have the several decades of economic data needed to merit inclusion as an observation in a growth regression. In particular, the simulations described in this article use N = 97 as the cross-sectional

Conclusion

This paper has examined the role that endogenous regressors play in growth regressions based on the Solow growth model. We have derived a method of running Monte Carlo simulations that generates data to match the real-world moments of observable growth data while allowing us to impose arbitrary correlation levels between the steady-state determinants of the Solow model and the unobservable residual term.
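Correlation levels cannot be imposed arbitrarily in the literal sense: the full correlation matrix of the regressors and the residual term must be positive semidefinite, which rules out some combinations (for example, a residual correlated 0.9 with each of two regressors whose mutual correlation is -0.4). A minimal admissibility check is sketched below, assuming this is what "mathematically possible" correlations means in the paper; the helper names, the regressor correlation of -0.4 and the grid of candidate values are hypothetical.

```python
import itertools
import numpy as np

def is_admissible(corr, tol=1e-10):
    """True if corr is a valid correlation matrix: symmetric, unit diagonal, PSD."""
    corr = np.asarray(corr, dtype=float)
    symmetric = np.allclose(corr, corr.T)
    unit_diagonal = np.allclose(np.diag(corr), 1.0)
    # eigvalsh assumes a symmetric matrix, so it is only reached after that check.
    return bool(symmetric and unit_diagonal and np.linalg.eigvalsh(corr).min() >= -tol)

def admissible_residual_correlations(regressor_corr, values):
    """Residual-correlation vectors (one entry per regressor) drawn from `values`
    that keep the full (regressors + residual) correlation matrix PSD."""
    R = np.asarray(regressor_corr, dtype=float)
    k = R.shape[0]
    keep = []
    for r in itertools.product(values, repeat=k):
        full = np.eye(k + 1)
        full[:k, :k] = R
        full[:k, k] = r                         # residual column
        full[k, :k] = r                         # residual row (symmetry)
        if is_admissible(full):
            keep.append(r)
    return keep

# Hypothetical example: two regressors with correlation -0.4, grid of residual correlations.
values = np.linspace(-0.9, 0.9, 7)
grid = admissible_residual_correlations([[1.0, -0.4], [-0.4, 1.0]], values)
```

Sampling simulation settings only from such an admissible set is one way to cover "a wide sample of the mathematically possible correlations" without ever requesting an impossible joint distribution.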

As a result of our simulations, we have concluded that the BE and RE estimators (especially
