A homogeneous approach to testing for Granger non-causality in heterogeneous panels
Journal: Empirical Economics, Issue 1/2021 (Open Access; published 23-11-2020)
1 Introduction
Predictive causality and feedback between variables is one of the main subjects of applied time series analysis. Granger (1969) provided a definition that allows formal statistical testing of the hypothesis that one variable is not temporally related to (or does not "Granger-cause") another one. Besides time series models, this hypothesis is also important in panel data analysis when examining relationships between macroeconomic or microeconomic variables.
The seminal paper of Holtz-Eakin et al. (1988) provided one of the early contributions to the panel data literature on Granger non-causality testing. Using Anderson and Hsiao (1982) type moment conditions, the authors put forward a Generalised Method of Moments (GMM) testing framework for short-\(T\) panels with homogeneous coefficients. Unfortunately, this approach is less appealing when \(T\) is sizeable, due to the well-known problem of using too many moment conditions, which often renders the usual GMM-based inference highly inaccurate. While there exist alternative fixed-\(T\) procedures that can be applicable when \(T\) is large (e.g. those of Binder et al. 2005; Karavias and Tzavalis 2017; Juodis 2013; Arellano 2016; Juodis 2018), these methods are designed to estimate panels with homogeneous slope parameters only. Thus, when feedback based on past own values is heterogeneous (i.e. the autoregressive parameters vary across individuals), inferences may not be valid even asymptotically.
For the reasons above, one of the most popular approaches among practitioners has been the one proposed by Dumitrescu and Hurlin (2012), which can accommodate heterogeneous slopes under both the null and alternative hypotheses. Their approach is reminiscent of the so-called IPS panel unit root test for heterogeneous panels proposed by Im et al. (2003) and involves averaging individual Wald statistics. The resulting standardized Wald test statistic has an asymptotically normal limit as \(T\rightarrow \infty \) followed by \(N\rightarrow \infty \). However, this approach does not account for the "Nickell" bias and is therefore theoretically justified only for sequences with \(N/T^{2}\rightarrow 0\), as is the case with standard Mean-Group type approaches.
The aim of this paper is to propose a new test for Granger non-causality that explicitly accounts for the "Nickell" bias and is valid in both homogeneous and heterogeneous panels. The novelty of our approach comes from exploiting the fact that under the null hypothesis, while the individual effects and the autoregressive parameters may be heterogeneous across individuals, the Granger-causation parameters are all equal to zero and thus homogeneous. We therefore propose the use of a pooled estimator for these parameters only. Pooling over cross sections guarantees that the estimator has the faster \(\sqrt{NT}\) convergence rate.
The pooled estimator suffers from the incidental parameters problem of Neyman and Scott (1948) due to the presence of the predetermined regressors; see, e.g. Nickell (1981) and Karavias and Tzavalis (2016). This result implies that standard tests based on pooled estimators do not control size asymptotically, unless \(N \ll T\). To overcome this problem, we use the idea of the Split Panel Jackknife (SPJ) of Dhaene and Jochmans (2015) and construct an estimator that is free from the "Nickell bias". This type of bias correction works very well under circumstances that are empirically relevant: a moderate time dimension, heterogeneous nuisance parameters, and high persistence, as argued by Dhaene and Jochmans (2015), Fernández-Val and Lee (2013) and Chambers (2013), respectively. Furthermore, Chudik et al. (2018) argue that SPJ procedures are suitable so long as \(N/T^{3} \rightarrow 0\). Thus, we test the null hypothesis of Granger non-causality by using a Wald test based on our bias-corrected estimator.
A Monte Carlo study shows that the proposed method has good finite sample properties even in panels with a moderate time dimension. In contrast, the Wald statistic of Dumitrescu and Hurlin (2012) can suffer from substantial size distortions, especially when \(T \ll N\). In terms of power, the proposed method appears to dominate that of Dumitrescu and Hurlin (2012), especially so in panels with \(N\) and \(T\) both large.
Using a panel data set of 350 U.S. banks observed during the period 2006:Q1–2019:Q4, we test for Granger non-causality between banks' profitability and cost efficiency. The null hypothesis is rejected in all cases, except for large banks during a period spanning the financial crisis (2007–2009) and prior to the introduction of the Dodd–Frank Act in 2011. This outcome may be indicative of past moral hazard-type behaviour by large financial institutions.
The remainder of the paper is organized as follows: Sect. 2 sets up the model and the hypothesis of interest. Section 3 outlines the SPJ estimator and the proposed test statistic. Section 4 studies the finite sample performance of the approach using Monte Carlo experiments. Section 5 presents the empirical illustration, and Sect. 6 concludes.
2 Testing framework
We consider a simple linear dynamic panel data model with a single covariate \(x_{i,t}\):
$$\begin{aligned} y_{i,t}=\phi _{0,i}+\sum _{p=1}^{P}\phi _{p,i}y_{i,t-p}+\sum _{q=1}^{Q} \beta _{q,i} x_{i,t-q}+\varepsilon _{i,t}; \quad t=1,\ldots ,T, \end{aligned}$$
(2.1)
for \(i=1,\ldots ,N\), where \(\phi _{0,i}\) captures the individual-specific fixed effects, \(\varepsilon _{i,t}\) denotes the innovation for individual \(i\) at time \(t\), \(\phi _{p,i}\) denotes the heterogeneous autoregressive coefficients and \(\beta _{q,i}\) denotes the heterogeneous feedback coefficients, or Granger-causation parameters. Thus, we assume that \(y_{i,t}\) follows an ARDL(P,Q) process; more generally, \(y_{i,t}\) can be considered as one of the equations of a joint VAR model for \((y_{i,t},x_{i,t})'\). Such a bivariate system is studied for simplicity of presentation, as our results extend straightforwardly to multivariate systems.
The null hypothesis that the time series \(x_{i,t}\) does not Granger-cause (linearly) the time series \(y_{i,t}\) can be formulated as a set of linear restrictions on the \(\beta \)'s in Eq. (2.1):
$$\begin{aligned} H_{0}: \quad \beta _{q,i}=0, \quad \text {for all}\ i\ \text {and}\ q, \end{aligned}$$
(2.2)
against the alternative
$$\begin{aligned} H_{1}: \quad \beta _{q,i}\ne 0 \quad \text {for some}\ i\ \text {and}\ q. \end{aligned}$$
(2.3)
The model, null and alternative hypotheses presented here are as in Dumitrescu and Hurlin (2012). Similarly to the case of panel unit root testing, rejection of the null hypothesis should be interpreted as evidence that the null is violated in a sufficiently large number of cross-sectional units \(i\) (see, e.g. Pesaran 2012).
3 Approach
Equation (2.1) can be rewritten as follows:
$$\begin{aligned} y_{i,t}=\varvec{z}_{i,t}'\varvec{\phi }_{i} +\varvec{x}_{i,t}'\varvec{\beta }_{i}+\varepsilon _{i,t}, \end{aligned}$$
(3.1)
where \(\varvec{z}_{i,t}=(1,y_{i,t-1},\ldots ,y_{i,t-P})'\) and \(\varvec{x}_{i,t}=(x_{i,t-1},\ldots ,x_{i,t-Q})'\) are column vectors of order \(1+P\) and \(Q\), respectively, while \(\varvec{\phi }_{i}=(\phi _{0,i},\ldots ,\phi _{P,i})'\) and \(\varvec{\beta }_{i}=(\beta _{1,i},\ldots ,\beta _{Q,i})'\) denote the corresponding parameter vectors.
Define \(\varvec{y}_{i}=(y_{i,1},\ldots ,y_{i,T})'\) and \(\varvec{\varepsilon }_{i}=(\varepsilon _{i,1},\ldots ,\varepsilon _{i,T})'\), both of which are column vectors of order \(T\), and let \(\varvec{Z}_{i}=(\varvec{z}_{i,1},\ldots ,\varvec{z}_{i,T})'\) be a matrix of dimension \(\left[ T \times (1+P) \right] \), and \(\varvec{X}_{i}=(\varvec{x}_{i,1},\ldots ,\varvec{x}_{i,T})'\) a matrix of dimension \(\left[ T \times Q \right] \). Equation (3.1) can be expressed in vector form as
$$\begin{aligned} \varvec{y}_{i}=\varvec{Z}_{i}\varvec{\phi }_{i} +\varvec{X}_{i}\varvec{\beta }_{i}+\varvec{\varepsilon }_{i}. \end{aligned}$$
(3.2)
Observe that under the null hypothesis of Granger non-causality, the true coefficient vector of \(\varvec{X}_{i}\) equals zero. Thus, assuming homogeneity in \(\varvec{\beta }_{i}\), Eq. (3.2) becomes
$$\begin{aligned} \varvec{y}_{i}=\varvec{Z}_{i}\varvec{\phi }_{i} +\varvec{X}_{i}\varvec{\beta }+\varvec{\varepsilon }_{i}. \end{aligned}$$
(3.3)
In what follows, we shall use the above model specification to estimate the common parameters \(\varvec{\beta }\). In particular, we propose the following least-squares (fixed effects type) estimator of \(\varvec{\beta }\):
$$\begin{aligned} \hat{\varvec{\beta }}=\left( \sum _{i=1}^{N}\varvec{X}_{i}' \varvec{M}_{\varvec{Z}_{i}} \varvec{X}_{i}\right) ^{-1} \left( \sum _{i=1}^{N}\varvec{X}_{i}'\varvec{M}_{\varvec{Z}_{i}} \varvec{y}_{i}\right) , \end{aligned}$$
(3.4)
where \(\varvec{M}_{\varvec{Z}_{i}}\) denotes the \(\left[ T \times T \right] \) matrix that projects onto the orthogonal complement of \(\varvec{Z}_{i}\), i.e. \(\varvec{M}_{\varvec{Z}_{i}} =\varvec{I}_{T}-\varvec{Z}_{i}\left( \varvec{Z}_{i}^{\prime } \varvec{Z}_{i} \right) ^{-1}\varvec{Z}_{i}^{\prime }\). The estimator in Eq. (3.4) generalizes the standard FE estimator, as the latter imposes that all slope coefficients are homogeneous, including the autoregressive parameters (see, e.g. Hahn and Kuersteiner 2002). Note that for this estimator to be well defined, a sufficient number of the \(\varvec{M}_{\varvec{Z}_{i}}\) matrices should be nonzero. As in that paper, we limit our attention to balanced panels, and so the necessary condition is \(T>1+P\), which ensures that the coefficients \(\varvec{\phi }_{i}\) are estimable.
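To fix ideas, the estimator in Eq. (3.4) can be computed in a few lines. The following is a minimal numpy sketch under our own naming conventions (the function name `pooled_beta` and the list-of-arrays data layout are illustrative assumptions, not the authors' code):

```python
import numpy as np

def pooled_beta(Y, Z, X):
    """Pooled least-squares estimator of Eq. (3.4).

    Y: list of (T,) vectors y_i; Z: list of (T, 1+P) matrices holding the
    intercept and lags of y; X: list of (T, Q) matrices holding lags of x.
    Returns the (Q,) estimate (sum_i X_i'M_i X_i)^{-1} sum_i X_i'M_i y_i.
    """
    Q = X[0].shape[1]
    A, b = np.zeros((Q, Q)), np.zeros(Q)
    for y_i, Z_i, X_i in zip(Y, Z, X):
        T = len(y_i)
        # Annihilator M_{Z_i} = I_T - Z_i (Z_i'Z_i)^{-1} Z_i' sweeps out the
        # heterogeneous intercept and autoregressive part for unit i.
        M = np.eye(T) - Z_i @ np.linalg.solve(Z_i.T @ Z_i, Z_i.T)
        A += X_i.T @ M @ X_i
        b += X_i.T @ M @ y_i
    return np.linalg.solve(A, b)
```

Note that each unit gets its own annihilator matrix; this per-unit projection is what distinguishes Eq. (3.4) from the standard FE estimator with fully homogeneous slopes.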
The model in (2.1) belongs to a class of panel data models with non-additive unobserved heterogeneity studied in Fernández-Val and Lee (2013). In particular, under Conditions 1–2 of that paper, which restrict \(\varvec{q}_{i,t}=(y_{i,t},x_{i,t})'\) to be a strong mixing sequence, conditional on all time-invariant effects, with at least \(4+\delta \) moments (for some \(\delta >0\)), the asymptotic distribution of \(\hat{\varvec{\beta }}\) is readily available. Note that the aforementioned restriction rules out non-stationary and local-to-unity dynamics in \(\varvec{y}_{i}\) and \(\varvec{X}_{i}\).
In order to facilitate further discussion, we shall adapt the conclusions of Theorem 1 in Fernández-Val and Lee (2013) to the present setup:
Theorem 3.1
Under Conditions 1–2 (Fernández-Val and Lee 2013) and given \(N/T\rightarrow a^{2} \in [0;\infty )\) as \(N,T\rightarrow \infty \) jointly:
$$\begin{aligned} \sqrt{NT}\left( \hat{\varvec{\beta }}-\varvec{\beta }_{0}\right) \mathop {\rightarrow }\limits ^{d}\varvec{J}^{-1}N\left( a \varvec{b},\varvec{V}\right) . \end{aligned}$$
(3.5)
The Hessian matrix \(\varvec{J}\) in our case is given by:
$$\begin{aligned} \varvec{J}=\mathrm{plim}_{N,T\rightarrow \infty }\frac{1}{NT} \sum _{i=1}^{N}\varvec{X}_{i}'\varvec{M}_{\varvec{Z}_{i}}\varvec{X}_{i}, \end{aligned}$$
(3.6)
while the exact forms of \(\varvec{V}\) and \(\varvec{b}\) depend on the underlying assumptions on \(\varepsilon _{i,t}\). For example, if \(\varepsilon _{i,t}\) are independent and identically distributed (i.i.d.) over \(i\) and \(t\), i.e. \(\varepsilon _{i,t}\sim i.i.d.(0,\sigma ^{2})\), then
$$\begin{aligned} \varvec{V}=\sigma ^{2} \varvec{J}. \end{aligned}$$
(3.7)
The vector \(\varvec{b}\) captures the incidental parameter bias of the common parameter estimator, which is induced by estimation of \(\varvec{\phi }_{1},\ldots ,\varvec{\phi }_{N}\). We will not elaborate on the exact form of this vector, as it is not needed for the purposes of this paper.
Although \(\hat{\varvec{\beta }}\) is consistent, its asymptotic distribution is not centered at zero under sequences where \(N\) and \(T\) grow at a similar rate. The presence of bias invalidates asymptotic inference because the bias is of the same order as the standard error (unless \(a=0\)). In particular, the use of \(\hat{\varvec{\beta }}\) for Granger non-causality testing of \(H_{0}: \varvec{\beta }_{0}=\varvec{0}_{Q}\) will not lead to a test with correct asymptotic size. As a result, the Wald test statistic:
$$\begin{aligned} W=NT \hat{\varvec{\beta }}'\left( \varvec{J}^{-1} \varvec{V}\varvec{J}^{-1}\right) ^{-1}\hat{\varvec{\beta }}, \end{aligned}$$
(3.8)
converges to a noncentral \(\chi ^{2}(Q)\) distribution under the null hypothesis, even if \(\varvec{J}\) and \(\varvec{V}\) are assumed known.
The above discussion implies that \(\hat{\varvec{\beta }}\) should not be used in the construction of the Wald test statistic (3.8). Instead, we suggest the use of the same test statistic, but based on an alternative estimator that is free from the asymptotic bias term \(a \varvec{b}\). Below, we focus on a bias-corrected estimator constructed using the Jackknife principle, namely the Half Panel Jackknife (HPJ) procedure of Dhaene and Jochmans (2015). Given a balanced panel with an even number of time series observations, the HPJ estimator is defined as
$$\begin{aligned} \tilde{\varvec{\beta }}\equiv 2\hat{\varvec{\beta }} -\frac{1}{2}\left( \hat{\varvec{\beta }}_{1/2} +\hat{\varvec{\beta }}_{2/1}\right) , \end{aligned}$$
(3.9)
where \(\hat{\varvec{\beta }}_{1/2}\) and \(\hat{\varvec{\beta }}_{2/1}\) denote the FE estimators of \(\varvec{\beta }\) based on the first \(T_{1}=T/2\) observations and the last \(T_{2}=T-T_{1}\) observations, respectively. The HPJ estimator can be decomposed into a sum of two terms:
$$\begin{aligned} \tilde{\varvec{\beta }}=\hat{\varvec{\beta }}+\left( \hat{\varvec{\beta }} -\frac{1}{2} \left( \hat{\varvec{\beta }}_{1/2}+\hat{\varvec{\beta }}_{2/1}\right) \right) =\hat{\varvec{\beta }}+T^{-1}\hat{\varvec{b}}, \end{aligned}$$
(3.10)
where the second component implicitly estimates the bias term in (3.5). The use of this estimator is justified in our setting because the bias of \(\hat{\varvec{\beta }}\) is of order \(O(T^{-1})\) and thus satisfies the expansion requirement of Dhaene and Jochmans (2015). Although there exist alternative ways of splitting the panel to construct a bias-corrected estimator, as shown in Dhaene and Jochmans (2015), the HPJ estimator minimizes the higher-order bias within the class of Split Panel Jackknife (SPJ) estimators, provided that the data are stationary. For this reason, we limit our attention to Eq. (3.9).
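The half-panel split of Eq. (3.9) can be sketched as follows, under the same hypothetical list-of-arrays data layout as before; `_pooled` is a compact version of the estimator in Eq. (3.4), repeated here so the sketch is self-contained:

```python
import numpy as np

def _pooled(Y, Z, X):
    # Pooled FE-type estimator of Eq. (3.4): (sum X'MX)^{-1} (sum X'My).
    Q = X[0].shape[1]
    A, b = np.zeros((Q, Q)), np.zeros(Q)
    for y_i, Z_i, X_i in zip(Y, Z, X):
        M = np.eye(len(y_i)) - Z_i @ np.linalg.solve(Z_i.T @ Z_i, Z_i.T)
        A += X_i.T @ M @ X_i
        b += X_i.T @ M @ y_i
    return np.linalg.solve(A, b)

def hpj_beta(Y, Z, X):
    """Half Panel Jackknife estimator of Eq. (3.9), assuming T is even:
    beta_tilde = 2*beta_hat - (beta_hat_{1/2} + beta_hat_{2/1}) / 2."""
    h = len(Y[0]) // 2
    full = _pooled(Y, Z, X)
    first = _pooled([y[:h] for y in Y], [z[:h] for z in Z], [x[:h] for x in X])
    second = _pooled([y[h:] for y in Y], [z[h:] for z in Z], [x[h:] for x in X])
    return 2.0 * full - 0.5 * (first + second)
```

Each half-panel estimate carries roughly twice the full-sample bias, so the linear combination above cancels the leading \(O(T^{-1})\) term while leaving the probability limit unchanged.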
Corollary 3.1
Under Conditions 1–2 of Fernández-Val and Lee (2013) and given \(N/T\rightarrow a^{2}\in [0;\infty )\) as \(N,T\rightarrow \infty \) jointly:
$$\begin{aligned} \hat{W}_{HPJ}=NT \tilde{\varvec{\beta }}'\left( \hat{\varvec{J}}^{-1} \hat{\varvec{V}} \hat{\varvec{J}}^{-1}\right) ^{-1} \tilde{\varvec{\beta }}\mathop {\rightarrow }\limits ^{d}\chi ^{2}(Q), \end{aligned}$$
(3.11)
where, assuming \(\varepsilon _{i,t}\sim i.i.d.(0,\sigma ^{2})\),
$$\begin{aligned} \hat{\varvec{J}}&=\frac{1}{NT}\sum _{i=1}^{N}\varvec{X}_{i}' \varvec{M}_{\varvec{Z}_{i}}\varvec{X}_{i},\\ \hat{\varvec{V}}&=\hat{\sigma }^{2}\hat{\varvec{J}},\\ {\hat{\sigma }}^{2}&=\frac{1}{N(T-1-P)-Q}\sum _{i=1}^{N}\left( \varvec{y}_{i} -\varvec{X}_{i}\hat{\varvec{\beta }}\right) ' \varvec{M}_{\varvec{Z}_{i}}\left( \varvec{y}_{i} -\varvec{X}_{i}\hat{\varvec{\beta }}\right) . \end{aligned}$$
The proof of this corollary follows from the corresponding results in Fernández-Val and Lee (2013) and Dhaene and Jochmans (2015). The formula for \(\hat{\varvec{V}}\) can easily be modified to allow for heteroskedasticity in both the cross-sectional and time-series dimensions, based, e.g. on the clustered-covariance matrix estimator of Arellano (1987). For instance, cross-sectional heteroskedasticity can be accommodated by setting
$$\begin{aligned} \hat{\varvec{V}}=\frac{1}{N(T-1-P)-Q}\sum _{i=1}^{N} \varvec{X}_{i}'\varvec{M}_{\varvec{Z}_{i}} \hat{\varvec{\varepsilon }}_{i}\hat{\varvec{\varepsilon }}_{i}' \varvec{M}_{\varvec{Z}_{i}}\varvec{X}_{i}, \end{aligned}$$
(3.12)
where \(\hat{\varvec{\varepsilon }}_{i}=\varvec{y}_{i} -\varvec{X}_{i}\hat{\varvec{\beta }}\). Given the recent results in Chudik et al. (2018), we conjecture that for the HPJ approach to work it is only necessary to assume \(N/T^{3}\rightarrow 0\).
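The resulting test is then carried out by comparing the statistic of Eq. (3.11) with a \(\chi ^{2}(Q)\) critical value. A minimal sketch (the function name and input layout are our own illustration):

```python
import numpy as np

def hpj_wald(beta_tilde, J_hat, V_hat, N, T):
    """Wald statistic of Eq. (3.11): NT * b'(J^{-1} V J^{-1})^{-1} b,
    asymptotically chi^2(Q) under H0.  beta_tilde is the (Q,) HPJ estimate;
    J_hat and V_hat are the (Q, Q) estimates from Corollary 3.1."""
    Jinv = np.linalg.inv(J_hat)
    Omega = Jinv @ V_hat @ Jinv  # sandwich covariance of sqrt(NT) * beta_tilde
    return N * T * float(beta_tilde @ np.linalg.solve(Omega, beta_tilde))
```

For \(Q=1\) at the 5% level, the null is rejected when the statistic exceeds 3.84. Under homoskedasticity (Eq. (3.7)), the sandwich collapses and the statistic reduces to \(NT\,\tilde{\varvec{\beta }}'\hat{\varvec{J}}\tilde{\varvec{\beta }}/\hat{\sigma }^{2}\).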
Remark 3.1
An alternative homogeneous estimator is available by taking into account the fact that, under the null hypothesis, not only \(\varvec{\beta }_{i}=\varvec{\beta }\) for all \(i\) but also \(\beta _{1}=\beta _{2}=\cdots =\beta _{Q}=0\). Therefore, letting \(\varvec{x}_{i,-1}=(x_{i,0},\ldots ,x_{i,T-1})'\), one can also consider the following restricted fixed effects type estimator:
$$\begin{aligned} \hat{\beta }_{1}=\left( \sum _{i=1}^{N}\varvec{x}_{i,-1}' \varvec{M}_{\varvec{Z}_{i}} \varvec{x}_{i,-1}\right) ^{-1} \left( \sum _{i=1}^{N}\varvec{x}_{i,-1}'\varvec{M}_{\varvec{Z}_{i}} \varvec{y}_{i}\right) . \end{aligned}$$
(3.13)
This estimator is attractive because, under the null hypothesis, it does not require specifying a value for \(Q\). However, the resulting Wald test statistic is expected to have lower power compared to that in Eq. (3.11).
Remark 3.2
The Jackknife is by no means the only approach that corrects the incidental parameters bias of the FE estimator. Alternatively, one can consider an analytical bias-correction, as in Hahn and Kuersteiner (2002) and Fernández-Val and Lee (2013). However, the analytical approach has several practical limitations, such as the need to specify a kernel function and the corresponding bandwidth. In this respect, the HPJ approach of Dhaene and Jochmans (2015) has some clear advantages.
4 Monte Carlo simulation
4.1 Design
To illustrate the performance of the new testing procedure, we adapt the Monte Carlo setup of Binder et al. (2005) and Juodis (2018). In particular, we assume that the bivariate vector \(\varvec{y}_{i,t} =(y_{i,t},x_{i,t})'\) follows the VAR(1) process:
$$\begin{aligned} \varvec{y}_{i,t}={\varvec{\varPhi }}_{i}\varvec{y}_{i,t-1} +\varvec{\varepsilon }_{i,t}; \quad \varvec{\varepsilon }_{i,t} \sim N(\varvec{0}_{2},\varvec{\varSigma }), \end{aligned}$$
(4.1)
for all \(i=1,\ldots ,N\) and \(t=1,\ldots ,T\). The vector \(\varvec{y}_{i,t}\) is assumed to be initialized in the distant past; in particular, we set \(\varvec{y}_{i,-50} =\varvec{0}_{2}\) and discard the first 50 observations in estimation.
In order to simplify the parametrization, our baseline setup specifies that some of the design matrices are common to all \(i\). In particular, we adopt Design 2 of Juodis (2018) for the error variance matrix, setting
$$\begin{aligned} \varvec{\varSigma } \equiv \left( \begin{array}{ll} \sigma _{\varepsilon _{y}}^{2} &{}\quad \sigma _{\varepsilon _{y,x}} \\ \sigma _{\varepsilon _{y,x}} &{}\quad \sigma _{\varepsilon _{x}}^{2} \\ \end{array}\right) =\left( \begin{array}{ll} 0.07 &{}\quad 0.05 \\ 0.05 &{}\quad 0.07 \\ \end{array}\right) . \end{aligned}$$
(4.2)
The matrix \({\varvec{\varPhi }}_{i}\) is set equal to
$$\begin{aligned} {\varvec{\varPhi }}_{i}=\left( \begin{array}{ll} \alpha _{i} &{}\quad \beta _{i} \\ 0.5 &{}\quad \rho \\ \end{array}\right) , \end{aligned}$$
(4.3)
where in the homogeneous case we impose \(\alpha _{i}=\alpha =0.4\), while in the heterogeneous case \(\alpha _{i}=\alpha +\xi _{i}^{(y)} =0.4+\xi _{i}^{(y)}\) with \(\xi _{i}^{(y)} \sim i.i.d.U \left[ -0.15,0.15\right] \). The parameter \(\rho \) alternates between \(\rho =\{0.4;0.8\}\) and controls the degree of persistence in \(x_{i,t}\), which can be either moderate (\(\rho =0.4\)) or high (\(\rho =0.8\)).
The main parameter of interest is \(\beta _{i}\). For \(\beta _{i}=0\), the \({\varvec{\varPhi }}_{i}\) matrix is lower triangular, so that \(x_{i,t}\) does not Granger-cause \(y_{i,t}\); in this case, the empirical rejection rate corresponds to the size of the test. On the other hand, for \(\beta _{i}\ne 0\), the empirical rejection rate reflects power. In order to cover a broad range of possible alternative hypotheses, we consider the following schemes:
1. (Homogeneous). \(\beta _{i}=\beta \) for all \(i\), with \(\beta =\{0.00;0.02;0.03;0.05\}\).
2. (Heterogeneous). \(\beta _{i}=\beta +\xi _{i}^{(x)}\), \(\xi _{i}^{(x)} \sim i.i.d.U\left[ -0.1;0.1\right] \), where \(\beta \) is as in the homogeneous case.
The homogeneous design covers the classical pooled setup of Holtz-Eakin et al. (1988). On the other hand, the heterogeneity introduced in the second design is qualitatively closer to Dumitrescu and Hurlin (2012). Note that in the heterogeneous case \(\mathrm{E}[\beta _{i}]=\beta \).
Given that the procedure of Dumitrescu and Hurlin (2012) is primarily used in medium-sized macro-panels, we focus on combinations of (\(N\), \(T\)) that better reflect such applications. In particular, we limit our attention to the following 9 combinations:
$$\begin{aligned} N=\{50;100;200\};\quad T=\{20;50;100\}. \end{aligned}$$
(4.4)
We consider the following test statistics:
– "HPJ"—the proposed pooled Wald test statistic in Eq. (3.11), which is based on the HPJ bias-corrected estimator.
– "DHT"—the standardized average Wald statistic of Dumitrescu and Hurlin (2012).
Inference is conducted at the \(5\%\) level of significance. The total number of Monte Carlo replications is set to 5,000. Size-adjusted power is reported.
In an alternative setup, we also consider heteroskedastic innovations, where the top diagonal entry of the variance–covariance matrix \(\varvec{\varSigma }\) in Eq. (4.2), \(\sigma _{\varepsilon _{y}}^{2}\), is scaled by \(\xi _{i}^{(\varepsilon )} \sim i.i.d.U\left[ 0,2\right] \), such that \(\mathrm{E}\left[ \sigma _{\varepsilon _{y},i}^{2}\right] =\sigma _{\varepsilon _{y}}^{2} \mathrm{E}\left[ \xi _{i}^{(\varepsilon )} \right] =0.07\).
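The baseline data-generating process described above can be sketched as follows. This is a hypothetical reimplementation of Eqs. (4.1)–(4.3) for illustration, not the simulation code used in the paper; heterogeneous \(\beta _{i}\) and the heteroskedastic variant would be added analogously:

```python
import numpy as np

def simulate_panel(N, T, beta=0.0, alpha=0.4, rho=0.4, hetero=False, seed=0):
    """Simulate the bivariate VAR(1) of Eq. (4.1) under the baseline design:
    Sigma as in Eq. (4.2), Phi_i as in Eq. (4.3), with 50 burn-in periods
    discarded.  Returns (y, x), each of shape (N, T)."""
    rng = np.random.default_rng(seed)
    Sigma = np.array([[0.07, 0.05], [0.05, 0.07]])
    L = np.linalg.cholesky(Sigma)  # so that eps = z @ L.T has covariance Sigma
    burn = 50
    # Heterogeneous case: alpha_i = 0.4 + U[-0.15, 0.15] draw per unit.
    alphas = alpha + (rng.uniform(-0.15, 0.15, N) if hetero else np.zeros(N))
    y = np.zeros((N, burn + T + 1))
    x = np.zeros((N, burn + T + 1))
    for t in range(1, burn + T + 1):
        eps = rng.standard_normal((N, 2)) @ L.T
        y[:, t] = alphas * y[:, t - 1] + beta * x[:, t - 1] + eps[:, 0]
        x[:, t] = 0.5 * y[:, t - 1] + rho * x[:, t - 1] + eps[:, 1]
    return y[:, -T:], x[:, -T:]
```

Setting `beta=0.0` makes \({\varvec{\varPhi }}_{i}\) lower triangular, i.e. the null of Granger non-causality holds, so rejection rates computed on such draws estimate size.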
4.2 Results
This section provides a brief summary of the simulation results, which are reported in Tables 3, 4, 5 and 6 in Appendix A. More specifically:
– (Size) When the degree of persistence in \(x_{i,t}\) is moderate (\(\rho =0.4\)), the HPJ and DHT tests perform similarly. In particular, empirical size is fairly close to its nominal value in most circumstances, with some size distortions observed when \(T \ll N\), especially for DHT. On the other hand, for \(\rho =0.8\) the performance of both tests deteriorates. This is particularly so for DHT, for which size exceeds 20% in 8 out of 18 cases; in fact, for \(N=200\) and \(T=20\), size is over 50%. HPJ appears to be more reliable, with size remaining below 15% in all circumstances.
– (Power) For \(\rho =0.4\), HPJ dominates DHT almost uniformly in terms of power. Similar conclusions can be drawn for \(\rho =0.8\). Note that, on average, for any fixed value of \(N\), power increases with \(T\) at a higher rate for HPJ than for DHT, which reflects the \(\sqrt{NT}\) convergence rate of the bias-corrected least-squares estimator employed by the HPJ test.
– (Homogeneous vs heterogeneous models) The performance of the tests in the heterogeneous model is similar to that in the homogeneous one in terms of both size and power.
– (Homoskedasticity vs heteroskedasticity) The results are similar in terms of both size and power under homoskedasticity and heteroskedasticity. This implies that heteroskedasticity does not distort the performance of the tests, once appropriately accounted for.
In summary, the above results suggest that HPJ has good finite sample properties even in panels with a moderate time dimension. In contrast, DHT can suffer from substantial size distortions, especially when \(T \ll N\). Moreover, in terms of power, HPJ dominates DHT, especially so in panels where \(N\) and \(T\) are both large.
5 Illustration: Granger causality evidence on bank profitability and efficiency
We perform Granger non-causality tests in order to examine the sign and the type of temporal relation between banks' profitability and cost efficiency. We employ panel data from a random sample of 350 U.S. banking institutions, each one observed over 56 time periods, namely 2006:Q1–2019:Q4. This data set has also been used by Cui et al. (2020), albeit in a different context related to the estimation of a spatial dynamic panel model with common factors. The data are publicly available and have been downloaded from the Federal Deposit Insurance Corporation (FDIC) website.
5.1 Data and model specification
We consider the following specification:
$$\begin{aligned} y_{i,t}=\phi _{0,i}+\sum _{p=1}^{P}\phi _{p,i}y_{i,t-p} +\sum _{q=1}^{Q}\beta _{q,i} x_{i,t-q}+\varepsilon _{i,t}, \end{aligned}$$
(5.1)
for \(i=1,\ldots ,N\) and \(t=1,\ldots ,T\), where \(y\) denotes profitability, proxied by the return on assets (ROA), defined as annualized net income after taxes expressed as a percentage of average total assets, and \(x\) denotes the time-varying operational cost efficiency of bank \(i\) at period \(t\), to be defined shortly. The parameters of the model above are described in Sect. 2. For the purposes of the present illustration, we focus on the unidirectional link (one-way causation) from cost efficiency to profitability. In addition, we impose \(P=Q\).
A measure of cost efficiency has been constructed based on a cost frontier model using a translog functional form, two outputs and three inputs. In particular, following Altunbas et al. (2007), we specify
$$\begin{aligned} \mathrm{ln} TC_{i,t}&= \sum _{h=1}^{3} \gamma _{h} \mathrm{ln} P_{h,i,t} + \sum _{h=1}^{2} \delta _{h} \mathrm{ln} Y_{h,i,t} + 0.5 \sum _{m=1}^{2}\sum _{n=1}^{2} \mu _{mn} \mathrm{ln} Y_{m,i,t} \mathrm{ln} Y_{n,i,t}\nonumber \\&\quad + \sum _{m=1}^{3}\sum _{n=1}^{3} \pi _{mn} \mathrm{ln} P_{m,i,t} \mathrm{ln} P_{n,i,t} +\sum _{m=1}^{2}\sum _{n=1}^{3} \xi _{mn} \mathrm{ln} Y_{m,i,t} \mathrm{ln} P_{n,i,t} +\eta _{i} + \tau _{t} + \upsilon _{it}, \end{aligned}$$
(5.2)
where \(TC\) represents total cost, while \(Y_{1}\) and \(Y_{2}\) denote two outputs, net loans and securities, respectively; \(Y_{1}\) is defined as gross loans minus reserves for loan loss provision, and \(Y_{2}\) is the sum of securities held to maturity and securities held for sale. \(P_{1}\), \(P_{2}\) and \(P_{3}\) denote three input prices, namely the price of capital, the price of labour and the price of loanable funds. The model above is estimated by two-way fixed effects regression. The bank-specific, time-varying operational inefficiency component is captured by the sum of the two fixed effects, i.e. \(\eta _{i} + \tau _{t}\). Subsequently, cost efficiency \(x_{i,t}\) is computed as follows:
$$\begin{aligned} x_{i,t}= e^{ \mathrm{min}\{\hat{\eta }_{i} + \hat{\tau }_{t}\}_{i,t} -(\hat{\eta }_{i} + \hat{\tau }_{t}) }, \end{aligned}$$
(5.3)
which ensures that larger scores imply higher cost efficiency, such that the most efficient bank scores one.
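Given the estimated fixed effects, the normalization in Eq. (5.3) is a one-liner; a sketch with hypothetical inputs `eta` (estimated bank effects) and `tau` (estimated time effects):

```python
import numpy as np

def efficiency_scores(eta, tau):
    """Cost efficiency of Eq. (5.3): exp(min_{i,t}{eta_i + tau_t} - (eta_i + tau_t)).
    eta: (N,) estimated bank effects; tau: (T,) estimated time effects.
    Returns an (N, T) array of scores in (0, 1], where the most efficient
    bank-quarter scores exactly one."""
    ineff = eta[:, None] + tau[None, :]  # (N, T) inefficiency component
    return np.exp(ineff.min() - ineff)
```

Since the exponent is non-positive by construction, every score lies in \((0,1]\), matching the interpretation in the text.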
We initially test for Granger non-causality using Eq. (5.1) based on the entire sample, i.e. all 350 banks during 2006:Q1–2019:Q4. Subsequently, we split banks into two groups based on their average size, proxied by the natural logarithm of banks' total assets. The grouping of banks is performed using a k-means algorithm, as advocated, e.g. in Lin and Ng (2012) and Sarafidis and Weber (2015). In addition, we distinguish between two subperiods, namely "Basel II" (2006:Q1–2010:Q4) and a period under the Dodd–Frank Act ("DFA", 2011:Q1–2019:Q4). Basel II represents the second of the Basel Accords and constitutes recommendations on banking laws and regulations issued by the Basel Committee on Banking Supervision (BCBS). The DFA is a federal law enacted towards the end of 2010, aiming "to promote the financial stability of the United States by improving accountability and transparency in the financial system, to end 'too big to fail', to protect the American taxpayer by ending bailouts, to protect consumers from abusive financial services practices, and for other purposes". In a nutshell, the DFA has instituted a new failure-resolution regime, which seeks to ensure that losses resulting from bad decisions by managers are absorbed by equity and debt holders, thus potentially reducing moral hazard.
5.2 Results
Table 1 reports summary statistics for the two groups of banks in terms of their size, proxied by the natural logarithm of the average value (over time) of total assets.
Table 1 Summary statistics for bank size

              Mean    SD     Min    Max
Small banks   11.31   .599   9.71   12.29
Large banks   13.28   1.07   12.31  18.89
Table 2 reports results for the Wald test statistic and its p value for the null hypothesis \({H_{0}}: {\beta _{q,i}=0} \quad \text {for all}\,\, {i} \text { and } q\). We also report the estimated number of lags employed, \(\hat{P}\), which is obtained using BIC, as well as estimates (standard errors in parentheses) of the Granger-causation parameters based on the pooled estimator defined in Eq. (3.9), denoted as \(\hat{\beta }\). When \(\hat{P}=1\), \(\hat{\beta }=\hat{\beta }_{1}\) in Eq. (5.1), whereas for \(\hat{P}>1\) we report the sum of the estimates of \(\beta _{q}\), \(q=1,\dots ,\hat{P}\), i.e. \(\hat{\beta }=\sum _{q=1}^{\hat{P}} \hat{\beta }_{q}\). The variance–covariance matrix of the pooled estimator, \(\widehat{\varvec{V}}\), is computed as in Eq. (3.12), i.e. it accommodates cross-sectional heteroskedasticity. For the purposes of comparison, we also report the mean-group estimator of the Granger-causation parameters, \(\hat{\beta }_{MG}\), computed as the sample mean (across \(i\)) of the corresponding individual-specific regression estimates.
Table 2 Results for the HPJ-based Wald test approach

                         Full          Basel II      DFA
All banks
  Wald-stat.             19.67         7.69          10.22
  p value                [.000]        [.006]        [.001]
  \(\widehat{P}\)        1             1             1
  \(\hat{\beta }\)       .266 (.038)   .476 (.047)   .186 (.031)
  \(\hat{\beta }_{MG}\)  .349 (.082)   .284 (.161)   .371 (.097)
  N                      350           350           350
  T                      56            20            36
Small banks
  Wald-stat.             12.2          7.32          10.74
  p value                [.000]        [.007]        [.001]
  \(\widehat{P}\)        1             1             1
  \(\hat{\beta }\)       .244 (.045)   .575 (.059)   .189 (.031)
  \(\hat{\beta }_{MG}\)  .338 (.099)   .525 (.208)   .302 (.116)
  N                      211           211           211
  T                      56            20            36
Large banks
  Wald-stat.             9.13          .436          12.65
  p value                [.003]        [.509]        [.000]
  \(\widehat{P}\)        1             1             2
  \(\hat{\beta }\)       .346 (.025)   .132 (.019)   .423 (.036)
  \(\hat{\beta }_{MG}\)  .366 (.142)   −.082 (.252)  .477 (.168)
  N                      139           139           139
  T                      56            20            36

The top panel corresponds to the entire sample of 350 banks. Column "Full" reports results for the entire period of the sample, i.e. 2006:Q1–2019:Q4. Columns "Basel II" and "DFA" present results for two different subperiods, namely 2006:Q1–2010:Q4 and 2011:Q1–2019:Q4, respectively. The middle panel contains results for "small-sized" banks, followed by "large-sized" banks in the bottom panel.
As we can see, in almost all cases the null hypothesis is rejected at the \(1\%\) level of significance, which implies that cost efficiency Granger-causes profitability, i.e. past values of \(x\) contain information that helps to predict \(y\) over and above the information contained in past values of \(y\). The only exception arises for large banks during Basel II, where the null hypothesis is not rejected, with a p value approximately equal to 0.509. This result is important because it signals potential moral hazard-type behaviour prior to the introduction of the DFA; such an outcome is consistent with findings in the existing literature, such as those of Cui et al. (2020) and Zhu et al. (2020). However, following the introduction of the DFA, the null of Granger non-causality is rejected for large banks as well.
As regards the remaining quantities, in most cases \(\widehat{P}=1\), i.e. the optimal lag order of \(x\) and \(y\) equals one, except for large banks during the DFA period, where \(\widehat{P}=2\). As expected, the Granger-causation parameters are statistically significant at the \(5\%\) level, except for \(\hat{\beta }_{MG}\) in the case where the null hypothesis of Granger non-causality is not rejected.
We have also run Granger non-causality tests based on the method of Dumitrescu and Hurlin (2012) (the “DHT” test statistic), using the Stata algorithm developed by Lopez and Weber (2017).^{11} The results are identical when it comes to lag order selection using BIC. However, as it turns out, this time the null hypothesis of Granger non-causality is rejected in all cases, including for the sample of large banks during the sub-period under Basel II. In particular, in this case the DHT statistic equals 2.58 with a p value of 0.0099. Given that the result is marginal at the \(1\%\) level of significance, and taking into account the potentially substantial size distortions observed in the simulations for the DHT test when \(T=20\), one is inclined to trust the outcome of the HPJ-based Wald test reported in Table 2.
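For reference, the large-T (asymptotic) version of the DHT statistic averages unit-by-unit Wald statistics and standardizes the average, \(Z = \sqrt{N/(2K)}\,(\bar{W}-K)\). The sketch below is a simplified illustration under an assumed toy design; it omits the finite-T moment corrections of Dumitrescu and Hurlin (2012) and is not the Stata implementation of Lopez and Weber (2017) — all function names and parameter values are illustrative.

```python
import numpy as np

def dh_zbar(y, x, K=1):
    """Large-T Dumitrescu-Hurlin statistic: run unit-by-unit OLS of y_t on a
    constant, K own lags and K lags of x, collect the individual Wald
    statistics for H0: all x-lag coefficients equal zero, and standardize:
    Z = sqrt(N / (2K)) * (Wbar - K)."""
    N, T = y.shape
    W = np.empty(N)
    for i in range(N):
        Y = y[i, K:]
        Z = np.column_stack(
            [np.ones(T - K)]
            + [y[i, K - j:T - j] for j in range(1, K + 1)]
            + [x[i, K - j:T - j] for j in range(1, K + 1)]
        )
        coef, *_ = np.linalg.lstsq(Z, Y, rcond=None)
        resid = Y - Z @ coef
        s2 = resid @ resid / (len(Y) - Z.shape[1])       # residual variance
        V = s2 * np.linalg.inv(Z.T @ Z)[1 + K:, 1 + K:]  # var of x-lag coefs
        b = coef[1 + K:]                                 # x-lag coefficients
        W[i] = b @ np.linalg.solve(V, b)                 # individual Wald stat
    return np.sqrt(N / (2.0 * K)) * (W.mean() - K)

# Toy panels: y_it = 0.5*y_{i,t-1} + beta*x_{i,t-1} + e_it
N, T = 200, 50

def simulate(beta, seed):
    rng = np.random.default_rng(seed)
    x = rng.normal(size=(N, T + 1))
    y = np.zeros((N, T + 1))
    for t in range(1, T + 1):
        y[:, t] = 0.5 * y[:, t - 1] + beta * x[:, t - 1] + rng.normal(size=N)
    return y[:, 1:], x[:, 1:]

z_null = dh_zbar(*simulate(beta=0.0, seed=3))  # under Granger non-causality
z_alt = dh_zbar(*simulate(beta=0.3, seed=3))   # under the alternative
print(f"Z under H0: {z_null:.2f};  Z under H1: {z_alt:.2f}")
```

Under the null the statistic is approximately standard normal for large T, while under the alternative it diverges; as noted above, this large-T approximation is exactly where the statistic can suffer size distortions when T is as small as 20.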
6 Conclusions
This paper considers the problem of Granger non-causality testing in panels with large cross-sectional and time series dimensions. First, we put forward a pooled fixed effects type estimator for the Granger-causation parameters, which makes use of the fact that, under the null hypothesis, these parameters are all equal to zero and thus homogeneous. Pooling over cross sections guarantees that the estimator has a \(\sqrt{NT}\) convergence rate. In order to account for the well-known “Nickell bias”, we make use of the Split Panel Jackknife procedure of Dhaene and Jochmans (2015). Subsequently, a Wald test is proposed, based on the bias-corrected fixed effects type estimator. The resulting approach is valid irrespective of whether the alternative hypothesis is homogeneous or heterogeneous, and of whether the autoregressive parameters vary across individuals, so long as T is (at least moderately) large.
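The pooled estimation step and the half-panel jackknife correction described above can be sketched as follows. The simulation design, function names, and the bivariate specification with one lag are illustrative assumptions for exposition, not the paper's exact implementation:

```python
import numpy as np

def fe_est(y, ylag, xlag):
    """Pooled fixed-effects (within) estimates of (rho, beta) in
    y_it = a_i + rho * y_{i,t-1} + beta * x_{i,t-1} + e_it."""
    demean = lambda z: z - z.mean(axis=1, keepdims=True)  # within transform
    Z = np.column_stack([demean(ylag).ravel(), demean(xlag).ravel()])
    coef, *_ = np.linalg.lstsq(Z, demean(y).ravel(), rcond=None)
    return coef  # [rho_hat, beta_hat]

def hpj_est(y, ylag, xlag):
    """Half-panel jackknife (Dhaene and Jochmans 2015): twice the
    full-sample estimate minus the average of the two half-panel estimates."""
    h = y.shape[1] // 2
    full = fe_est(y, ylag, xlag)
    first = fe_est(y[:, :h], ylag[:, :h], xlag[:, :h])
    second = fe_est(y[:, h:], ylag[:, h:], xlag[:, h:])
    return 2.0 * full - 0.5 * (first + second)

# Simulate a panel under the null of Granger non-causality (beta = 0)
rng = np.random.default_rng(42)
N, T, burn, rho = 200, 50, 50, 0.6
alpha = rng.normal(size=N)                    # individual fixed effects
x = rng.normal(size=(N, T + burn + 1))
y = np.zeros((N, T + burn + 1))
for t in range(1, T + burn + 1):
    y[:, t] = alpha + rho * y[:, t - 1] + rng.normal(size=N)
yy, yl, xl = y[:, burn + 1:], y[:, burn:-1], x[:, burn:-1]

r_fe, b_fe = fe_est(yy, yl, xl)
r_hpj, b_hpj = hpj_est(yy, yl, xl)
print(f"FE:  rho = {r_fe:.3f}, beta = {b_fe:.3f}")
print(f"HPJ: rho = {r_hpj:.3f}, beta = {b_hpj:.3f}")
```

With \(\beta =0\) imposed in the simulated data, the plain within estimate of \(\rho\) exhibits the usual downward Nickell bias of order 1/T, which the jackknife step largely removes; a Wald test for Granger non-causality would then be built from the bias-corrected \(\hat{\beta }\) and an estimate of its variance.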
The statistical model considered in this paper rules out any form of cross-sectional dependence in \(\varepsilon _{i,t}\). This restriction can be easily relaxed if one is willing to assume that cross-sectional dependence is strong, generated by an unobserved factor component, \(\varvec{\lambda }_{i}'\varvec{f}_{t}\). In particular, in this case one can use either the Common Correlated Effects (CCE) approach of Pesaran (2006) and Chudik and Pesaran (2015) combined with HPJ, as in Juodis et al. (2020), or the PC estimator of Bai (2009) and Ando and Bai (2017). In these setups, the HPJ-based statistic provides a natural starting point, as the finite-T corrections proposed by Dumitrescu and Hurlin (2012) are not feasible. In panels with homogeneous autoregressive parameters and T fixed, one can employ the GMM framework of Robertson and Sarafidis (2015) and the linear GMM estimator of Juodis and Sarafidis (2020).^{12} We leave these avenues for future research.
Acknowledgements
We are delighted to contribute this paper to a special issue in honour of Professor Badi Baltagi, who has made an enormous contribution to the field of econometrics. We are grateful to the Guest Editors and all referees involved. We would like to thank participants at the IPDC2019 conference (Vilnius) for useful comments and suggestions. Part of this research project was conducted while the first author was visiting the Applied Macroeconomic Research Division (TMTS) at the Bank of Lithuania (BoL). The views expressed in this paper are those of the authors and do not necessarily represent the official views of the Bank of Lithuania or the Eurosystem. Financial support from the Netherlands Organization for Scientific Research (NWO) is gratefully acknowledged by Juodis. Sarafidis gratefully acknowledges financial support from the Australian Research Council, under research Grant No. DP170103135.
Compliance with ethical standards
Conflict of interest
The authors declare that they have no conflicts of interest.
Ethical approval
This article does not contain any studies with human participants or animals performed by any of the authors.
Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
Appendix A: Monte Carlo Results
Table 3  Empirical rejection rates

            ρ = 0.4                                      |  ρ = 0.8
            β=0         β=.02       β=.03       β=.05    |  β=0         β=.02       β=.03       β=.05
N     T     HPJ   DHT   HPJ   DHT   HPJ   DHT   HPJ   DHT|  HPJ   DHT   HPJ   DHT   HPJ   DHT   HPJ   DHT
50    20    9.1   8.8   9.8   8.7   14.0  10.4  30.3  17.4|  14.4  21.5  7.6   9.6   10.7  12.1  22.6  18.0
      50    7.1   6.7   12.9  9.7   26.8  14.7  62.7  30.4|  9.5   10.5  13.5  10.3  25.5  17.9  61.7  37.2
      100   5.7   4.7   27.4  13.5  53.9  20.9  91.1  52.1|  7.9   8.3   29.4  14.6  58.8  25.3  94.3  63.2
100   20    11.3  12.6  13.3  8.1   19.8  9.4   46.3  19.4|  14.1  35.9  11.7  8.2   17.3  11.5  40.6  22.1
      50    7.1   6.6   25.6  13.2  52.2  22.2  91.0  48.4|  9.5   15.3  22.0  13.0  48.3  24.3  91.1  54.9
      100   5.9   4.8   48.3  17.3  81.1  33.0  99.5  75.9|  6.7   7.9   55.6  21.2  88.6  44.1  99.8  89.7
200   20    10.9  15.1  21.5  11.8  40.2  18.2  77.7  37.0|  14.3  55.5  19.2  10.6  35.0  16.4  69.2  35.7
      50    5.8   8.3   46.1  16.9  79.8  31.2  99.8  72.5|  9.6   22.1  38.4  20.6  77.0  38.9  99.8  82.1
      100   5.1   6.3   77.8  22.9  98.0  47.2  100   95.4|  7.0   11.4  84.4  32.9  99.3  63.5  100   99.3

Table 4  Empirical rejection rates

            ρ = 0.4                                            |  ρ = 0.8
            β=0         E[β_i]=.02  E[β_i]=.03  E[β_i]=.05     |  β=0         E[β_i]=.02  E[β_i]=.03  E[β_i]=.05
N     T     HPJ   DHT   HPJ   DHT   HPJ   DHT   HPJ   DHT      |  HPJ   DHT   HPJ   DHT   HPJ   DHT   HPJ   DHT
50    20    8.7   8.9   10.3  8.0   16.6  9.8   31.3  16.7     |  13.7  22.5  8.7   8.7   14.0  10.2  23.9  18.3
      50    6.6   6.4   14.4  10.3  30.0  15.3  63.3  29.0     |  8.5   11.9  13.3  10.0  30.0  17.8  62.5  35.2
      100   5.3   5.7   27.9  11.5  54.9  19.1  93.2  52.3     |  6.9   7.4   29.5  15.8  62.5  28.8  95.8  68.4
100   20    10.6  11.1  11.9  9.5   22.2  13.7  47.0  23.8     |  14.7  36.8  9.2   9.0   18.3  13.8  37.7  22.0
      50    6.8   7.5   25.7  11.3  48.1  18.7  91.2  44.7     |  9.1   15.5  24.1  14.9  48.0  25.3  90.8  58.4
      100   5.5   5.7   49.6  18.6  81.7  30.8  100   72.9     |  6.6   8.3   58.5  26.7  89.1  46.8  100   90.5
200   20    10.3  13.8  24.4  15.1  41.4  21.2  78.9  41.1     |  14.9  56.4  19.5  15.9  32.3  23.4  69.5  44.2
      50    5.8   9.4   49.2  18.1  80.1  30.8  99.6  71.4     |  11.1  22.5  41.6  23.1  75.4  41.3  99.8  84.2
      100   6.5   6.8   75.1  24.2  98.1  48.9  100   94.7     |  8.4   11.0  83.9  35.3  99.5  66.0  100   99.1

Table 5  Empirical rejection rates

            ρ = 0.4                                      |  ρ = 0.8
            β=0         β=.02       β=.03       β=.05    |  β=0         β=.02       β=.03       β=.05
N     T     HPJ   DHT   HPJ   DHT   HPJ   DHT   HPJ   DHT|  HPJ   DHT   HPJ   DHT   HPJ   DHT   HPJ   DHT
50    20    8.7   9.4   9.6   8.4   14.3  13.4  29.2  25.0|  11.2  24.4  8.6   6.8   12.8  10.3  22.8  21.3
      50    8.0   7.5   12.0  12.9  25.6  23.1  54.7  52.2|  9.7   12.3  11.3  11.5  23.1  22.8  55.1  55.6
      100   6.2   5.8   23.7  20.2  45.7  39.2  88.3  78.1|  7.9   9.1   30.2  24.4  57.5  47.1  93.3  88.4
100   20    9.3   10.9  13.6  11.7  21.1  18.8  42.0  41.0|  11.6  36.9  9.7   10.0  17.1  13.5  35.8  32.8
      50    6.2   6.4   23.4  20.3  46.5  40.3  87.8  80.1|  8.8   15.1  21.1  17.9  42.4  36.6  86.4  82.2
      100   5.5   5.6   45.2  28.3  77.0  59.7  99.3  97.0|  5.8   8.2   55.6  40.6  87.0  73.6  99.9  99.6
200   20    9.8   15.2  19.6  15.2  37.5  28.4  71.9  63.9|  11.6  57.6  17.8  12.2  31.4  23.9  65.3  53.9
      50    6.2   9.2   42.1  28.6  73.9  54.3  99.1  95.2|  7.5   24.1  39.5  29.1  73.7  59.2  99.2  96.8
      100   5.2   4.6   73.6  51.4  96.8  86.0  100   100 |  7.4   11.3  80.4  58.1  98.7  92.8  100   100

Table 6  Empirical rejection rates

            ρ = 0.4                                            |  ρ = 0.8
            β=0         E[β_i]=.02  E[β_i]=.03  E[β_i]=.05     |  β=0         E[β_i]=.02  E[β_i]=.03  E[β_i]=.05
N     T     HPJ   DHT   HPJ   DHT   HPJ   DHT   HPJ   DHT      |  HPJ   DHT   HPJ   DHT   HPJ   DHT   HPJ   DHT
50    20    9.6   8.8   6.4   10.4  12.1  14.1  23.8  25.0     |  13.4  23.3  8.7   10.8  12.6  15.2  25.3  23.5
      50    7.7   7.5   13.3  11.8  23.2  21.0  55.7  51.5     |  9.1   10.8  12.9  15.7  24.0  26.7  59.8  61.9
      100   7.4   4.5   21.3  19.1  44.1  39.8  85.2  83.0     |  7.5   7.0   28.2  25.4  56.4  49.3  93.3  91.2
100   20    9.7   11.7  12.2  10.6  21.9  16.8  42.6  35.6     |  11.5  36.4  11.0  8.8   19.9  13.3  39.5  30.0
      50    6.7   7.2   22.7  19.2  46.1  33.6  85.7  77.0     |  8.5   15.4  22.7  20.0  46.7  39.1  86.9  82.8
      100   6.6   6.1   43.6  27.0  75.4  58.1  99.4  97.3     |  7.7   10.0  54.7  37.5  86.0  72.7  100.0 99.5
200   20    8.5   15.7  22.2  16.8  37.4  25.4  75.2  58.4     |  10.8  58.3  18.9  11.9  32.2  19.6  67.5  50.6
      50    6.3   7.2   41.8  30.2  74.9  59.3  99.1  96.7     |  8.4   24.3  39.8  31.4  75.3  60.1  99.0  98.2
      100   6.9   5.8   67.8  45.8  96.1  82.7  100.0 100.0    |  7.7   12.2  79.6  61.3  99.0  94.5  100.0 100.0

Footnotes
1
For panels with a fixed
T dimension, and under normality of the innovations, Dumitrescu and Hurlin (
2012) propose centering their test statistic using moments of an appropriate F distribution rather than
\(\chi ^{2}\). However, the modified statistic is not standard normal for fixed
T (even under normality of the innovations) because the suggested approximation assumes that regressors are strictly exogenous.
2
Since the model above is observed over T time periods, it is implicitly assumed that \(y_{i,-P+1}, y_{i,-P+2}, \dots , y_{i,0}\) are observed, and so are \(x_{i,-Q+1}, x_{i,-Q+2}, \dots , x_{i,0}\).
3
Also, to save space, we do not provide an exposition for how to test bidirectional causality, which can take place in a similar manner by expressing
x as a function of own lags and lagged values of
y.
5
The authors also propose an alternative Wald test statistic that is not centered. However, in the present setup we prefer using DHT because it provides better size control.
6
In further simulations, we have studied cases where both y and x are generated by a VAR(2) process with either homogeneous or heterogeneous coefficients. The results are similar to those already reported here, and so we refrain from discussing them further.
10
To ensure that BIC is consistent under both the null and alternative hypotheses, we estimate P under the alternative, thus allowing for heterogeneity of the Granger-causation parameters.
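The BIC-based selection of the lag order P described in footnote 10 can be sketched as follows. For brevity, this sketch pools all slope coefficients in the auxiliary regressions, whereas the paper's procedure estimates P under the alternative with heterogeneous Granger-causation parameters; all names and the data-generating design are illustrative assumptions:

```python
import numpy as np

def within(z):
    """Demean each cross-section unit over its observed time periods."""
    return z - z.mean(axis=1, keepdims=True)

def panel_bic(y, x, p):
    """BIC of the pooled within regression of y_it on p lags of y and x."""
    Y = within(y[:, p:]).ravel()
    cols = []
    for j in range(1, p + 1):
        cols.append(within(y[:, p - j:-j]).ravel())  # j-th lag of y
        cols.append(within(x[:, p - j:-j]).ravel())  # j-th lag of x
    Z = np.column_stack(cols)
    coef, *_ = np.linalg.lstsq(Z, Y, rcond=None)
    resid = Y - Z @ coef
    n, k = Y.size, Z.shape[1]
    return n * np.log(resid @ resid / n) + k * np.log(n)

def select_p(y, x, max_p=4):
    """Pick the lag order that minimizes BIC over 1..max_p."""
    return min(range(1, max_p + 1), key=lambda p: panel_bic(y, x, p))

# Illustrative panel whose true Granger lag structure has order 2
rng = np.random.default_rng(1)
N, T, burn = 200, 50, 50
x = rng.normal(size=(N, T + burn))
y = np.zeros((N, T + burn))
for t in range(2, T + burn):
    y[:, t] = (0.4 * y[:, t - 1] + 0.3 * x[:, t - 1]
               + 0.4 * x[:, t - 2] + rng.normal(size=N))
y, x = y[:, burn:], x[:, burn:]
print(select_p(y, x))
```

With a clear lag-2 signal and a large pooled sample, the \(k\log n\) penalty is small relative to the fit improvement from the second lag, so BIC recovers the true order.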