
Open Access 24-04-2025

A selective review of panel approaches to construct counterfactuals

Author: Cheng Hsiao

Published in: Empirical Economics


Abstract

The article presents a selective review of panel data approaches to measure treatment effects, focusing on the construction of counterfactuals. It begins by outlining the fundamental issues in measuring treatment effects using panel data, where the observed data for each unit at a given time period is either under treatment or not, but not both. This necessitates the prediction of missing outcomes to estimate treatment effects accurately. The text explores both causal and non-causal approaches, highlighting the advantages and limitations of each method. The causal approach assumes that observed outcomes can be decomposed into observed covariates and the impact of unobserved factors, allowing for the estimation of average treatment effects. However, this approach requires assumptions about the data-generating process, which may not always be valid. The non-causal approach, on the other hand, focuses on generating accurate predictions without assuming a specific data-generating process. It considers data-driven modeling approaches, such as factor modeling and linear projection, to generate counterfactuals. The article also discusses the practical implications of these methods, including the advantages of panel data in controlling for selection bias and identifying heterogeneous treatment effects. It concludes by arguing that the emphasis on measurement of treatment effects is fundamentally a prediction issue, and that data-driven approaches can yield more accurate predictions regardless of the sample configuration. The article provides a comprehensive review of the existing literature and offers insights into the application of these methods in real-world scenarios.
Notes
I would like to thank the three referees for their careful reading of the manuscript and helpful comments.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

1 Introduction

We consider issues of measuring treatment effects using panel data. We assume there are N cross-sectional units, each observed over T time periods. The treatment effect for the ith unit at time t is the difference between the outcome under the treatment, \(y^1_{it}\), and the outcome in the absence of the treatment, \(y^0_{it}\),
$$\begin{aligned} \Delta _{it} =y^1_{it} - y^0_{it}. \end{aligned}$$
(1.1)
However, the observed data for the ith individual at time t take the form
$$\begin{aligned} y_{it} =d_{it} y^1_{it} +(1-d_{it}) y^0_{it}, \ i=1, \ldots , N; t=1, \ldots , T, \end{aligned}$$
(1.2)
where \(d_{it}\) denotes the treatment status dummy, with \(d_{it}=1\) if the ith individual at time t is under the treatment and 0 otherwise. That is, the observed data are either \(y^1_{it} \) or \(y^0_{it} \), never both. To provide an estimate of the treatment effect \(\Delta _{it}\), one needs to substitute the missing \(y^1_{it} \) or \(y^0_{it} \) with its predicted value.
Assuming \(y^1_{it}\) is observed but not \(y^0_{it}\), the estimated treatment effects for the ith unit at time t are
$$\begin{aligned} \hat{\Delta }_{it}= y^1_{it}-\hat{y}^0_{it}, \end{aligned}$$
(1.3)
where \(\hat{y}^0_{it}\) denotes the predicted value (the counterfactual) of \(y^0_{it}\). Conditional on observed \(y^1_{it}\), the bias and the variance of the estimated \(\hat{\Delta }_{it}\) are
$$\begin{aligned} & E( \hat{\Delta }_{it } |y^1_{it})= [E (y^1_{it } - \hat{y}^0_{it} | y^1_{it})] = E [(y^1_{it}-y^0_{it}) + (y^0_{it}-\hat{y}^0_{it}) | y^1_{it}]\nonumber \\ & \quad \quad \quad \quad =E(\Delta _{it}| y^1_{it})+ E(y^0_{it}-\hat{y}^0_{it}), \end{aligned}$$
(1.4)
and
$$\begin{aligned} \text {Var} (\hat{\Delta }_{it}|y^1_{it})= E\left[ (y^0_{it}-\hat{y}^0_{it})^2 \right] . \end{aligned}$$
(1.5)
In other words, the bias and variance of \(\hat{\Delta }_{it}\) conditional on \(y^1_{it}\) depend only on the bias and variance of the prediction error of \(\hat{y}^0_{it}\) (or of \(\hat{y}^1_{it}\) if \(y^0_{it}\) is observed). That is, obtaining an accurate measurement of \(\Delta _{it}\) is fundamentally an issue of obtaining a good prediction of \(y^0_{it}\). However, there is no realized \(y^0_{it}\) against which to evaluate how close \(\hat{y}^0_{it}\) is to \(y^0_{it}\); any \(\hat{y}^0_{it}\) is a counterfactual. There is no way to say that one method of generating \(\hat{y}^0_{it}\) is better than another by comparing the difference between \(y_{it}\) and \(\hat{y}^0_{it}\). Therefore, any preference for a particular method of generating counterfactuals must be based on the compatibility of its underlying assumptions with the data-generating process of \(y^0_{it}\), and the predictive accuracy of a method is conditional on that data-generating process.
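The logic of (1.3)–(1.5) can be illustrated with a small simulation. In the sketch below, all numbers are illustrative assumptions: a synthetic counterfactual predictor is given bias b and noise scale s, and the resulting bias and variance of the estimated treatment effect reduce to the prediction error of the counterfactual, exactly as in (1.4) and (1.5).

```python
import numpy as np

# Simulation sketch of (1.3)-(1.5): synthetic data, with illustrative bias b
# and noise scale s assumed for the counterfactual predictor.
rng = np.random.default_rng(0)
delta_true = 2.0
y0 = rng.normal(size=10_000)                 # counterfactual outcomes (unobserved in practice)
y1 = y0 + delta_true                         # observed outcomes under treatment

b, s = 0.1, 0.5                              # hypothetical bias and noise of the predictor
y0_hat = y0 + b + s * rng.normal(size=y0.size)

delta_hat = y1 - y0_hat                      # estimated treatment effects, (1.3)

# The bias of delta_hat is minus the bias of y0_hat (cf. (1.4)), and its
# variance equals the prediction-error variance of y0_hat (cf. (1.5)).
bias = delta_hat.mean() - delta_true         # close to -b
var = delta_hat.var()                        # close to s**2
```

Because the realized \(y^0_{it}\) is never observed in practice, such a check is only possible in simulation, which is precisely the point made above.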
Panel data sets provide the possibility of simultaneously capturing inter-individual differences and intra-individual dynamics. Compared to cross-sectional \((T=1)\) or time series \((N=1)\) data sets, panel data possess several advantages:
1.
It provides a better possibility of controlling issues that arise when observed data are subject to selection on observables and/or selection on unobservables in the estimation of average treatment effects (ATE), with less restrictive assumptions (e.g., Heckman and Vytlacil 2001, 2007; Hsiao 2022).
 
2.
Information on individuals’ responses to policy changes provides the possibility of identifying whether the differences in individual treatment effects can be considered due to chance events (i.e., homogeneous) or due to some fundamental differences (i.e., heterogeneous), and thus whether it makes sense to consider the estimation of average treatment effects (ATE) (e.g., Hsiao 2022).
 
3.
Information across individuals over time not only provides the possibility of examining whether there are ’treatment effects,’ but also of examining whether treatment effects evolve over time or are stationary around a common mean (e.g., Hsiao et al. 2012; Ke and Hsiao 2022, 2023).
 
4.
It provides the possibility of blending the advantages of the nonparametric approach to estimating treatment effects with those of the parametric approach to identifying the causal factors (e.g., Ke et al. 2017).
 
In this paper, we selectively review panel data approaches to measuring treatment effects in light of these advantages. We assume there are \(N_1\) units receiving the treatment and \((N-N_1)\) units not receiving the treatment. However, for ease of illustrating the fundamental methodology, we consider using panel data to measure the treatment effects on the first unit. In other words, only the first unit is in the treatment group; the rest of the units are in the control group. We assume that up to period T, no cross-sectional unit received the treatment. In periods \(T +1, \ldots , T+m\), the first unit received the treatment, i.e., \(y_{1t}= y^1_{1t}\), while \(y_{it}=y^0_{it}\) for \(i=2,\ldots ,N\), \(t=T +1, \ldots , T +m\). Sections 2 and 3 consider the causal and non-causal approaches to constructing counterfactuals for a single treated unit. In Sect. 4, we consider issues of multiple treated units. Since, as argued in (1.4) and (1.5), conditional on \(y_{1t}= y^1_{1t}\) for \(t=T+1,\ldots , T+m\), the measurement of treatment effects \(\Delta _{it}\) is essentially a prediction issue for \(y^0_{it}\), for notational ease we shall drop the superscript “0” and simply use \(y_{it}\) for \(y^0_{it}\). Concluding remarks are in Sect. 5.

2 Causal approach

The causal approach essentially assumes the observed outcomes can be decomposed as the sum of some observed covariates \(\varvec{x}_{it}\) and the impact of unobserved factors represented by the error terms
$$\begin{aligned} y^1_{it} = g_1(\varvec{x}_{it})+ \varepsilon ^1_{it}, \end{aligned}$$
(2.1)
and
$$\begin{aligned} y^0_{it} = g_0(\varvec{x}_{it})+ \varepsilon ^0_{it}, \end{aligned}$$
(2.2)
where the error terms, \(\varepsilon ^1_{it}\) and \(\varepsilon ^0_{it}\), are typically assumed to be uncorrelated with \(\varvec{x}_{it}\),
$$\begin{aligned} E(\varepsilon ^1_{it}|\varvec{x}_{it})= E(\varepsilon ^1_{it})=0 \end{aligned}$$
(2.3)
and
$$\begin{aligned} E(\varepsilon ^0_{it}|\varvec{x}_{it})= E(\varepsilon ^0_{it})=0. \end{aligned}$$
(2.4)
Then the average treatment effects conditional on \(\varvec{x}\), ATE \((\varvec{x})\), are just
$$\begin{aligned} \text {ATE} (x)= g_1(\varvec{x}) -g_0 (\varvec{x}). \end{aligned}$$
(2.5)
However, since the observed data take the form (1.2), participation in the treatment \((d_{it})\) could be correlated with the outcome (e.g., Heckman and Vytlacil 2001). Suppose the treatment status dummy, or participation decision, for \(d_{it}\) can be postulated by introducing a latent response function,
$$\begin{aligned} d^*_{it}=h(\varvec{z}_{it})+ v_{it}, \ E(v_{it}| \varvec{x}_{it}, \varvec{z}_{it})=0, \end{aligned}$$
(2.6)
where
$$\begin{aligned} d_{it} =\left\{ \begin{array}{ll} 1 \quad \text {if} \quad d^{*}_{it}> 0, \\ 0 \quad \text {if} \quad d^{*}_{it} \le 0. \end{array} \right. \end{aligned}$$
(2.7)
Then, conditional on \(\varvec{x}_{it}\) and \(d_{it}\), the expected value of \(\varepsilon ^j_{it}\) could be either
$$\begin{aligned} E(\varepsilon ^j_{it}|\varvec{x}_{it}, d_{it})=0,\quad j=0,1. \end{aligned}$$
(2.8)
or
$$\begin{aligned} E(\varepsilon ^j_{it}|\varvec{x}_{it}, d_{it}) \ne 0,\quad j=0,1. \end{aligned}$$
(2.9)
When (2.8) holds, i.e., \(f(\varepsilon ^1, \varepsilon ^0, v | \varvec{x} )=f(\varepsilon ^1, \varepsilon ^0 |\varvec{x}) f(v | \varvec{x})\), models (2.1) and (2.2) are typically called the two-part model. If the conditional mean functions, \(g_1 (\varvec{x})\) and \(g_0 (\varvec{x})\), are known, regression methods can be applied to obtain consistent estimators of their parameters.1 When the conditional mean functions are unknown, nonparametric methods can be applied to identify \(g_1 (\varvec{x})\) and \(g_0 (\varvec{x})\) (e.g., Li and Racine 2007).
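Under (2.8), the two-part model can be estimated by running separate regressions on the treated and control subsamples. A minimal simulation sketch, assuming linear conditional means \(g_j(\varvec{x})=\varvec{x}'\varvec{\beta }_j\) and an unconfounded treatment indicator (all numbers are illustrative):

```python
import numpy as np

# Two-part model under (2.8): with treatment independent of the errors,
# fit g1 and g0 by least squares on the treated and control subsamples.
rng = np.random.default_rng(1)
n, k = 5_000, 2
X = rng.normal(size=(n, k))
beta1, beta0 = np.array([1.0, 0.5]), np.array([0.2, -0.3])
d = rng.integers(0, 2, size=n)               # treatment status, independent of the errors
y1 = X @ beta1 + rng.normal(size=n)          # (2.1) with g1(x) = x'beta1
y0 = X @ beta0 + rng.normal(size=n)          # (2.2) with g0(x) = x'beta0
y = np.where(d == 1, y1, y0)                 # observed data, (1.2)

b1 = np.linalg.lstsq(X[d == 1], y[d == 1], rcond=None)[0]
b0 = np.linalg.lstsq(X[d == 0], y[d == 0], rcond=None)[0]

x0 = np.array([1.0, 1.0])
ate_x0 = x0 @ (b1 - b0)                      # ATE(x) = g1(x) - g0(x), cf. (2.5)
```

The estimated \(\text {ATE}(\varvec{x})\) is then the difference of the two fitted conditional means evaluated at the same \(\varvec{x}\).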
If (2.9) holds, then \(f(\varepsilon ^1, \varepsilon ^0 | v )\ne f(\varepsilon ^1, \varepsilon ^0)\). Models (2.1), (2.2), (2.6), and (2.7) together with \(f(\varepsilon ^1, \varepsilon ^0, v )\) are typically referred to as sample selection models (e.g., Heckman 1979) and the observed data are subject to selection on unobservables, (e.g., Heckman and Vytlacil 2001).
If \(g_1(\varvec{x}), g_0(\varvec{x})\) and \(f(\varepsilon ^1, \varepsilon ^0, v)\) are known, then observed \(y^1_{it}\) or \(y^0_{it}\) under the assumption that \(g_1 (\varvec{x})=\varvec{x}' \varvec{\beta }_1\) and \(g_0(\varvec{x})=\varvec{x}' \varvec{\beta }_0\) takes the form
$$\begin{aligned} y^1_{it}&=E(y^1_{it}|\varvec{x}_{it},d_{it}=1)+ \eta ^1_{it} \nonumber \\&=\varvec{x}'_{it} \varvec{\beta }_1+ \gamma ^1 (\varvec{z}_{it})+ \eta ^1_{it}, \end{aligned}$$
(2.10)
and
$$\begin{aligned} y^0_{it}&=E(y^0_{it}|\varvec{x}_{it},d_{it}=0)+ \eta ^0_{it}, \nonumber \\&=\varvec{x}'_{it} \varvec{\beta }_0+ \gamma ^0 (\varvec{z}_{it})+ \eta ^0_{it}, \end{aligned}$$
(2.11)
where \(\gamma ^1(\varvec{z}_{it})= E(\varepsilon ^1_{it}|v_{it} > -h(\varvec{z}_{it}) )\), \(\gamma ^0(\varvec{z}_{it})= E(\varepsilon ^0_{it}|v_{it} < -h(\varvec{z}_{it}))\), and \(\eta ^1_{it}, \eta ^0_{it}\) denote the residuals. If \(f(\varepsilon ^1, \varepsilon ^0, v)\) are known, maximum likelihood method can be implemented to estimate \(\varvec{\beta }_1, \varvec{\beta }_0, \gamma ^1 (\cdot )\) and \( \gamma ^0 (\cdot )\) (e.g., Damrongplasit et al. 2010). If the joint distribution of \(f(\varepsilon ^1, \varepsilon ^0, v)\) is unknown, \( \gamma ^1 (\cdot )\) and \( \gamma ^0 (\cdot )\) are unknown.
When \(T=1\) (i.e., cross-sectional data), we drop the subscript t for ease of notation. Robinson (1988) notes that conditional on \(\varvec{z}_i,\)
$$\begin{aligned} E(y^j_i| \varvec{z}_i)= E(\varvec{x}_i| \varvec{z}_i)' \varvec{\beta }_j + \gamma ^j(\varvec{z}_i), \quad j=0,1. \end{aligned}$$
(2.12)
Subtracting (2.12) from (2.10) (or (2.11)) yields2
$$\begin{aligned} y^j_i-E(y^j_i| \varvec{z}_i)=(\varvec{x}_i -E(\varvec{x}_i|\varvec{z}_i))' \varvec{\beta }_j + \eta ^j_i, \quad j=0,1, \end{aligned}$$
(2.13)
where \(\eta ^j_i\) denotes the residual. A consistent estimator of \(\varvec{\beta }_j\) can then be obtained by least-squares regression of (2.13). Ahn and Powell (1993) suggest pairwise differencing between \(y^j_i\) and \(y^j_l\) conditional on \(h(\varvec{z}_i)= h(\varvec{z}_l)\) to eliminate the sample selection effect,
$$\begin{aligned} y^j_i- y^j_l=(\varvec{x}_i-\varvec{x}_l)' \varvec{\beta }_j+ (\eta ^j_i- \eta ^j_l), \quad j=0,1. \end{aligned}$$
(2.14)
All these methods can be applied straightforwardly to panel data (i.e., \(T\ge 2\)) if \(\varepsilon ^1_{it}\) and \(\varepsilon ^0_{it}\) are independently, identically distributed over i and t. However, the availability of panel data allows one to relax this assumption, and the assumption that \(E(\varepsilon ^j| \varvec{x})=0\) ((2.3) and (2.4)), by allowing \(E(\varepsilon ^j | \varvec{x}) \ne 0\) through decomposing \(\varepsilon ^j_{it}\) into the sum of two parts,
$$\begin{aligned} \varepsilon ^j_{it} = \delta ^j_{it} + u^j_{it}, \end{aligned}$$
(2.15)
such that \(E(\varvec{x}_{it}\delta ^j_{it})\ne 0\) and \(E(\varvec{x}_{it}u^j_{it})=0\) (e.g., Sickles 2005).
When
$$\begin{aligned} \delta ^j_{it}= \alpha ^j_i \quad \text {for} \quad t=1,\ldots , T, \end{aligned}$$
(2.16)
and \(u^j_{it}\) is i.i.d. over i and t, Honore (1992) and Honore and Kyriazidou (2000a) suggest first taking differences over time to eliminate the individual-specific effects \((\alpha ^j_i, \ j=0,1)\), then applying the Ahn and Powell (1993) pairwise difference method to remove the sample selection effects.
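The first step of this strategy, differencing out \(\alpha ^j_i\), can be sketched in a simulation. This illustrates only the differencing step under (2.16), not the full Honore–Kyriazidou estimator (no sample selection is simulated, and all numbers are illustrative):

```python
import numpy as np

# First-differencing over t removes an individual effect alpha_i that is
# correlated with x_it, as in (2.15)-(2.16).
rng = np.random.default_rng(2)
N, T, beta = 200, 6, 1.5
alpha = rng.normal(size=(N, 1))              # individual effects
x = alpha + rng.normal(size=(N, T))          # regressor correlated with alpha
y = beta * x + alpha + 0.1 * rng.normal(size=(N, T))

# Pooled OLS (no intercept) is biased because E(x_it alpha_i) != 0.
b_pooled = (x.ravel() @ y.ravel()) / (x.ravel() @ x.ravel())

# First differences: alpha_i drops out, so OLS on differences is consistent.
dx, dy = np.diff(x, axis=1), np.diff(y, axis=1)
b_fd = (dx.ravel() @ dy.ravel()) / (dx.ravel() @ dx.ravel())
```

In this design the pooled estimator is biased upward, while the first-difference estimator recovers \(\beta \).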
When the part of the error term correlated with \(\varvec{x} _{it}\), \(\delta ^j_{it}\), takes the interactive form,
$$\begin{aligned} \delta ^j_{it} = \varvec{\lambda }'_{i} \varvec{f}_t,\end{aligned}$$
(2.17)
where \(\varvec{f}_t\) is an r-dimensional vector of common factors that stay the same across individuals but vary over time, and \(\varvec{\lambda }_i\) is an r-dimensional factor loading vector that stays constant over time but varies across i, Kong and Hsiao (2025) suggest first following Robinson's procedure to remove the sample selection effects, then applying the Pesaran (2006) common correlated effects approach or the Hsiao et al. (2022a) transformation approach to estimate \(\varvec{\beta }\). When neither the conditional mean functions \(g_1 (\varvec{x})\) and \(g_0 (\varvec{x})\) nor the joint distribution function \(f(\varepsilon ^1, \varepsilon ^0, v)\) is known, nonparametric methods can be used to identify \(E(y|\varvec{x})\) under the unconfoundedness assumption (i.e., no sample selection effects) (e.g., Li and Racine 2007). However, nonparametric methods suffer from the curse of dimensionality (Stone 1980); various methods have been suggested to get around this issue (e.g., Imbens and Angrist 1994; Imbens and Lemieux 2008; Rosenbaum and Rubin 1983, 1985).
Table 1
Advantages and disadvantages—parametric, semiparametric, and nonparametric approaches

Parametric approach
Advantages: Simultaneously controls the selection on observables and selection on unobservables issues. Estimates the average treatment effects (ATE) and the impact of each factor. Can obtain efficient estimates of the parameters of the conditional mean function.
Disadvantages: Requires specification of the conditional mean function and the probability distribution of the impact of omitted factors.

Semiparametric approach
Advantages: Simultaneously controls the selection on observables and selection on unobservables issues. Estimates the impact of most (or some) factors on the outcomes and ATE (in some cases). No need to specify the probability distribution of the impact of omitted factors.
Disadvantages: Requires specification of the conditional mean function. Estimates, although they may achieve the same speed of convergence as in the parametric approach, are less efficient.

Nonparametric approach
Advantages: No need to specify the conditional mean function or the probability distribution of the impact of omitted factors.
Disadvantages: Unconfoundedness is the maintained hypothesis. Curse of dimensionality.
Table 1 summarizes the advantages and disadvantages of the parametric, semiparametric, and nonparametric methods of constructing counterfactuals. Essentially, the advantage of the parametric and semiparametric approaches to estimating the treatment effect is that they can simultaneously take account of selection on observables and selection on unobservables. The disadvantage is that the conditional mean functions \( E(y^1 | \varvec{x})\) and \( E(y^0|\varvec{x})\) are assumed known. The advantage of the nonparametric approach is that no assumption needs to be made about the conditional mean functions or the joint distribution of the random error terms. The disadvantage is that some form of unconfoundedness assumption has to be made, which is a maintained hypothesis, not a testable one. In other words, the advantages of the parametric or semiparametric approach are the disadvantages of the nonparametric approach, and vice versa. Unfortunately, without precise knowledge of how observables interact with unobservables, it is hard to choose between the two.

3 Non-causal approach

As discussed in the introduction, measuring treatment effects is essentially a prediction issue. The focus of the causal approach is to identify the parameters of the data-generating process of \(y_{it}\). Knowing the data-generating process of \(y^1_{it}\) and \(y^0_{it}\) provides useful information for generating a good prediction; however, it is not essential. Anything that is correlated with \(y^1_{it}\) or \(y^0_{it}\) could help prediction, even if it is not causal. In that sense, the non-causal approach is less restrictive: no assumptions about the data-generating process of \(y_{it}\) need to be made. In this section, we consider non-causal approaches to generating predictions, very much in the spirit of modeling time series data based on autocorrelations and partial autocorrelations (e.g., Box and Jenkins 1970). However, no lagged variables of the treated unit will be taken into consideration because they could be subject to the impact of the treatment.
The non-causal approach to generating predictions cares only about how to generate good predictions. It is not concerned with identifying the parameters of the true data-generating process. We shall therefore assume that the nontreated outcomes are unconfounded, i.e.,
$$\begin{aligned} f(y_{it}| d_{1t}) =f(y_{it}), \ \text {for} \ i=1,\ldots , N; \ t=1,\ldots , T. \end{aligned}$$
(3.1)
We consider two data-driven modeling approaches to generating predictions, the factor approach and the linear projection approach, under the assumption that the model parameters stay constant over time.3 Namely, our focus is only on measurement.4
(a) Factor Modeling
We assume that the strong cross-sectional and time dependence of \(y_{it}\) across N cross-sectional units over T time periods can be captured by a factor model of the form
$$\begin{aligned} y_{it}=\varvec{\lambda }_{i}^{\prime }\varvec{f}_{t}+u_{it},\quad i=1,\ldots ,N;t=1,\ldots ,T, \end{aligned}$$
(3.2)
where \(\varvec{f}_t\) is an r-dimensional vector of common factors that are the same across i but vary over t, \(\varvec{\lambda }_i\) is an r-dimensional vector of factor loadings that stays constant over t but varies across i to represent the innate differences or endowments between individuals, and \(u_{it}\) is a random error term that, conditional on \(\varvec{\lambda }_i\) and \(\varvec{f}_t\), has mean zero, \(E(u_{it}|\varvec{\lambda }_i, \varvec{f}_t)=0\), but could be weakly cross-correlated.5
Stacking all N cross-sectional units one after another at time t, \(\varvec{y}_{t}=(y_{1t},y_{2t},\mathbf {\ldots },y_{Nt})^{\prime }=(y_{1t},\varvec{\tilde{y}}_{t}^{\prime })^{\prime }\), we have
$$\begin{aligned} \textbf{y}_{t}=\Lambda \varvec{f}_{t}+\textbf{u}_{t},\quad \ \ t=1,\ldots ,T, \end{aligned}$$
(3.3)
where \(\Lambda =(\varvec{\lambda }_1, \varvec{\lambda }_2, \ldots , \varvec{\lambda }_{N})'=(\varvec{\lambda }_1, \tilde{\Lambda })\) and \(\varvec{u}_t = ( u_{1t}, \ldots , u_{Nt})'= (u_{1t},\tilde{\varvec{u}}'_t )'\). Alternatively, we can stack the ith individual’s T time series observations as \(\textbf{y}_{i}=(y_{i1},\mathbf {\ldots },y_{iT})^{\prime }\),
$$\begin{aligned} \textbf{y}_{i}=\mathbf {F \varvec{\lambda }}_{i}+\textbf{u}_{i},\quad \ \ i=1,\ldots ,N, \end{aligned}$$
(3.4)
where \(F=(\varvec{f}_1, \ldots , \varvec{f}_T)'\) and \(\varvec{u}_i=(u_{i1}, \ldots , u_{iT})'\).
The common assumptions for \(\varvec{f}_t, \varvec{\lambda }_i\) and \(u_{it}\) are:
Assumption 1
The factor process satisfies \(E\left\| \varvec{f} _{t}\right\| ^{4}\le M<\infty \) and \(\frac{1}{T}\sum _{t=1}^{T}\varvec{f} _{t}\varvec{f}_{t}^{\prime }\rightarrow _{p}\Sigma _{f},\) where \(\Sigma _{f}\) is an \(r\times r\) non-singular constant matrix.
Assumption 2
The loading \(\varvec{\lambda }_{i}\) is either fixed constant or is stochastic with \(E\left\| \varvec{\lambda }_{i}\right\| ^{4}\le M<\infty .\) In either case \(\frac{1}{N} \sum _{i=1}^{N}\varvec{\lambda }_{i}\varvec{\lambda }_{i}^{\prime }\rightarrow _{p}\Sigma _{\lambda },\) where \( \Sigma _{\lambda }\) is an \(r\times r\) non-singular constant matrix.
We merge the impact of those common components that exert influence over only a finite number of cross-sectional units into \(u_{it}\) by allowing the \(N \times 1\) vector \(\varvec{u}_{t}\) to be weakly cross-dependent.
Assumption 3
The random error terms \(\textbf{u}_{t}=(u_{1t}, \ldots , u_{Nt})'\) are independently, identically distributed over t with non-singular covariance matrix
$$\begin{aligned} E(\varvec{u}_{t}\varvec{u}_{t}^{\prime })=\tilde{\Omega }=\left( \begin{array}{cc} \sigma _{1}^{2} & \varvec{c}^{\prime } \\ \varvec{c} & \Omega \end{array} \right) , \end{aligned}$$
where \(\sigma _{1}^{2}=E\left( u_{1t}^{2}\right) \), \(\Omega =E(\tilde{\textbf{u}}_{t}\tilde{\textbf{u}}_{t}^{\prime })\), and \(\textbf{c}=E(\tilde{\textbf{u}}_{t}u_{1t})\). Moreover, all N nonzero eigenvalues of \(\tilde{\Omega } \) are O(1).
Modeling panel data by a factor model is a very useful dimension-reduction approach to summarizing the variation across individuals (i) over time (t) (e.g., Anderson and Rubin 1956; Lawley and Maxwell 1971). Factor models are widely applied in macroeconomics and financial economics (e.g., Chamberlain and Rothschild 1983; Connor and Korajczyk 1986; Forni et al. 1998; Ross 1976; Sargent and Sims 1977) and are also used to generate parsimonious predictive models for high-dimensional time series data (e.g., Stock and Watson 1989, 2002). They are also used as the basis for panel approaches to constructing counterfactuals to measure the treatment effects of a social program (e.g., Hsiao et al. 2012).
Under Assumptions 1 and 2, \(\varvec{\lambda }_1\) and \(\varvec{f}_{t}\) are not identified. Since our focus is on prediction, not identifying the parameters of the data-generating process of (3.2), there is no loss of generality to follow Anderson and Rubin (1956), Bai (2003, 2009), etc. to use the normalization conditions \( \Sigma _\lambda =I_r\) and \(\Sigma _f\) diagonal. Then, conditional on \(Y^T =(\varvec{y}_1, \ldots , \varvec{y}_T)\), \(\Lambda \) can be estimated as \(\sqrt{N} \) times the eigenvectors corresponding to the r largest eigenvalues of the determinant equation
$$\begin{aligned} \left| \frac{1}{T}\sum \limits _{t=1}^{T}\varvec{y}_{t}\varvec{y} _{t}^{\prime }-\delta \ \varvec{I}_{N}\right| =0. \end{aligned}$$
(3.5)
Conditional on \(\hat{\Lambda } = (\varvec{\hat{\lambda }}_1,\ldots ,\varvec{\hat{\lambda }}_N) = (\varvec{\hat{\lambda }}_1,\hat{\tilde{\Lambda }})\), \(\varvec{f}_t\) can be estimated by
$$\begin{aligned} \varvec{\hat{f}}_t=(\hat{\tilde{\Lambda }}' \hat{\tilde{\Lambda }})^{-1} \hat{\tilde{\Lambda }}' \varvec{\tilde{y}}_t. \end{aligned}$$
(3.6)
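The principal-component estimates (3.5) and (3.6) can be sketched on simulated data. In the illustration below (all dimensions and noise levels are illustrative assumptions), \(\hat{\Lambda }\) is \(\sqrt{N}\) times the eigenvectors of \(\frac{1}{T}\sum _t \varvec{y}_t \varvec{y}'_t\) associated with the r largest eigenvalues, and \(\hat{\varvec{f}}_t\) is recovered from the control units:

```python
import numpy as np

# Sketch of (3.5)-(3.6): loadings from the top-r eigenvectors of the
# N x N second-moment matrix, factors from the control units' outcomes.
rng = np.random.default_rng(3)
N, T, r = 30, 200, 2
F = rng.normal(size=(T, r))                     # common factors f_t
Lam = rng.normal(size=(N, r))                   # loadings lambda_i
Y = Lam @ F.T + 0.1 * rng.normal(size=(N, T))   # (3.3) with weak idiosyncratic noise

S = (Y @ Y.T) / T                               # matrix appearing in (3.5)
eigval, eigvec = np.linalg.eigh(S)              # eigenvalues in ascending order
Lam_hat = np.sqrt(N) * eigvec[:, -r:]           # loadings, identified up to rotation

Lam_tilde = Lam_hat[1:, :]                      # control-unit loadings
F_hat = np.linalg.solve(Lam_tilde.T @ Lam_tilde, Lam_tilde.T @ Y[1:, :])  # (3.6)

# The fitted common component of the treated unit is rotation-invariant.
fit_mse = np.mean((Y[0, :] - Lam_hat[0, :] @ F_hat) ** 2)
```

Although \(\hat{\Lambda }\) and \(\hat{\varvec{f}}_t\) are identified only up to rotation, the product \(\varvec{\hat{\lambda }}'_1 \varvec{\hat{f}}_t\), which is what prediction requires, is not affected by the rotation.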
(b) Linear Projection Modeling6
Alternatively, we can express \(y_{1t}\) as a function of \(\varvec{\tilde{y}}_t=(y_{2t},\ldots ,y_{Nt}),\)
$$\begin{aligned} y_{1t}=E^*(y_{1t}|\tilde{\varvec{y}} _t)+ \eta _{t} = \varvec{w}' \tilde{\varvec{y}} _t + \eta _t, \end{aligned}$$
(3.7)
where \(E^*(y_{1t}|\tilde{\varvec{y}} _t)\) denotes the linear projection of \(y_{1t}\) on \(\tilde{\varvec{y}}_t\), or the conditional mean of \(y_{1t}\) given \(\tilde{\varvec{y}}_t\) if the conditional mean is linear in \(\tilde{\varvec{y}} _t\) (e.g., if \((y_{1t}, \tilde{\varvec{y}} _t)\) are Gaussian). Under Assumptions 1–3, the coefficients \(\varvec{w}\) are related to the underlying factor model as
$$\begin{aligned} \varvec{w} = [ E (\tilde{\varvec{y}} _t \tilde{\varvec{y}}'_t)]^{-1} E(\tilde{\varvec{y}}_t y_{1t})=( \tilde{\Lambda }\Sigma _f \tilde{\Lambda }'+ \Omega )^{-1} (\tilde{\Lambda } \Sigma _f \varvec{\lambda }_1+ \varvec{c}) \end{aligned}$$
(3.8)
with \(\tilde{\Lambda }\) denoting the factor loading matrix for control units \(\tilde{\varvec{y}}_t\). The coefficients \(\varvec{w}\) based on \(Y^T\) can be estimated by
$$\begin{aligned} \varvec{\hat{w}}=\left( \sum ^T_{t=1}\varvec{\tilde{y}}_t \varvec{\tilde{y}}'_t \right) ^{-1}\left( \sum ^T_{t=1}\varvec{\tilde{y}}_t y_{1t}\right) . \end{aligned}$$
(3.9)
The error term \(\eta _t\), by construction, is orthogonal to \(\varvec{\tilde{y}}_t\) with mean square error \(\sigma ^2_\eta =E(\eta ^2_t)\), where
$$\begin{aligned} \sigma _{\eta }^{2} =\sigma _{1}^{2} +\varvec{\lambda }_{1}^{\prime }\Sigma _{f}\varvec{\lambda } _{1} -( \varvec{\lambda }_{1}'\Sigma _{f}\tilde{\Lambda }'+ \varvec{c}') (\tilde{\Lambda } \Sigma _{f}\tilde{\Lambda }^{\prime }+\Omega )^{-1} (\tilde{\Lambda }\Sigma _{f}\varvec{\lambda }_1+\varvec{c}). \end{aligned}$$
(3.10)
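The LP estimator (3.9) and the implied counterfactual predictor can be sketched on simulated, never-treated data, so that the prediction can actually be checked against the realized outcome (all dimensions and noise levels are illustrative assumptions):

```python
import numpy as np

# Sketch of the linear projection estimator (3.9) and the counterfactual
# predictor w-hat' y-tilde_{T+h} on simulated, never-treated data.
rng = np.random.default_rng(4)
N, T, m, r = 20, 150, 10, 2
F = rng.normal(size=(T + m, r))
Lam = rng.normal(size=(N, r))
Y = Lam @ F.T + 0.2 * rng.normal(size=(N, T + m))

y1_pre, Yc_pre = Y[0, :T], Y[1:, :T]         # "pre-treatment" sample
# (3.9): regress y_1t on the control units' outcomes.
w_hat = np.linalg.solve(Yc_pre @ Yc_pre.T, Yc_pre @ y1_pre)

# Predict unit 1 in the "post-treatment" periods from the controls.
y1_hat_post = w_hat @ Y[1:, T:]
mspe = np.mean((Y[0, T:] - y1_hat_post) ** 2)
```

With no treatment anywhere, the out-of-sample mean square prediction error should be close to \(\sigma ^2_\eta \) in (3.10).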
The LP model is closely related to the panel data approach (PDA) and the synthetic control method (SCM).
(i) The PDA Approach
Let \( \varvec{z}_t\) denote all observed variables that are independent of \(d_{1t}\) at time t. For simplicity, we let \( \varvec{z}_t= (y_{2t},\ldots , y_{Nt},\varvec{x}'_{1t}, \ldots , \varvec{x}'_{Nt})\). Under the assumption that
$$\begin{aligned} \varvec{z}_{t} \perp d_{1t} \end{aligned}$$
(3.11)
Hsiao et al. (2012) (HCW) suggest approximating \(y_{1t}\) through
$$\begin{aligned} y_{1t}=E^*(y_{1t}|\varvec{z}_t)+ \eta _{1t}, \ t=1,\ldots , T, \end{aligned}$$
(3.12)
where \(E(\eta _{1t}| \varvec{z}_t)=0\). Then \(E^*(y_{1t}|\varvec{z}_t)\) is an unbiased predictor of \(y_{1t}\) conditional on \(\varvec{z}_t\), as in (3.7). HCW suggest approximating (3.12) by a subset of \(\varvec{z}_t\),
$$\begin{aligned} y_{1t}= \ a+ \varvec{c}' \varvec{z}^*_t + \eta _{1t}, \end{aligned}$$
(3.13)
where \(\varvec{z}^*_{t}\) is selected through a model selection procedure, while Li and Bell (2017) propose using LASSO (Tibshirani 1996).
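The selection of \(\varvec{z}^*_t\) can be sketched with a simple greedy search. The function below is an illustrative stand-in for the model-selection step in (3.13): a forward-stepwise search minimizing in-sample AIC, not the exact HCW procedure (and Li and Bell use LASSO instead); the data are hypothetical.

```python
import numpy as np

def forward_select(y, Z, max_k):
    """Greedy forward selection of predictors by in-sample AIC.

    Illustrative stand-in for the model-selection step that picks z*_t
    in (3.13); not the exact HCW procedure."""
    T, n = Z.shape
    chosen, best_aic = [], np.inf
    while len(chosen) < max_k:
        cand_aic, cand_j = np.inf, None
        for j in range(n):
            if j in chosen:
                continue
            X = np.column_stack([np.ones(T), Z[:, chosen + [j]]])  # intercept a in (3.13)
            resid = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
            aic = T * np.log(resid @ resid / T) + 2 * X.shape[1]
            if aic < cand_aic:
                cand_aic, cand_j = aic, j
        if cand_aic >= best_aic:
            break                            # no AIC improvement: stop
        best_aic, chosen = cand_aic, chosen + [cand_j]
    return chosen

# Hypothetical example: y_1t depends only on control units 0 and 1.
rng = np.random.default_rng(5)
Z = rng.normal(size=(100, 8))
y = 2 * Z[:, 0] - Z[:, 1] + 0.1 * rng.normal(size=100)
selected = forward_select(y, Z, max_k=4)
```

In this example the relevant control units are picked up in the first steps of the search.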
When \(\varvec{z}_t = \tilde{\varvec{y}}_t\), the PDA approach is identical to the LP approach. The process of selecting a subset \(\varvec{z}^*_t\) of \(\varvec{z}_t\) in (3.12) is equivalent to considering a subset of \(\tilde{\varvec{y}}_t\), say \(\tilde{\varvec{y}}^*_t\). As long as the dimension of \(\tilde{\varvec{y}}^*_t\) is greater than r,
$$\begin{aligned} \left( \begin{array}{cc} y_{1t} \\ \varvec{\tilde{y}}^*_t \end{array} \right) = \left( \begin{array}{cc} \lambda '_1 \\ \tilde{\Lambda }^* \end{array} \right) \varvec{f}_t + \left( \begin{array}{cc} u_{1t} \\ \varvec{\tilde{u}}^*_t \end{array} \right) . \end{aligned}$$
(3.14)
the LP of \(y_{1t}\) on \(\tilde{\varvec{y}}^*_t\) is equal to
$$\begin{aligned} y_{1t}= \varvec{w}^{*'} \varvec{\tilde{y}^*_t} + \eta ^*_t, \end{aligned}$$
(3.15)
with
$$\begin{aligned} & \varvec{w}^* = (\tilde{\Lambda }^* \Sigma _f \tilde{\Lambda }^{*'}+ \Omega ^*)^{-1}(\tilde{\Lambda }^{*} \Sigma _f \varvec{\lambda }_1 + \varvec{c}^*), \end{aligned}$$
(3.16)
$$\begin{aligned} & \sigma _{\eta ^*}^{2} =\sigma _{1}^{2} +\varvec{\lambda }_{1}^{\prime }\Sigma _{f}\varvec{\lambda } _{1} - (\varvec{\lambda }_{1}'\Sigma _{f} \tilde{\Lambda }^{*'}+ \varvec{c}^{*'}) (\tilde{\Lambda }^* \Sigma _{f}\tilde{\Lambda }^{*\prime }+\Omega ^*)^{-1} (\tilde{\Lambda }^*\Sigma _{f}\varvec{\lambda }_1+\varvec{c}^*), \nonumber \\ \end{aligned}$$
(3.17)
where \( \Omega ^* = E(\varvec{\tilde{u}}^*_t \varvec{\tilde{u}}^{*'}_t)\) and \(\varvec{c}^*=E( \varvec{\tilde{u}}^*_t u_{1t})\).
(ii) Synthetic Control Method
Abadie et al. (2010, 2015) proposed a synthetic control method (SCM) that predicts \(y_{1t}\) by
$$\begin{aligned} y_{1t}= \sum _{i=2}^{N} b_{i}y_{it}, \ \ t=T +1, \ldots , T+m, \end{aligned}$$
(3.18)
where the \(y_{it}\) are selected to be those control units whose data-generating processes are similar to that of \(y_{1t}\). The \(b_i\) are obtained by minimizing
$$\begin{aligned} \left[ \left( \begin{array}{cc} \varvec{y}_{1} \\ \varvec{\tilde{x}}_1 \end{array} \right) - \left( \begin{array}{cc} Y \\ \tilde{X} \end{array} \right) \varvec{b} \right] ' V \left[ \left( \begin{array}{cc} \varvec{y}_{1} \\ \varvec{\tilde{x}}_1 \end{array} \right) - \left( \begin{array}{cc} Y \\ \tilde{X} \end{array} \right) \varvec{b} \right] , \end{aligned}$$
(3.19)
subject to the constraint that
$$\begin{aligned} b_i \ge 0, \ \text {and} \ \sum ^N_{i=2} b_i=1, \end{aligned}$$
(3.20)
where \(\varvec{y}_1\) and Y denote the \(T \times 1\) vector and the \(T \times (N-1)\) matrix of pre-treatment \(y_{1t}\) and \(y_{jt}, j=2, \ldots , N\), respectively; \(\tilde{\varvec{x}}_1\) and \(\tilde{X}\) denote the pre-treatment time series averages of \(\varvec{x}_{1t}\) and \(\varvec{x}_{jt}, j=2, \ldots , N\), respectively; and V is a positive definite matrix.
Conditional on (\(\varvec{y}_1,\tilde{X}\)) being independent of \(d_{1t}\), the difference between the PDA and the SCM is that the former is an unconstrained regression, while the latter restricts the regression model (3.18) to have intercept \(a=0\) and coefficients satisfying (3.20). If the restrictions are correct, the SCM is more efficient. If the restrictions are not correct, the SCM could lead to biased predictions of the counterfactuals, while the PDA remains unbiased. For a general discussion of the LP, PDA, and SCM approaches, see Gardeazabal and Vega-Bayo (2016) and Wan et al. (2018).
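The constrained minimization in (3.19)–(3.20) can be sketched for the outcome-only special case with \(V=I\). The solver below is illustrative: it uses the Frank–Wolfe algorithm on the simplex rather than the solver of Abadie et al., and it ignores the covariate block \(\tilde{\varvec{x}}_1, \tilde{X}\); the example data are hypothetical.

```python
import numpy as np

def scm_weights(y1, Yc, iters=5_000):
    """Minimize ||y1 - Yc b||^2 subject to b >= 0 and sum(b) = 1, via the
    Frank-Wolfe algorithm on the simplex.

    Illustrative solver for the outcome-only special case of (3.19)-(3.20)
    with V = I; the covariate-matching block is omitted."""
    T, n = Yc.shape
    b = np.full(n, 1.0 / n)                  # start at the simplex center
    for k in range(iters):
        grad = 2.0 * Yc.T @ (Yc @ b - y1)
        j = int(np.argmin(grad))             # best vertex of the simplex
        gamma = 2.0 / (k + 2.0)              # standard Frank-Wolfe step size
        b *= (1.0 - gamma)
        b[j] += gamma                        # preserves b >= 0 and sum(b) = 1
    return b

# Hypothetical example: unit 1 is an exact convex combination of controls.
rng = np.random.default_rng(6)
Yc = rng.normal(size=(50, 5))
b_true = np.array([0.5, 0.3, 0.2, 0.0, 0.0])
y1 = Yc @ b_true
b_hat = scm_weights(y1, Yc)
```

Note that every iterate remains a convex combination of simplex vertices, so the constraints (3.20) hold exactly throughout.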
(c) Prediction Error Comparison
Under model (3.2) or (3.7), the best predictor for \(y_{1,T+h}\) is
$$\begin{aligned} \tilde{y}_{1,T+h}= \varvec{\lambda }'_1 \varvec{f}_{T+h} \end{aligned}$$
(3.21)
or
$$\begin{aligned} \hat{y}_{1,T+h}= \varvec{w}' \tilde{\varvec{y}} _{T+h}, \end{aligned}$$
(3.22)
respectively. However, \(\Lambda \), F, and \(\varvec{w}\) are unknown. Estimating \(\Lambda \) by (3.5) and conditioning on \(\hat{\Lambda } = (\varvec{\hat{\lambda }}_1,\ldots ,\varvec{\hat{\lambda }}_N) = (\varvec{\hat{\lambda }}_1,\hat{\tilde{\Lambda }})\), \(\varvec{f}_{T+h}\) can be estimated by
$$\begin{aligned} \varvec{\hat{f}}_{T+h}=(\hat{\tilde{\Lambda }}' \hat{\tilde{\Lambda }})^{-1} \hat{\tilde{\Lambda }}' \varvec{\tilde{y}}_{T+h}, \quad h=1, \ldots , m. \end{aligned}$$
(3.23)
Substituting \(\hat{\varvec{\lambda }}_1\) and \(\hat{\varvec{f}}_{T+h}\) into (3.21), the prediction error of \(\hat{\tilde{y}}_{1,T+h}\) = \(\varvec{\hat{\lambda }}'_{1} \varvec{\hat{f}}_{T+h}\) is
$$\begin{aligned} \tilde{\varphi }_{1,T+h}= y_{1,T+h}-\hat{\tilde{y}}_{1,T+h}=u_{1,T+h} +(\varvec{\lambda }'_1 \varvec{f}_{T+h} - \varvec{\hat{\lambda }}'_1 \varvec{\hat{f}}_{T+h} ). \end{aligned}$$
(3.24)
When N is fixed and \(T\rightarrow \infty \), \(\hat{\Lambda }\) is \(\sqrt{T}\)-consistent, but \(\hat{\varvec{f}}_{T+h} - \varvec{f}_{T+h}= O(\frac{1}{\sqrt{N}})\). When \((N,T) \rightarrow \infty \), Bai (2003) showed that the asymptotic variance of \(\tilde{\varphi }_{1,T+h}\) is
$$\begin{aligned} \text {Var} (\tilde{\varphi }_{1,T+h})= \sigma ^2_1 + \frac{1}{N} \varvec{\lambda }'_1 \Sigma ^{-1}_\lambda \left( \frac{1}{N}\Lambda ' \Omega \Lambda \right) \Sigma ^{-1}_{\lambda } \varvec{\lambda }_1 + \frac{\sigma ^2_1}{T} \varvec{f}'_{T+h} \Sigma ^{-1}_f \varvec{f}_{T+h} + o(1). \end{aligned}$$
(3.25)
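Operationally, (3.21)-(3.23) amount to principal components on the pre-treatment panel followed by a cross-sectional regression at \(T+h\). The sketch below assumes the factor dimension r is known (in practice it would be chosen by an information criterion, e.g., Bai and Ng 2002); the function name is illustrative:

```python
import numpy as np

def fb_predict(Y_pre, y_ctrl_post, r):
    """Factor-based (FB) counterfactual predictor, eqs. (3.21)-(3.23).

    Y_pre       : T x N pre-treatment outcomes, column 0 = treated unit
    y_ctrl_post : (N-1)-vector of control-unit outcomes at T+h
    r           : number of factors, assumed known here
    """
    T, N = Y_pre.shape
    # Principal components: F_hat = sqrt(T) * leading eigenvectors of
    # Y Y'/(TN); Lambda_hat = Y'F_hat / T, so that F_hat'F_hat / T = I_r.
    _, eigvec = np.linalg.eigh(Y_pre @ Y_pre.T / (T * N))
    F_hat = np.sqrt(T) * eigvec[:, ::-1][:, :r]       # T x r, leading PCs
    Lam_hat = Y_pre.T @ F_hat / T                     # N x r loadings
    lam1, Lam_ctrl = Lam_hat[0], Lam_hat[1:]
    # Eq. (3.23): cross-sectional regression of the post-period control
    # outcomes on the estimated control loadings gives f_hat_{T+h}.
    f_post, *_ = np.linalg.lstsq(Lam_ctrl, y_ctrl_post, rcond=None)
    return lam1 @ f_post                              # eq. (3.21)
```

The prediction is invariant to the rotational indeterminacy of the factor estimates, since only the fitted common component \(\varvec{\hat{\lambda }}'_1 \varvec{\hat{f}}_{T+h}\) matters, not \(\hat{\Lambda }\) and \(\varvec{\hat{f}}_{T+h}\) separately.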
The linear projection model coefficients \(\varvec{w}\) based on \(Y^T =(\varvec{y}_1,\ldots , \varvec{y}_T)\) are estimated by (3.8). Substituting (3.8) into (3.22), the prediction error of \(y_{1,T+h}\) by \(\hat{\hat{y}}_{1,T+h}=\varvec{\hat{w}'}\varvec{\tilde{y}}_{T+h} \) is
$$\begin{aligned} \hat{\varphi }_{1,T+h}= y_{1,T+h} - \varvec{\hat{w} }' \tilde{\varvec{y}}_{T+h}, \quad h=1,\ldots , m \end{aligned}$$
(3.26)
with prediction error variance
$$\begin{aligned} \text {Var} \left( \hat{\varphi }_{1,T+h}\right) = \sigma ^2_\eta \left[ 1+\varvec{\tilde{y}}'_{T+h} \left( \sum \limits _{t=1}^{T}\tilde{\textbf{y}}_{t} \tilde{\textbf{y}} _{t}^{\prime }\right) ^{-1} \varvec{\tilde{y}}_{T+h} \right] \end{aligned}$$
(3.27)
We note that a good estimate of \(\varvec{\lambda }_1\) (or \(\hat{\Lambda }\)) requires T to go to infinity, and a good estimate of \(\varvec{f}_{T+h}\) requires N to go to infinity. On the other hand, a good estimate of \(\varvec{w}\) only requires T to go to infinity. When N or T is finite, the predictor \(\varvec{\hat{\lambda }'}_1 \varvec{\hat{f}}_{T+h}\) may be a biased predictor of \(y_{1,T+h}\), and the mean square prediction error of (3.24) depends on the configuration of N and T. On the other hand, the LP predictor \(\hat{\hat{y}}_{1,T+h}\) is always unbiased conditional on \((\varvec{\tilde{y}}_1, \ldots , \varvec{\tilde{y}}_T,\varvec{\tilde{y}}_{T+h})\) because \(\varvec{\hat{w}}\) is an unbiased estimator of \(\varvec{w}\).
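By contrast, the LP predictor (3.22) with \(\varvec{\hat{w}}\) from (3.8) is a single OLS regression of the treated unit's pre-treatment outcomes on the control units' outcomes. A minimal sketch (function name illustrative; no intercept, matching (3.8), though one can be added by augmenting the regressors with a column of ones):

```python
import numpy as np

def lp_predict(y1_pre, Y_ctrl_pre, y_ctrl_post):
    """Linear projection (LP) counterfactual predictor, eqs. (3.8)/(3.22).

    y1_pre      : T-vector of the treated unit's pre-treatment outcomes
    Y_ctrl_pre  : T x (N-1) matrix of control-unit pre-treatment outcomes
    y_ctrl_post : (N-1)-vector of control-unit outcomes at T+h
    """
    # w_hat = (sum_t y~_t y~_t')^{-1} sum_t y~_t y_{1t}: OLS of y_{1t}
    # on y~_t over the pre-treatment sample.
    w_hat, *_ = np.linalg.lstsq(Y_ctrl_pre, y1_pre, rcond=None)
    return w_hat @ y_ctrl_post                        # eq. (3.22)
```

The pre-treatment periods play the role of the training sample; no knowledge of the factor structure, or of the number of factors, is needed.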
Hsiao and Zhou (2024) have shown the following results under Assumptions 1-3.
(i) Case 1: \((N,T)\rightarrow \infty \). When \(\frac{N}{T}\rightarrow a\), \(0<a<\infty\), the asymptotic mean square prediction errors (MSPE) of (3.24) and (3.26) are identical, \(MSPE\left( \widehat{\tilde{y}}_{1,T+h}\right) =MSPE\left( \widehat{\hat{y}}_{1,T+h}\right) \), if \(u_{1t}\) is independent of \(u_{jt}\) for \(j\ne 1.\) If \(u_{1t}\) is correlated with \(\tilde{\textbf{u}}_{t},\) then (3.26) has smaller MSPE than (3.24).
The reason the LP approach \((\hat{\hat{y}}_{1,T+h})\) is in general more efficient, in terms of MSPE, than the factor modeling approach (FB, \(\hat{\tilde{y}}_{1,T+h}\)) is that the LP approach takes into account the correlation between \(u_{1t}\) and \(\tilde{\textbf{u}}_{t}\), while the FB approach does not. Should one replace the predictor \(\varvec{\hat{\lambda }}'_1 \varvec{\hat{f}}_{T+h} \) by \(\varvec{\tilde{\lambda }}'_1 \varvec{\hat{f}}_{T+h} + E(u_{1,T+h}| \varvec{\tilde{u}}_{T+h}) \) (see, e.g., Hsiao and Zhou 2019), we would expect the two to have the same prediction error. However, the LP approach is computationally more convenient: it only requires an OLS regression of \(y_{1t}\) on \(\tilde{\textbf{y}} _{t}\), while the FB approach requires identifying the number of factors, and the principal component estimation of the latent factor structure is more laborious.
(ii) Case 2: N fixed, \(T\rightarrow \infty \). As long as \(N>r,\) when \(T\rightarrow \infty ,\) the LP predictor \(\varvec{\hat{w}}' \varvec{\tilde{y}} _{T+h}\) has smaller mean square prediction error (MSPE) than the FB predictor \(\varvec{\hat{\lambda }}'_1 \varvec{\hat{f}}_{T+h}\).7
(iii) Case 3: T fixed, \(N\rightarrow \infty \).
It is not feasible to directly implement the LP approach (3.7) when \(N> T\) because \(\frac{1}{T}\sum \limits _{t=1}^{T}\tilde{\textbf{y}}_{t}\tilde{\textbf{y}} _{t}^{\prime }\) could be a near-singular matrix. Therefore, with finite T, \(\varvec{\hat{w}}\) is not likely to be a good estimator of \(\varvec{w}\). However, when N is large, it is not unreasonable to assume that the N cross-sectional units are randomly drawn at time t. Hsiao and Zhou (2024) have suggested randomly breaking up the \(\left( N-1\right) \) control units into G subgroups, each consisting of \(N_{g}\) cross-sectional units with \(N_{g} < T\), and then using LP to generate the gth group's predicted value of \( y_{1,T+h}\) as
$$\begin{aligned} \hat{y}_{1,T+h}^{g}=\hat{\textbf{w}}_{g}^{\prime }\tilde{\textbf{y}} _{T+h}^{g}, \ \ g=1,\ldots ,G, \end{aligned}$$
(3.28)
where
$$\begin{aligned} \hat{\textbf{w}}_{g}=\left( \sum \limits _{t=1}^{T}\tilde{\textbf{y}}_{t}^{g} \tilde{\textbf{y}}_{t}^{g\prime }\right) ^{-1}\sum \limits _{t=1}^{T}\tilde{\textbf{y}}_{t}^{g}y_{1t}, \end{aligned}$$
(3.29)
and \(\tilde{\textbf{y}}_{t}^{g}\) is an \(N_{g}\times 1\) vector consisting of the \( N_{g}\) cross-sectional units that belong to the gth subgroup, i.e., \( \tilde{\textbf{y}}_{t}^{g}=\left( 1_{\left( i\in g\right) }y_{it}\right) ,\) \( g=1,\ldots ,G.\) The predicted value of \(y_{1,T+h}\) is then generated as the average of the G predictors \(\left\{ \hat{y}_{1,T+h}^{g}:g=1,\ldots ,G\right\} \):
$$\begin{aligned} \widehat{\bar{y}}_{1,T+h}^{G}=\frac{1}{G}\sum \limits _{g=1}^{G}\hat{y} _{1,T+h}^{g}. \end{aligned}$$
(3.30)
Hsiao and Zhou (2024) have shown that when T is fixed and \( N\rightarrow \infty ,\) the mean square prediction error of \(\varvec{\hat{\lambda }}'_{1} \varvec{\hat{f}}_{T+h}\) is greater than that of (3.30).
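The grouping device of (3.28)-(3.30) is straightforward to implement: randomly partition the controls into G subgroups with \(N_g < T\), run LP within each subgroup, and average. A minimal sketch (the signature and the use of equal-sized random groups are illustrative assumptions):

```python
import numpy as np

def grouped_lp_predict(y1_pre, Y_ctrl_pre, y_ctrl_post, G, seed=None):
    """Grouped LP predictor, eqs. (3.28)-(3.30), for T fixed and N large,
    where a single LP on all N-1 controls is infeasible (N > T).
    Controls are randomly split into G subgroups, each with N_g < T."""
    rng = np.random.default_rng(seed)
    T, Nc = Y_ctrl_pre.shape
    groups = np.array_split(rng.permutation(Nc), G)
    preds = []
    for g in groups:
        # Eq. (3.29): group-specific OLS weights on the N_g controls in g
        w_g, *_ = np.linalg.lstsq(Y_ctrl_pre[:, g], y1_pre, rcond=None)
        preds.append(w_g @ y_ctrl_post[g])            # eq. (3.28)
    return np.mean(preds)                             # eq. (3.30)
```

Each subgroup regression is well-posed because \(N_g < T\), while averaging across the G groups pools the information in all \(N-1\) controls.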
(iv) Case 4: Both N and T are finite.
It is hard to compare the mean square prediction errors of the FB and LP predictors. However, Hsiao and Zhou (2024) have argued that if post-treatment outcomes in the absence of treatment are similar to the outcomes before the treatment, the LP method is likely to be more accurate.
(v) Case 5: Combination of causal factors and factor modeling.
Xu (2017) and Hsiao and Zhou (2019) consider
$$\begin{aligned} & y_{it} =\textbf{x}_{it}^{\prime }\varvec{\beta }+\varepsilon _{it}, \end{aligned}$$
(3.31)
$$\begin{aligned} & \varepsilon _{it} =\varvec{\lambda }_{i}^{\prime }\textbf{f}_{t}+u_{it}, { \ \ }t=1,\ldots ,T;\ i=1,\ldots ,N, \end{aligned}$$
(3.32)
where \(\varvec{\lambda }_{i},\) \(\textbf{f}_{t}\), and \(u_{it}\) satisfy Assumptions 1-3, and \(\textbf{x}_{it}\) could be correlated with \(\varvec{\lambda }_{i}\) and \(\varvec{f}_{t}\), but \(E(u_{it}|\varvec{x}_{it},\varvec{\lambda }'_{i}\varvec{f}_{t})=0\) (e.g., Bai 2009; Hsiao and Zhou 2019; Hsiao et al. 2022a; Pesaran 2006; Xu 2017). The error of predicting \(y_{it}\) then takes the form
$$\begin{aligned} \tilde{\varepsilon }_{it}=y_{it}-\hat{y}_{it}=\textbf{x}_{it}^{\prime }\left( \varvec{\beta }-\varvec{\hat{\beta }}\right) +\left( \varepsilon _{it}-\hat{ \varepsilon }_{it}\right) , \end{aligned}$$
(3.33)
i.e., the prediction error consists of two parts: the part due to the error of estimating \(\varvec{\beta }\) and the part due to \(\left( \varepsilon _{it}-\hat{\varepsilon }_{it}\right) .\) If the same method is applied to estimate \(\varvec{\beta }, \) the part due to \(\textbf{x} _{it}^{\prime }\left( \varvec{\beta }-\varvec{\hat{\beta }}\right) \) in (3.33) is identical no matter which method is used to predict \(\varepsilon _{it}.\) Thus, the analysis of the relative merits of the FB and LP methods continues to hold for the model (3.31)-(3.32).
Table 2
Mean square prediction error (MSPE): linear projection (LP) versus factor models (FB)

|  | N fixed | \(N\rightarrow \infty \) |
|---|---|---|
| T fixed | No definite conclusion; the MSPE depends on the realized \(y_{it}\) and \(\textbf{x}_{it}\) | MSPE (modified LP) \(\le \) MSPE (FB) |
| \(T\rightarrow \infty \) | MSPE (LP) \(\le \) MSPE (FB) | MSPE (LP) \(\le \) MSPE (FB) |
We summarize the prediction error comparison between the FB and LP methods in terms of N and T in Table 2. The limited Monte Carlo studies conducted by Hsiao and Zhou (2024) show that the analytical results based on large-sample analysis also hold when N and T are finite. Their empirical analysis of the 1990 German reunification effects based on the LP approach appears to show steady and plausible results, while the factor analysis appears to be erratic.

4 Multiple treated units

In Sects. 2 and 3, we considered the accuracy of measuring treatment effects under the assumption that a single unit receives the treatment. When multiple units receive the treatment, in principle we can still apply the single-equation approach unit by unit and then aggregate the micro-predictions to obtain the aggregated predictions. Based on the criterion of \(|| \varvec{y} - \hat{\varvec{y}}||\) and the transversality argument, as long as one method is likely to generate a more accurate prediction for any treated unit, aggregating the more accurately predicted units is likely to generate more accurate aggregated predictions whatever linear aggregation method is used (e.g., Hsiao et al. 2022b). However, predicting each treated unit one by one could be computationally laborious if there are many treated units.
(i) Distribution of Treatment Effects
An alternative approach could be to first aggregate the multiple treated units into a single unit and then use the single-equation approach to generate predictions for the aggregated unit. However, aggregation could raise complicated issues, as discussed in Hsiao (2022). Moreover, there is the issue of whether summarizing the outcomes of multiple units in terms of some moments (say, the ATE) or in terms of the distribution of individual outcomes yields more information. For instance, Maasoumi and Wang (2022) consider the measurement of treatment effects of closing the gender earnings gap under policy option 1, where women and men are paid on the same scale conditional on human capital characteristics (structural effect), and policy option 2, where the current pay scale between men and women remains the same but women's human capital characteristics become the same as men's (composition effect). Based on US Current Population Survey data from 1976-2013, they found that for some quantiles of the distributions of treatment effects, option 1 could be preferred, while for other quantiles option 2 could be preferred. However, it is difficult to summarize the information from a decomposition-of-distributions analysis. To obtain a unique ranking, Maasoumi and Wang (2019, 2022) suggest a stochastic dominance ranking criterion within the class of weakly increasing utility functions u(y): letting F(y) and G(y) be the distributions of treatment effects for options 1 and 2, F(y) first-order stochastically dominates G(y) if and only if, for every weakly increasing utility function u(y),
$$\begin{aligned} \int u(y) \ d F(y) \ge \int u(y) \ d G(y). \end{aligned}$$
(4.1)
The criterion (4.1) not only provides an unambiguous ranking when there is dispersion of treatment effects between different policy options, but also allows checking the robustness of treatment effect comparisons through tests of stochastic dominance (e.g., Linton et al. 2005).
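Given samples of treatment effects under two policy options, the condition underlying (4.1), namely that F first-order stochastically dominates G if and only if \(F(y)\le G(y)\) for all y, can be checked pointwise from empirical CDFs. The sketch below is a naive sample check only, not a substitute for the formal subsampling tests of Linton et al. (2005); the function name is illustrative:

```python
import numpy as np

def fosd(sample_f, sample_g):
    """Naive sample check of first-order stochastic dominance: F dominates
    G iff the empirical CDF of sample_f lies at or below that of sample_g
    at every point of the pooled support (equivalent to (4.1) holding for
    every weakly increasing u)."""
    grid = np.union1d(sample_f, sample_g)         # pooled evaluation points
    def ecdf(s, y):
        # empirical CDF of sample s evaluated at points y
        return np.searchsorted(np.sort(s), y, side="right") / len(s)
    return bool(np.all(ecdf(sample_f, grid) <= ecdf(sample_g, grid)))
```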
(ii) Identifying Causal Factors
The prediction approach to measuring treatment effects is a non-causal approach. If the treatment effects for individuals, \(\hat{\Delta }_{it},\) differ across i (i.e., are heterogeneous), they provide a way to link the non-causal approach with the causal approach. Consider
$$\begin{aligned} \hat{\Delta }_{it}=a+ \varvec{b}' \varvec{x}_{it} + \varepsilon _{it}.\end{aligned}$$
(4.2)
One may regress the estimated \(\hat{\Delta }_{it}\) on \(\varvec{x}_{it}\) to find the impact of changes in \(\varvec{x}\) on \(\Delta \). For example, Ke et al. (2017) showed that the impacts of China's high-speed rail projects are different for different localities. They then regressed the estimated treatment effects \(\hat{\Delta }_{it}\) on causal factors such as the industrial share in employment, the service share in employment, the size of state enterprise employment, university enrollment per 10,000 people, and the number of star hotels (a proxy for tourist attractions), and showed that these are important causal factors behind the differences in treatment effects.8

5 Concluding remarks

Under the assumption that the panel data contain both pre-treatment and post-treatment control-unit information, we review both the panel causal approach and the non-causal approach to constructing counterfactuals for the measurement of treatment effects. We argue that if the emphasis is on the measurement of treatment effects, then it is just an issue of prediction; there is no need to consider the identification and estimation of the parameters of the data-generating process. For models under the unconfoundedness assumption, we mainly review the data-driven LP and FB approaches to generating counterfactuals. In general, the linear projection approach can be applied for any data-generating process and is likely to yield more accurate predictions whatever the sample configuration of N and T. The equation
$$\begin{aligned} y_{it}=a+E\left( y_{it}|\textbf{X}_{t}\right) +\eta _{it}, \end{aligned}$$
(5.1)
always holds, where \(\textbf{X}_{t}\) can include \(\tilde{\textbf{y}}_{t},\) lagged \(\tilde{\textbf{y}}_{t}\), or any covariates that satisfy \(f\left( \textbf{X}_{t}|d_{1t}\right) =f\left( \textbf{X}_{t}\right) .\) Chen (2023) has shown that LP is an attractive choice against a wide class of matching or difference-in-differences estimators.9
We did not consider another widely applied approach, difference-in-differences (DID, e.g., Cameron and Trivedi 2005). The DID approach is mainly suggested for the analysis of repeated cross-sectional data. Although one can apply the methodology to panel data, the application of DID in the panel parametric approach is just the conventional dummy variable approach (e.g., Damrongplasit 2009). Its application in the nonparametric approach requires a nonparametric estimation of \(E(y_{1,T+h}| \varvec{z}_{T+h})\) period by period, \(h=1, \ldots , m\), which is computationally more cumbersome than the LP or FB approach. Moreover, as shown by Stone (1980), the convergence rate for nonparametric estimates is \(N^{-\frac{2\alpha }{2 \alpha +r}}\), where \(\alpha \) denotes the degree of smoothness of \(E(y|\varvec{z})\) (e.g., Chen 2007; Newey 1997) and r denotes the dimension of the conditioning covariates \(\varvec{z}\), while the LP and FB approaches are computationally straightforward and have the faster convergence rate of either \(T^{-1/2}\) or \(N^{-1/2}\).
However, it should be noted that the data-driven approaches to constructing counterfactuals, although reasonably simple to implement, make it difficult to simulate treatment outcomes under different policy scenarios because they do not involve the identification and estimation of the parameters of the data-generating process. Under the assumption that a policy change does not change the decision rules (i.e., the Lucas (1976) Critique does not apply), a typical procedure to consider outcomes under different policy scenarios is through the following steps:
Step 1: Construct a theoretical model for the outcomes of interest.
Step 2: Estimate the parameters of the theoretical model from observed data.
Step 3: Simulate the potential outcomes under different scenarios by manipulating the conditional covariates of the theoretical model.
For instance, Pesaran and Yang (2022) use a stochastic model of epidemics on networks to consider COVID-19 infection rate outcomes under different policy scenarios. But without data showing the outcomes under different policy scenarios, it is not possible to duplicate what they did through a noncausal approach.
It should also be noted that our review is based on the assumption that the observed data are either subject to the treatment or in the absence of the treatment. In many cases, the observed data could be subject to multiple treatments (e.g., Fujiki and Hsiao 2015; Ke and Hsiao 2023). For instance, the data observed during a pandemic, say COVID-19, are either the joint outcomes of the pandemic and the specific disease-control policy adopted under the pandemic, or the outcomes in the absence of both. A single-equation approach is, in general, incapable of separating the impact of the pandemic from the effectiveness of the specific control policy; a combination of different approaches may be needed to provide separate estimates of each specific treatment (e.g., Ke and Hsiao 2022, 2023). Moreover, our review is based on a single-equation (i.e., ceteris paribus) approach. When a policy has macro (or global) impacts, one needs to take a system-of-equations (or mutatis mutandis) approach (e.g., Hood and Koopmans 1953; Hsiao 1983; Theil 1958). In particular, when policy changes lead to changes in decision rules (e.g., Lucas 1976), the values of the conditioning covariates could also change as a result of the policy change. Without a model capturing how other variables would change due to policy changes, predictions based only on a specific policy-change variable while holding the values of other variables constant are bound to be misleading. Furthermore, incorporating machine learning algorithms to convert text or sentiment data into digital form may also provide a real-time possibility to capture underlying nonlinearities that could not be captured with a linear functional form (e.g., Hsiao 2024; Hsiao and Zhao 2000).
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Footnotes
1
The widely used difference-in-difference approach (e.g., Cameron and Trivedi 2005) applied to the panel parametric specification is equivalent to assuming \(f(\varepsilon ^1,\varepsilon ^0, v|\varvec{x})= f(\varepsilon ^1,\varepsilon ^0 | \varvec{x}) f(v|\varvec{x}) \) (e.g., Damrongplasit 2009; Hsiao 2022).
 
2
Robinson (1988) considered the sample selection issue with cross-sectional data only (i.e., \(T=1\)). For a generalization of Robinson's (1988) assumption to obtain \(E(\cdot | \cdot )\), see Kong and Hsiao (2025).
 
3
Our limited experience did not favor a time-varying parameter approach (e.g., Wan et al. 2021).
 
4
For models where treatment might affect the parameters of the data-generating process, see, e.g., Gao (2024).
 
5
Bai and Ng (2002) have suggested using the information criterion to identify the factor dimension r, also see Hsiao et al. (2021).
 
6
In this paper, we only consider projecting \(y_{1t}\) on \(\varvec{\tilde{y}}_t\) given by (3.7). An alternative is to project \(y_{1t}\) on its past values (referred to as vertical regression and horizontal regression, respectively, by Athey et al. 2021). For a synthesis, see Shen et al. (2023), Hsiao et al. (2025).
 
7
As discussed in subsection 4b, the PDA approach can be considered as a special case of N fixed and \(T\rightarrow \infty \).
 
8
It should be cautioned that such an approach may miss some important causal factors that drive the outcomes.
 
9
As a matter of fact, Chen (2023)'s comparison is based on the Abadie et al. (2010) synthetic control method (SCM). However, as shown by Wan et al. (2018), the SCM is equivalent to imposing the prior restrictions of no intercept and slope coefficients that are nonnegative and sum to 1. As discussed in Sect. 1, the construction of counterfactuals is a prediction problem: any variable that helps prediction can be used irrespective of its relation to the outcome variables. Wan et al. (2018)'s simulation results for SCM versus LP show that LP dominates on most occasions. So, by the transversality argument, we expect Chen's results to also hold for the LP approach.
 
Literature
go back to reference Abadie A, Diamond A, Hainmueller J (2010) Synthetic control methods for comparative case studies: estimating the effect of California’s Tobacco Control Program. J Am Stat Assoc 105:493–505 Abadie A, Diamond A, Hainmueller J (2010) Synthetic control methods for comparative case studies: estimating the effect of California’s Tobacco Control Program. J Am Stat Assoc 105:493–505
go back to reference Abadie A, Diamond A, Hainmueller J (2015) Comparative politics and the synthetic control method. Am J Polit Sci 59:495–510 Abadie A, Diamond A, Hainmueller J (2015) Comparative politics and the synthetic control method. Am J Polit Sci 59:495–510
go back to reference Anderson TW, Rubin H (1956), Statistical inference in factor analysis. In: Neyman J (ed) Proceedings of the 3rd Berkeley symposium on mathematical statistics and probability, vol 5. Berkeley, pp. 111–150 Anderson TW, Rubin H (1956), Statistical inference in factor analysis. In: Neyman J (ed) Proceedings of the 3rd Berkeley symposium on mathematical statistics and probability, vol 5. Berkeley, pp. 111–150
go back to reference Ahn H, Powell JL (1993) Semiparametric estimation of censored selection models with a nonparametric selection mechanism. J Econom 58:3–30 Ahn H, Powell JL (1993) Semiparametric estimation of censored selection models with a nonparametric selection mechanism. J Econom 58:3–30
go back to reference Athey S, Bayati M, Doudchenko N, Imbens G, Khosravi K (2021) Matrix completion methods for causal panel data models. J Am Stat Assoc 116:1716–1730 Athey S, Bayati M, Doudchenko N, Imbens G, Khosravi K (2021) Matrix completion methods for causal panel data models. J Am Stat Assoc 116:1716–1730
go back to reference Bai J (2003) Inferential theory for factor models of large dimensions. Econometrica 71(1):135–171 Bai J (2003) Inferential theory for factor models of large dimensions. Econometrica 71(1):135–171
go back to reference Bai J (2009) Panel data models with interactive fixed effects. Econometrica 77:1229–1279 Bai J (2009) Panel data models with interactive fixed effects. Econometrica 77:1229–1279
go back to reference Bai J, Ng S (2002) Determining the number of factors in approximate factor models. Econometrica 70:191–221 Bai J, Ng S (2002) Determining the number of factors in approximate factor models. Econometrica 70:191–221
go back to reference Box GEP, Jenkins GM (1970) Time series analysis, forecasting and control. Holden-Day Inc., San Francisco Box GEP, Jenkins GM (1970) Time series analysis, forecasting and control. Holden-Day Inc., San Francisco
go back to reference Cameron AC, Trivedi P (2005) Microeconometrics. Cambridge University Press, Cambridge Cameron AC, Trivedi P (2005) Microeconometrics. Cambridge University Press, Cambridge
go back to reference Chamberlain G, Rothschild M (1983) Arbitrage, factor structure, and mean-variance analysis on large asset markets. Econometrica 51:1281–1304 Chamberlain G, Rothschild M (1983) Arbitrage, factor structure, and mean-variance analysis on large asset markets. Econometrica 51:1281–1304
go back to reference Chen X (2007) Large sample sieve estimation of semiparametric models. In: Heckman JJ, Leamer E (eds) Handbook of econometrics, vol 6. North Holland, Amsterdam, pp 5549–5632 Chen X (2007) Large sample sieve estimation of semiparametric models. In: Heckman JJ, Leamer E (eds) Handbook of econometrics, vol 6. North Holland, Amsterdam, pp 5549–5632
go back to reference Chen J (2023) Synthetic control as online linear regression. Econometrica 9:465–491 Chen J (2023) Synthetic control as online linear regression. Econometrica 9:465–491
go back to reference Connor G, Korajczyk R (1986) Performance measurement with the arbitrage pricing theory: a new framework for analysis. J Financ Econ 15:373–394 Connor G, Korajczyk R (1986) Performance measurement with the arbitrage pricing theory: a new framework for analysis. J Financ Econ 15:373–394
go back to reference Damrongplasit K (2009) Evaluation of Malaysian capital controls in the short, medium and long runs. Singap Econ Rev 54:233–247 Damrongplasit K (2009) Evaluation of Malaysian capital controls in the short, medium and long runs. Singap Econ Rev 54:233–247
go back to reference Damrongplasit K, Hsiao C, Zhao X (2010) Decriminalization policy and marijuana smoking prevalence: evidence from Australia. J Bus Econ Stat 28:344–356 Damrongplasit K, Hsiao C, Zhao X (2010) Decriminalization policy and marijuana smoking prevalence: evidence from Australia. J Bus Econ Stat 28:344–356
go back to reference Forni M, Hallin M, Lippi M, Reichlin L (1998) The generalized dynamic-factor model: identification and estimation. Rev Econ Stat 82:540–554 Forni M, Hallin M, Lippi M, Reichlin L (1998) The generalized dynamic-factor model: identification and estimation. Rev Econ Stat 82:540–554
go back to reference Fujiki H, Hsiao C (2015) Disentangle multiple treatment effects—measuring the net economic impact of the 1995 Great Hanshin–Awaji earthquake. J Econom 186:66–73 Fujiki H, Hsiao C (2015) Disentangle multiple treatment effects—measuring the net economic impact of the 1995 Great Hanshin–Awaji earthquake. J Econom 186:66–73
go back to reference Gao M (2024) Endogenous interference in randomized experiments. Mimeo Gao M (2024) Endogenous interference in randomized experiments. Mimeo
go back to reference Heckman JJ (1979) Sample selection bias as a specification error. Econometrica 47:153–161 Heckman JJ (1979) Sample selection bias as a specification error. Econometrica 47:153–161
go back to reference Heckman JJ, Vytlacil E (2001). In: Hsiao C, Morimune K, Powell JL (eds) Local Instrumental variables in nonlinear statistical inference. Cambridge University Press, New York, pp 1–46 Heckman JJ, Vytlacil E (2001). In: Hsiao C, Morimune K, Powell JL (eds) Local Instrumental variables in nonlinear statistical inference. Cambridge University Press, New York, pp 1–46
go back to reference Heckman JJ, Vytlacil EJ (2007) Econometric evaluation of social programs in handbook of econometrics, vol 6B. North-Holland, Amsterdam Heckman JJ, Vytlacil EJ (2007) Econometric evaluation of social programs in handbook of econometrics, vol 6B. North-Holland, Amsterdam
go back to reference Honoré BE (1992) Trimmed LAD and least square estimation of truncated and censored regression model with fixed effects. Econometrica 63:1133–1159 Honoré BE (1992) Trimmed LAD and least square estimation of truncated and censored regression model with fixed effects. Econometrica 63:1133–1159
go back to reference Honoré BE, Kyriazidou E (2000a) Panel data discret choice models with lagged dependent variables. Econometrica 68:839–874 Honoré BE, Kyriazidou E (2000a) Panel data discret choice models with lagged dependent variables. Econometrica 68:839–874
go back to reference Hood WC, Koopmans TC (eds) (1953) Studies in econometric method, vol 14. Cowles foundation monographs. Wiley, New York Hood WC, Koopmans TC (eds) (1953) Studies in econometric method, vol 14. Cowles foundation monographs. Wiley, New York
go back to reference Hsiao C (1983). In: Griliches Z, Intriligator M (eds) Identification, handbook of econometrics, vol 1. Elsevier, Amsterdam, pp 223–283 Hsiao C (1983). In: Griliches Z, Intriligator M (eds) Identification, handbook of econometrics, vol 1. Elsevier, Amsterdam, pp 223–283
go back to reference Hsiao C (2021) Some thoughts on prediction in the presence of big data. China J Econom 1:1–16 (in Chinese) Hsiao C (2021) Some thoughts on prediction in the presence of big data. China J Econom 1:1–16 (in Chinese)
go back to reference Hsiao C (2022) Analysis of panel data, 4th edn. Cambridge University Press, Cambridge Hsiao C (2022) Analysis of panel data, 4th edn. Cambridge University Press, Cambridge
go back to reference Hsiao C (2024) Machine learning and econometrics. Singap Econ Rev (forthcoming) Hsiao C (2024) Machine learning and econometrics. Singap Econ Rev (forthcoming)
go back to reference Hsiao C, Zhao Z (2000) Combining opinion surveys with time series data to forecast Japanese economy. Jpn Econ Rev 51:155–169 Hsiao C, Zhao Z (2000) Combining opinion surveys with time series data to forecast Japanese economy. Jpn Econ Rev 51:155–169
go back to reference Hsiao C, Zhou Q (2019) Panel Parametric, semiparametric, and nonparametric construction of counterfactuals. J Appl Econom 34(4):463–481 Hsiao C, Zhou Q (2019) Panel Parametric, semiparametric, and nonparametric construction of counterfactuals. J Appl Econom 34(4):463–481
go back to reference Hsiao C, Ching S, Wan SK (2012) A panel data approach for program evaluation: measuring the benefits of political and economic integration of Hong Kong with mainland China. J Appl Econom 27:705–740 Hsiao C, Ching S, Wan SK (2012) A panel data approach for program evaluation: measuring the benefits of political and economic integration of Hong Kong with mainland China. J Appl Econom 27:705–740
go back to reference Hsiao C, Xie Y, Zhou Q (2021) Factor dimension determination for panel interactive effects models: an orthogonal projection approach. Comput Stat 36:1481–1497 Hsiao C, Xie Y, Zhou Q (2021) Factor dimension determination for panel interactive effects models: an orthogonal projection approach. Comput Stat 36:1481–1497
go back to reference Hsiao C, Shi Z, Zhou Q (2022a) Transformed estimation for panel interactive effects models. J Bus Econ Stat 40:1831–1848 Hsiao C, Shi Z, Zhou Q (2022a) Transformed estimation for panel interactive effects models. J Bus Econ Stat 40:1831–1848
Hsiao C, Shen Y, Zhou Q (2022b) Multiple treatment effects in panel-heterogeneity and aggregation. Adv Econom 43B:81–101 (Essays in honor of Hashem Pesaran)
Hsiao C, Zhou Q (2024) Panel treatment effects measurement-factor or linear projection modeling? J Appl Econom (forthcoming)
Hsiao C, Kong J, Xie Y, Zhou QK (2025) Horizontal or vertical regression to construct counterfactuals. Mimeo
Imbens GW, Angrist JD (1994) Identification and estimation of local average treatment effects. Econometrica 62:467–475
Imbens GW, Lemieux T (2008) Regression discontinuity designs: a guide to practice. J Econom 142:615–635
Ke X, Hsiao C (2022) Economic impact of the most drastic lockdown during COVID-19 pandemic—the experience of Hubei, China. J Appl Econom 37:187–209
Ke X, Hsiao C (2023) Data subject to multiple treatment effects-disentangle the impacts of global pandemic and a specific disease control policy. Singap Econ Rev 68:1507–1527
Ke X, Chen H, Hong Y, Hsiao C (2017) Do China’s high speed rail projects promote local economy? New evidence from a panel data approach. China Econ Rev 44:203–226
Kong J, Hsiao C (2025) Panel sample selection model with interactive effects. Mimeo
Lawley DN, Maxwell AE (1971) Factor analysis as a statistical method, 2nd edn. Butterworths, London
Li KT, Bell D (2017) Estimation of average treatment effects with panel data: asymptotic theory and implementation. J Econom 197:65–75
Li Q, Racine JS (2007) Nonparametric econometrics. Princeton University Press, Princeton
Linton O, Maasoumi E, Whang YJ (2005) Consistent testing for stochastic dominance—a subsampling approach. Rev Econ Stud 72:735–765
Lucas RE (1976) Econometric policy evaluation: a critique. In: Brunner K, Meltzer AH (eds) The Phillips curve and labor markets. Carnegie-Rochester Conference Series on Public Policy, vol 1. Elsevier, New York, pp 19–46
Maasoumi E, Wang L (2019) The gender gap between earnings distributions. J Polit Econ 127:2438–2504
Maasoumi E, Wang L (2022) Women’s counterfactual earnings distributions. Adv Econom 43B:229–252 (Essays in honor of Hashem Pesaran)
Newey W (1997) Convergence rates and asymptotic normality for series estimators. J Econom 79:147–168
Pesaran MH (2006) Estimation and inference in large heterogeneous panels with cross section dependence. Econometrica 74:967–1012
Pesaran MH, Yang CF (2022) Matching theory and evidence on Covid-19 using a stochastic network SIR model. J Appl Econom 37:1204–1229
Robinson PM (1988) Root-N-consistent semiparametric regression. Econometrica 56:931–954
Rosenbaum PR, Rubin DB (1983) The central role of the propensity score in observational studies for causal effects. Biometrika 70:41–55
Rosenbaum PR, Rubin DB (1985) Constructing a control group using multivariate matched sampling methods that incorporate the propensity score. Am Stat 39:33–38
Ross S (1976) The arbitrage theory of capital asset pricing. J Econ Theory 13:341–360
Sargent TJ, Sims CA (1977) Business cycle modeling without pretending to have too much a priori economic theory. Federal Reserve Bank of Minneapolis, Minneapolis
Shen D, Ding P, Sekhon J, Yu B (2023) Same root different leaves: time series and cross-sectional methods in panel data. Econometrica 91:2125–2154
Sickles RC (2005) Panel estimators and the identification of firm-specific efficiency levels in parametric, semi-parametric and non-parametric settings. J Econom 126:305–334
Stock JH, Watson MW (1989) New indexes of coincident and leading economic indicators. In: NBER macroeconomics annual, vol 4. National Bureau of Economic Research, pp 351–409
Stock JH, Watson MW (2002) Forecasting using principal components from a large number of predictors. J Am Stat Assoc 97:1167–1179
Stone CJ (1980) Optimal rates of convergence for nonparametric estimators. Ann Stat 8:1348–1360
Theil H (1958) Economic forecasts and policy. North Holland, Amsterdam
Tibshirani R (1996) Regression shrinkage and selection via the lasso. J R Stat Soc B 58:267–288
Wan SK, Xie Y, Hsiao C (2018) Panel data approach vs synthetic control method. Econ Lett 164:121–123
Wan S, Hsiao C, Zhou QK (2021) Can a time-varying structure provide a more robust panel construction of counterfactuals—straitjackets? Empir Econ 60:113–129
Xu Y (2017) Generalized synthetic control method: causal inference with interactive fixed effects model. Polit Anal 25(1):57–76
Metadata
Title: A selective review of panel approaches to construct counterfactuals
Author: Cheng Hsiao
Publication date: 24-04-2025
Publisher: Springer Berlin Heidelberg
Published in: Empirical Economics
Print ISSN: 0377-7332
Electronic ISSN: 1435-8921
DOI: https://doi.org/10.1007/s00181-025-02738-9