Published in: Political Behavior 4/2017

03.12.2016 | Original Paper

At the Nexus of Observational and Experimental Research: Theory, Specification, and Analysis of Experiments with Heterogeneous Treatment Effects

Authors: Cindy D. Kam, Marc J. Trussler

Abstract

Experimentalists are increasingly examining heterogeneous treatment effects, in which observed individual-level characteristics are hypothesized to moderate an experimental treatment effect. Such work places researchers at the nexus of experimental and observational approaches. In this paper, we discuss the theoretical and statistical issues that can arise in testing such hypotheses. We note that inclusion of an observed (as opposed to randomly assigned) moderator introduces the possibility of confounds that are commonplace in observational data analysis but too easily ignored in experimental data analysis. We simulate several different data generating processes that include heterogeneous treatment effects, and we discuss the implications of various statistical models. We aim to provide researchers who examine heterogeneous treatment effects with background and advice that enable them to identify where common issues may arise and to develop research designs and implement statistical tests that will mitigate them.

Footnotes
1
Hypotheses most commonly examine whether, on average, the treatment effect is zero for the units in the study. Tests of the sharp null (that the treatment effect is zero for all units in the study) are beyond the scope of this paper (for an instructive distinction between the two, see Keele 2015).
 
2
Second-generation questions include not only questions of moderation but also of mediation. Through mediation analysis, researchers attempt to establish the degree to which a given treatment operates through a particular causal mechanism (see Baron and Kenny 1986 for a clear exposition). Questions of moderation can examine how observed, longstanding predispositions that are not affected by the treatment (and are often measured pre-treatment) causally affect the magnitude of the treatment effect. Questions of mediation, in contrast, may be approached by examining how the treatment affects some observed variable that is measured post-treatment (e.g., a psychological attitude) and that subsequently affects the key outcome variable of interest; or they may be approached by embedding the proposed mediator in the experimental design. Mediation analysis has received its own fair share of attention from political scientists recently (e.g., Imai et al. 2011, 2013; Bullock et al. 2010) and lies outside the scope of this paper.
 
3
Bivariate OLS (regressing y on T) returns an unbiased estimate of the average treatment effect, even if the actual treatment effect varies across units. We will return to this point later when we walk through our scenarios.
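
The point in this note can be checked with a small simulation. The sketch below is not the authors' replication code; the data generating process, coefficient values, sample size, and function name are all assumptions chosen for illustration. It draws a moderator, lets the unit-level treatment effect depend on it, assigns treatment at random, and compares the bivariate OLS slope on T with the difference in group means and with the average of the unit-level effects.

```python
import random


def simulate_bivariate_ols(n=20000, seed=1):
    """Regress y on T alone when the true effect varies across units."""
    rng = random.Random(seed)
    t, y, effects = [], [], []
    for _ in range(n):
        m = rng.gauss(1.0, 1.0)                   # observed moderator (assumed distribution)
        tau = 1.0 + 0.5 * m                       # unit-level treatment effect depends on m
        treated = 1 if rng.random() < 0.5 else 0  # random assignment to treatment
        noise = rng.gauss(0.0, 2.0)               # standard deviation 2 (variance 4)
        t.append(treated)
        y.append(0.5 * m + tau * treated + noise)
        effects.append(tau)

    t_bar = sum(t) / n
    y_bar = sum(y) / n
    # Bivariate OLS slope: cov(T, y) / var(T)
    slope = (sum((ti - t_bar) * (yi - y_bar) for ti, yi in zip(t, y))
             / sum((ti - t_bar) ** 2 for ti in t))

    # With a binary T, the slope equals the difference in group means
    n1 = sum(t)
    mean1 = sum(yi for ti, yi in zip(t, y) if ti == 1) / n1
    mean0 = sum(yi for ti, yi in zip(t, y) if ti == 0) / (n - n1)

    sample_ate = sum(effects) / n                 # average of the unit-level effects
    return slope, mean1 - mean0, sample_ate


slope, diff_in_means, sample_ate = simulate_bivariate_ols()
print(f"OLS slope on T:        {slope:.3f}")
print(f"Difference in means:   {diff_in_means:.3f}")
print(f"Average unit effect:   {sample_ate:.3f}")
```

In this setup the slope and the difference in means coincide up to floating-point error, and both should land close to the average unit-level effect even though no unit-level effect is modeled.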
 
4
The journals are: American Journal of Political Science, American Political Science Review, Journal of Politics, Political Behavior, and Political Psychology.
 
5
This was defined as any effort to look at the conditional effect of an experimental treatment across a third observed variable, regardless of statistical test used.
 
6
Values do not sum to 100% as only the most frequently used moderators are included.
 
7
Researchers have developed various other techniques to uncover heterogeneous treatment effects. Two-way (m)ANOVA and regression analysis yield identical hypothesis tests, but (m)ANOVA does not as easily lend itself to discussions of the substantive magnitude of heterogeneous treatment effects or the calculation of marginal effects (or statistical tests therein) across heterogeneous groups. Split-sample approaches involve comparisons of means or regression. Comparison of means across different subgroups allows for a more flexible specification for the moderation effect in comparison to the standard OLS method which assumes linearity (Hainmueller et al. 2016). However, it does not allow the researcher to simultaneously model multiple moderating relationships as OLS does. Split-sample regression (that is, splitting the sample by categories of the proposed moderator) can conveniently display differential treatment effects across different groups of units, which can be informative, but does not as conveniently facilitate statistical tests of different treatment effects compared with pooled-sample regression with an interaction (Kam and Franzese 2007). Moreover, splitting the sample by groups is non-intuitive for continuous variables (Green and Kern 2012), and it quickly becomes inefficient as the number of groups increases (and the number of units in each group decreases) (Horiuchi et al. 2007). In addition to these regression-based approaches, Horiuchi et al. (2007) and Green and Kern (2012) each provide Bayesian approaches to estimating heterogeneous treatment effects, but these approaches are well beyond the scope of this paper, and neither directly addresses our core concern regarding confounders.
 
8
Mutz (2011) notes that “A central advantage of experiments is that analysis is typically simpler and more elegant than for observational data” (p. 126).
 
9
We also note that these moderators are typically (and preferably) measured prior to the delivery of the stimulus or they are assumed (or already shown) to be stable across time and thereby uncontaminated by the experiment. Such decisions about when to measure moderators are beyond the scope of this paper; for a lively discussion on this topic, see Huber and Lapinski (2006) and Mendelberg (2008).
 
10
These 100 articles had a median sample size of 293 participants and a mean of 887. As the difference in the median and mean indicates, the data are heavily skewed by several large N field studies.
 
11
Split sample, in this case, refers to those analyses that run separate tests for the significance of the treatment on a dependent variable for different subgroups without testing for the difference between those estimates. Nine of twelve cases used OLS for their main analysis, partitioning the sample across subgroups. Three articles simply displayed differences in means for different subgroups of the sample.
 
12
For ease of exposition, our example only has one treatment group. The lessons easily extend to multiple treatment groups.
 
13
It is worth noting an important difference between our framework and that of Freedman (2008). Freedman (2008) specifies a finite sample of units in the study that constitute the population of interest and that are not randomly drawn from a superpopulation. He also specifies that the only source of variability is the randomization to treatment. Using this finite-sample approach, Freedman (2008) notes that regression analysis on experimental data with covariate adjustment is biased in small samples that have unequally sized treatment and control groups. We note that Freedman’s (2008) finite-sample view diverges from our approach to analysis of experimental data. As Green (2009) observes, “Rarely do political scientists take the view that their experimental subjects exhaust the population of observations that is governed by a given set of parameters” (21). We use an infinite sampling approach, considering that any given dataset is but one of an infinite set of possible instantiations. With this framework, we can examine the properties of estimators in expectation. It is also worth mentioning that even if one adopts Freedman’s finite-sample approach, Green (2009) notes that the bias Freedman warns about is “negligible” for studies that are drawn from populations greater than 20, a floor condition that nearly all experiments in political science exceed.
 
14
In finite samples, imbalanced treatment and control conditions may introduce their own set of issues such as “tyranny of the majority” which can be addressed through weighted least squares (Lin 2013).
 
15
εi ~ N(0,4).
 
16
A script file containing replication code is available in the Political Behavior Dataverse.
 
17
In the 1000 simulations, the average R², or squared correlation between Xβ (the true signal) and y (signal plus noise), ranges from 0.20 (in Case 1) to 0.49 (Case 3). This is a reasonable degree of noisiness for social-science data.
 
18
See Note 13 on how our approach differs from that of Freedman (2008).
 
19
Note that we also include the interaction between M and T, but that is irrelevant to the point here.
 
20
On this point, the Variance Inflation Factor can inform the degree to which a particular covariate (or its interaction) imposes an efficiency cost. There is a degrees-of-freedom cost to the inclusion of covariates but such degrees of freedom issues are trivial in most experiments, where N ≫ k.
 
21
If the covariate (M) is mean-centered before being interacted with the treatment, then a regression of y on T, mean-centered M, and their interaction will allow bT to reveal the treatment effect at the covariate’s sample mean, not at the value where M happens to equal zero, and some of the issues of bias and inefficiency in bT are remedied. Since, in practice, most applied researchers do not mean-center their covariates, we have not done so here.
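
The algebra behind this note can be sketched as follows. The notation is partly assumed: the intercept β_0, the error ε, and the symbol \bar{M} for the sample mean of M are introduced here for illustration, since the paper's equations are not reproduced on this page. Adding and subtracting \bar{M} in the interactive model gives:

```latex
\begin{align*}
y &= \beta_0 + \beta_T T + \beta_M M + \beta_{MT}\, T M + \varepsilon \\
  &= (\beta_0 + \beta_M \bar{M})
     + (\beta_T + \beta_{MT} \bar{M})\, T
     + \beta_M (M - \bar{M})
     + \beta_{MT}\, T (M - \bar{M})
     + \varepsilon
\end{align*}
```

In the centered specification the coefficient on T is β_T + β_MT \bar{M}, the treatment effect at the sample mean of M, whereas in the uncentered specification it is β_T, the treatment effect at M = 0. The coefficients on M and on the interaction are unchanged by centering.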
 
22
In Eq. (2), ∂y/∂M = βM + βMT·T = 0.5 + 0.5·T. βM reflects the effect of M when T = 0, and βMT indicates how the effect of M changes when T = 1. When the interaction term is omitted, βM returns the weighted average effect of M, which in this scenario is the average of 0.5 (the effect of M in the control condition) and 1 (the effect of M in the treatment condition). Note the average estimate of βM is ~0.75 in Table 2, model (b).
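
A short simulation can illustrate why the coefficient on M settles near 0.75 when the interaction is omitted. This is a sketch under stated assumptions rather than the authors' replication script: the values βM = 0.5 and βMT = 0.5 come from this note, while the treatment coefficient of 1.0, the standard normal M, the error standard deviation of 2, the sample size, and the function name are hypothetical. The coefficient on M from a regression of y on T and M is computed by partialling out the intercept and T, which for a binary T amounts to demeaning M and y within each experimental arm.

```python
import random


def m_slope_without_interaction(n=20000, seed=7):
    """Coefficient on M from regressing y on T and M, omitting the M*T term."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        m = rng.gauss(0.0, 1.0)               # moderator (assumed standard normal)
        t = 1 if rng.random() < 0.5 else 0    # random assignment
        # True model includes the interaction: the effect of M is 0.5 + 0.5 * T
        y = 1.0 * t + 0.5 * m + 0.5 * m * t + rng.gauss(0.0, 2.0)
        rows.append((t, m, y))

    # Partial out the intercept and T by demeaning M and y within each arm,
    # then regress the residualized y on the residualized M.
    num = 0.0
    den = 0.0
    for arm in (0, 1):
        ms = [m for t, m, _ in rows if t == arm]
        ys = [y for t, _, y in rows if t == arm]
        m_bar = sum(ms) / len(ms)
        y_bar = sum(ys) / len(ys)
        num += sum((m - m_bar) * (y - y_bar) for m, y in zip(ms, ys))
        den += sum((m - m_bar) ** 2 for m in ms)
    return num / den


b_m = m_slope_without_interaction()
print(f"Coefficient on M with the interaction omitted: {b_m:.3f}")
```

With roughly equal arms and the same spread of M in each, the pooled slope weights the control-arm effect (0.5) and the treatment-arm effect (1.0) about equally, so the estimate should fall near 0.75.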
 
23
The extent of the efficiency loss for bMT depends upon the correlation between the two moderators (M and Z). In supplemental simulations (not shown), we varied Corr(M, Z) from 0 to 0.9. When the two moderators were not at all correlated (Corr(M, Z) = 0), model (d) yields 20 simulations (of 1000) where the null hypothesis (βMT = 0) would be rejected, and model (e) yields exactly the same number where the null hypothesis would be rejected. As Corr(M, Z) rises, the likelihood of making a Type II error (failing to reject the null hypothesis that βMT = 0 when there actually is an effect) rises. In the text, where Corr(M, Z) = 0.5, 97.7% of the 1000 simulations correctly reject the null when model (d) is used, whereas only 92.7% of the 1000 simulations correctly reject the null when model (e) is used. At the extreme, where Corr(M, Z) = 0.9, 97.2% of the 1000 simulations correctly reject the null when model (d) is used, whereas only 40.2% of the simulations correctly reject the null when model (e) is used. In short, when an extraneous interaction between Z and T is included, the more correlated Z and M are, the more likely it is that the researcher will incorrectly infer a lack of a moderating effect due to the collinearity-induced inflation of the standard errors.
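
A rough sense of the scale of this standard-error inflation comes from the textbook variance inflation factor for two correlated regressors, 1 / (1 − ρ²). The sketch below is an approximation, not a reproduction of the supplemental simulations: it assumes that, once the other terms are partialled out, the M×T and Z×T terms are correlated at about Corr(M, Z), which is plausible with mean-zero moderators and balanced random assignment but is not stated in the paper. The function name is hypothetical.

```python
def variance_inflation(rho):
    """VIF for one of two regressors whose correlation is rho: 1 / (1 - rho^2)."""
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")
    return 1.0 / (1.0 - rho ** 2)


# Correlations discussed in the note: 0, 0.5 (main text), and 0.9 (extreme case)
for rho in (0.0, 0.5, 0.9):
    vif = variance_inflation(rho)
    print(f"rho = {rho:.1f}: variance x {vif:.2f}, standard error x {vif ** 0.5:.2f}")
```

Under this approximation the sampling variance of the interaction coefficient is unchanged at a correlation of 0, grows by about a third at 0.5, and grows more than fivefold at 0.9, which is in line with the direction of the power loss reported above.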
 
24
The estimates for βM and βZ in model (b) are the weighted averages for the effects of M in the control and treatment (~0.75) and the effects of Z in the control and treatment (~1.125).
 
25
If the conditional expectation function is not linear, regression adjustment with covariates can be mildly biased, though consistent. We thank an anonymous reviewer for this point.
 
26
As with OLS, matching will have no effect on the estimate of the ATE as long as the matching scheme is unrelated to assignment to treatment.
 
27
See Imai et al. (2008) for discussion of the Sample Average Treatment Effect.
 
28
Researchers may argue that the meaning of particular measures may vary by sample. Party identification, for example, may be less crystallized among a convenience sample of undergraduates than among a general population sample, and therefore a moderated treatment effect in one sample may not generalize to a moderated treatment effect in another. This sort of theorizing suggests finer-grained conceptualization of what exactly the appropriate moderator should be, along with finer-grained measures to tap that reconceptualised moderator.
 
References
Baron, R. M., & Kenny, D. A. (1986). The moderator-mediator variable distinction in social psychological research: Conceptual, strategic, and statistical considerations. Journal of Personality and Social Psychology, 51(6), 1173–1182.
Berry, W. D., Golder, M., & Milton, D. (2012). Improving tests of theories positing interaction. Journal of Politics, 74(3), 653–671.
Bullock, J. G., Green, D. P., & Ha, S. E. (2010). Yes, but what’s the mechanism? (Don’t expect an easy answer). Journal of Personality and Social Psychology, 98(4), 550–558.
Ding, P., Feller, A., & Miratrix, L. (2016). Randomization inference for treatment effect variation. Journal of the Royal Statistical Society: Series B, 78, 655–671.
Druckman, J. N., & Kam, C. D. (2011). Students as experimental participants: A defense of the ‘narrow data base’. In J. N. Druckman, D. P. Green, J. H. Kuklinski, & A. Lupia (Eds.), Cambridge handbook of experimental political science (pp. 41–57). Cambridge: Cambridge University Press.
Freedman, D. A. (2008). On regression adjustments to experimental data. Advances in Applied Mathematics, 40(2), 180–193.
Gerber, A. S., & Green, D. P. (2012). Field experiments: Design, analysis, and interpretation. New York: W.W. Norton.
Green, D. P. (2009). Regression adjustments to experimental data: Do David Freedman’s concerns apply to political science? Paper presented at the 26th annual meeting of the Society for Political Methodology, Yale University.
Green, D. P., & Aronow, P. M. (2011). Analyzing experimental data using regression: When is bias a practical concern? Working paper, Yale University.
Green, D. P., & Kern, H. L. (2012). Modeling heterogeneous treatment effects in survey experiments with Bayesian additive regression trees. Public Opinion Quarterly, 76(3), 491–511.
Horiuchi, Y., Imai, K., & Taniguchi, N. (2007). Designing and analysing randomized experiments: Application to a Japanese election survey experiment. American Journal of Political Science, 51(3), 669–687.
Huber, G. A., & Lapinski, J. S. (2006). The ‘race card’ revisited: Assessing racial priming in policy contests. American Journal of Political Science, 50(2), 421–440.
Imai, K., Keele, L., Tingley, D., & Yamamoto, T. (2011). Unpacking the black box of causality: Learning about causal mechanisms from experimental and observational studies. American Political Science Review, 105(4), 765–789.
Imai, K., King, G., & Stuart, E. (2008). Misunderstandings between experimentalists and observationalists about causal inference. Journal of the Royal Statistical Society Series A, 171(2), 481–502.
Imai, K., & Ratkovic, M. (2013). Estimating treatment effect heterogeneity in randomized program evaluation. The Annals of Applied Statistics, 7(1), 443–470.
Imai, K., & Strauss, A. (2011). Estimation of heterogeneous treatment effects from randomized experiments with application to the optimal planning of the get-out-the-vote campaign. Political Analysis, 19(1), 1–19.
Imai, K., Tingley, D., & Yamamoto, T. (2013). Experimental designs for identifying causal mechanisms. Journal of the Royal Statistical Society, 176(1), 5–51.
Kam, C. D., & Franzese, R. J., Jr. (2007). Modeling and interpreting interactive hypotheses in regression analysis. Ann Arbor: University of Michigan Press.
Keele, L. (2015). The statistics of causal inference: A view from political methodology. Political Analysis, 23(3), 313–335.
Lin, W. (2013). Agnostic notes on regression adjustments to experimental data: Reexamining Freedman’s critique. The Annals of Applied Statistics, 7(1), 295–318.
Maxwell, S. E., & Delaney, H. D. (2004). Designing experiments and analyzing data: A model comparison perspective (Vol. 1). New York: Psychology Press.
Mendelberg, T. (2008). Racial priming revived. Perspectives on Politics, 1(1), 109–123.
Mutz, D. C. (2011). Population-based survey experiments. Princeton: Princeton University Press.
Neyman, J. (1923). On the application of probability theory to agricultural experiments. Essay on principles. Section 9. Statistical Science, 5(4), 465–472.
Raab, G. M., Day, S., & Sales, J. (2000). How to select covariates to include in the analysis of a clinical trial. Controlled Clinical Trials, 21(4), 330–342.
Rubin, D. B. (1974). Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology, 66(5), 688–701.
Tajfel, H. (1970). Experiments in intergroup discrimination. Scientific American, 223, 996–1105.
Metadata
Title
At the Nexus of Observational and Experimental Research: Theory, Specification, and Analysis of Experiments with Heterogeneous Treatment Effects
Authors
Cindy D. Kam
Marc J. Trussler
Publication date
03.12.2016
Publisher
Springer US
Published in
Political Behavior / Issue 4/2017
Print ISSN: 0190-9320
Electronic ISSN: 1573-6687
DOI
https://doi.org/10.1007/s11109-016-9379-z
