Abstract
In the Regression Discontinuity (RD) design, units are assigned a treatment based on whether their value of an observed covariate is above or below a fixed cutoff. Under the assumption that the distribution of potential confounders changes continuously around the cutoff, the discontinuous jump in the probability of treatment assignment can be used to identify the treatment effect. Although a recent strand of the RD literature advocates interpreting this design as a local randomized experiment, the standard approach to estimation and inference is based solely on continuity assumptions that do not justify this interpretation. In this article, we provide precise conditions in a randomization inference context under which this interpretation is directly justified and develop exact finite-sample inference procedures based on them. Our randomization inference framework is motivated by the observation that only a few observations might be available close enough to the threshold where local randomization is plausible, and hence standard large-sample procedures may be suspect. Our proposed methodology is intended as a complement and a robustness check to standard RD inference approaches. We illustrate our framework with a study of two measures of party-level advantage in U.S. Senate elections, where the number of close races is small and our framework is well suited for the empirical analysis.
1 Introduction
Inference on the causal effects of a treatment is one of the basic aims of empirical research. In observational studies, where controlled experimentation is not available, applied work relies on quasi-experimental strategies carefully tailored to eliminate the effect of potential confounders that would otherwise compromise the validity of the analysis. Originally proposed by Thistlethwaite and Campbell [1], the regression discontinuity (RD) design has recently become one of the most widely used quasi-experimental strategies. In this design, units receive treatment based on whether their value of an observed covariate or “score” is above or below a fixed cutoff. The key feature of the design is that the probability of receiving the treatment conditional on the score jumps discontinuously at the cutoff, inducing variation in treatment assignment that is assumed to be unrelated to potential confounders. Imbens and Lemieux [2], Lee and Lemieux [3] and Dinardo and Lee [4] give recent reviews, including comprehensive lists of empirical examples.
The traditional inference approach in the RD design relies on flexible extrapolation (usually nonparametric curve estimation techniques) using observations near the known cutoff. This approach follows the work of Hahn et al. [5], who showed that, when placement relative to the cutoff completely determines treatment assignment, the key identifying assumption is that the conditional expectation of a potential outcome is continuous at the threshold. Intuitively, since nothing changes abruptly at the threshold other than the probability of receiving treatment, any jump in the conditional expectation of the outcome variable at the threshold is attributed to the effects of the treatment. Modern RD analysis employs local nonparametric curve estimation at either side of the threshold to estimate RD treatment effects, with local-linear regression being the preferred choice in most cases. See Porter [6], Imbens and Kalyanaraman [7] and Calonico et al. [8] for related theoretical results and further discussion.
Although not strictly justified by the standard framework, RD designs are routinely interpreted as local randomized experiments, where in a neighborhood of the threshold treatment status is considered as good as randomly assigned. Lee [9] first argued that if individuals are unable to precisely manipulate or affect their score, then variation in treatment near the threshold approximates a randomized experiment. This idea has been expanded in Lee and Lemieux [3] and Dinardo and Lee [4], where RD designs are described as the “close cousins” of randomized experiments. Motivated by this common interpretation, we develop a methodological framework for analyzing RD designs as local randomized experiments employing a randomization inference setup.[1] Characterizing the RD design in this way not only has intuitive appeal but also leads to an alternative way of conducting statistical inference. Building on Rosenbaum [14, 15], we propose a randomization inference framework to conduct exact finite-sample inference in the RD design that is most appropriate when the sample size in a narrow window around the cutoff – where local randomization is most plausible – is small. Small sample sizes are a common phenomenon in the analysis of RD designs, since the estimation of the treatment effect at the cutoff typically requires that observations far from the cutoff be given zero or little weight; this may constrain researchers’ ability to make inferences based on large-sample approximations. In order to increase the sample size, researchers often include observations far from the cutoff and engage in extrapolation. However, incorrect parametric extrapolation invalidates standard inferential approaches because point estimators, standard errors and test statistics will be biased. In such cases, if a local randomization assumption is plausible, our approach offers a valid alternative that minimizes extrapolation by relying only on the few closest observations to the cutoff. 
More generally, our methodological framework offers a complement and a robustness check to conventional RD procedures by providing a framework that requires minimal extrapolation and allows for exact finite-sample inference.
To develop our methodological framework, we first make precise a set of conditions under which RD designs are equivalent to local randomized experiments within a randomization inference framework. These conditions are strictly stronger than the usual continuity assumptions imposed in the RD literature, but similar in spirit to those imposed in Hahn et al. ([5], Theorem 2) for identification of heterogeneous treatment effects. The key assumption is that, for the given sample, there exists a neighborhood around the cutoff where a randomization-type condition holds. More generally, this assumption may be interpreted as an approximation device to the conventional continuity conditions that allows us to proceed as if only the few closest observations near the cutoff are randomly assigned. The plausibility of this assumption will necessarily be context-specific, requiring substantive justification and empirical support. Employing these conditions, we discuss how randomization inference tools may be used to conduct exact finite-sample inference in the RD context.
Our resulting empirical approach consists of two steps. The first step is choosing a neighborhood or window around the cutoff where treatment status is assumed to be as-if randomly assigned. We develop a data-driven, randomization-based window selection procedure based on “balance tests” of pre-treatment covariates and illustrate how this approach for window selection performs in our empirical illustration. The second step is to apply established randomization inference tools, given a hypothesized treatment assignment mechanism, to construct hypothesis tests, confidence intervals, and point estimates.
Our approach parallels the conventional nonparametric RD approach but makes a different tradeoff: our randomization assumption constitutes an approximation that is likely valid within a smaller neighborhood of the threshold than the one used in the flexible local polynomial approach, but it allows for exact finite-sample inference in a setting where large-sample approximations may be poor. Both approaches involve choices for implementation: standard local polynomial RD estimation requires selecting (i) a bandwidth and (ii) a kernel and polynomial order, while for our approach researchers need to choose (i) the size of the window around the cutoff where randomization is plausible and (ii) a randomization mechanism and test statistic. As is well known in the literature, bandwidth selection is difficult and estimation results can be highly sensitive to its choice [8]. In our approach, selecting the window is also crucial, and researchers should pay special attention to how it is chosen. On the other hand, selecting a kernel and polynomial order is relatively less important, as is choosing a randomization mechanism and test statistic in our approach.
We illustrate our methodological framework with a study of party-level advantages in U.S. Senate elections, comparing future Democratic vote shares in states where the Democratic party barely won an election to states where it barely lost. We find that the effect of barely winning an election for a seat has a large and positive effect on the vote share in the following election for that seat, but a null effect on the following election for the state’s other seat. Our null findings are consistent with the results reported by Butler and Butler [16], who studied balancing and related hypotheses using standard RD methods, although we find that these null results may be sensitive to the choice of window.
The rest of the paper is organized as follows. Section 2 sets up our statistical framework, formally states the baseline assumptions required to apply randomization inference procedures to the RD design, and describes these procedures briefly. Section 3 discusses data-driven methods to select the window around the cutoff where the randomization assumption may be plausible. Section 4 briefly reviews the classical notion of incumbency advantage in the Political Science literature and discusses its differences with RD-based measures, while Section 5 presents the results of our empirical analysis. Section 6 discusses several extensions and applications of our methodology, and Section 7 concludes.
2 Randomization inference in RD
Consider a setting with n units, indexed
Our approach begins by specifying conditions within a neighborhood of the threshold that allow us to analyze the RD design as a randomized experiment. Specifically, we focus on an interval or window
Assumption 1: Local Randomization. There exists a neighborhood
(a)
(b)
The first part of Assumption 1 says that the distribution of the score is the same for all units inside
The conditions in Assumption 1 are stronger than those typically required for identification and inference in the classical RD literature. Instead of only assuming continuity of the relevant population functions at
In most settings, Assumption 1 is plausible only within a narrow window of the threshold, leaving only a small number of units for analysis. Thus, the problems of estimation and inference using this assumption in the context of RD are complicated by small-sample concerns. Following Rosenbaum [14, 15], we propose using exact randomization inference methods to overcome this potential small-sample problem. In the remainder of this section, we maintain Assumption 1 and take as given the window
2.1 Hypothesizing the randomization mechanism
The first task in applying randomization inference to the RD design is to choose a randomization mechanism for
While the simplicity of this Bernoulli mechanism is attractive, a practical disadvantage is that it results in a positive probability of all units in the window being assigned to the same group. An alternative mechanism that avoids this problem, and is also likely to apply in settings where
When
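As a purely numerical sketch of the contrast between the two mechanisms, consider a hypothetical window containing six units, three of them treated; the counts below are illustrative and not part of the empirical analysis:

```python
# Counting the possible treatment assignments under the two hypothesized
# randomization mechanisms, for a hypothetical window with n = 6 units.
from itertools import combinations

n = 6            # units inside the window (hypothetical)
n_treated = 3    # observed number of treated units

# Bernoulli mechanism: each unit is treated independently with p = 1/2,
# so all 2^n assignment vectors are equally likely, including the
# degenerate ones where every unit falls in the same group.
n_bernoulli_assignments = 2 ** n

# Fixed-margins mechanism (complete randomization): condition on the
# observed number of treated units, which rules out degenerate assignments.
n_fixed = len(list(combinations(range(n), n_treated)))

print(n_bernoulli_assignments, n_fixed)  # 64 20
```

Conditioning on the observed margins also keeps the set of possible assignments small enough to enumerate exactly.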
2.2 Test of no effect
Having chosen an appropriate randomization mechanism, we can test the sharp null hypothesis of no treatment effect under Assumption 1. No treatment effect means observed outcomes are fixed regardless of the realization of
Any test statistic may be used, including difference-in-means, the Kolmogorov–Smirnov test statistic, and difference-in-quantiles. While in typical cases the significance level of the test may be approximated when a large number of units is available, randomization-based inference remains valid (given Assumption 1) even for a small number of units. This feature is particularly important in the RD design where the number of units within
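To fix ideas, the following is a minimal sketch of such an exact test using the difference-in-means statistic and toy data (not the paper's Senate data), assuming complete randomization within the window:

```python
# Minimal sketch of an exact randomization test of the sharp null of no
# treatment effect, using difference-in-means and complete randomization.
# The data below are toy numbers, not the paper's Senate data.
from itertools import combinations

def sharp_null_pvalue(y, d):
    """Exact two-sided p-value: re-randomize treatment over all assignments
    with the observed number of treated units; under the sharp null the
    outcomes y are fixed, so the null distribution is fully known."""
    n = len(y)
    treated = [i for i in range(n) if d[i] == 1]
    m = len(treated)

    def diff_means(group):
        t = [y[i] for i in group]
        c = [y[i] for i in range(n) if i not in group]
        return sum(t) / len(t) - sum(c) / len(c)

    observed = abs(diff_means(treated))
    null_stats = [abs(diff_means(g)) for g in combinations(range(n), m)]
    return sum(s >= observed for s in null_stats) / len(null_stats)

# Toy example: 4 treated and 4 control outcomes near a hypothetical cutoff.
y = [52.1, 55.3, 54.0, 57.2, 48.9, 47.5, 50.2, 46.8]
d = [1, 1, 1, 1, 0, 0, 0, 0]
print(round(sharp_null_pvalue(y, d), 4))  # 0.0286
```

Because the null distribution is enumerated rather than approximated, the p-value is exact regardless of the sample size inside the window.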
2.3 Confidence intervals and point estimates
While the test of no treatment effect is often an important starting place, and appealing for the minimal assumptions it relies on, in most applications we would like to construct confidence intervals and point estimates of treatment effects. This requires additional assumptions. The next assumption we introduce is that of no interference between units.
Assumption 2: Local stable unit treatment value assumption. For all i with
This assumption means that unit i’s potential outcome depends only on
Point estimates and potentially shorter confidence intervals for the treatment effect can be obtained at the cost of a parametric model for the treatment effect. A simple (albeit restrictive) model that is commonly used is the constant treatment effect model described below.
Assumption 3: Local constant treatment effect model. For all i with
Under Assumptions 1–3, and hypothesizing a value
We discuss this constant and additive treatment effect model because it allows us to illustrate how confidence intervals can be easily derived by inverting hypothesis tests about a treatment effect parameter. But there is nothing in the randomization inference framework that we have adopted that necessitates Assumption 3. This assumption can be easily generalized to allow for non-constant treatment effects, such as Tobit or attributable effects (see Rosenbaum [15], Chapter 2). Indeed, the technique of constructing adjusted potential outcomes and inverting hypothesis tests of the sharp null hypothesis is general and allows for arbitrarily heterogeneous models of treatment effects. Furthermore, the confidence intervals for quantile treatment effects described above do not require a parametric treatment effect model.
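The test-inversion construction can be sketched as follows; the grid of hypothesized effects, the 5% level, and the choice of the largest-p-value point as a Hodges–Lehmann-type estimate are illustrative implementation choices, not the paper's exact procedure:

```python
# Sketch of a confidence interval and point estimate via test inversion
# under the constant additive effect model (illustrative choices: grid of
# hypothesized effects, 5% level, largest-p-value point estimate).
from itertools import combinations

def sharp_null_pvalue(y, d):
    """Exact two-sided p-value for the sharp null (difference-in-means
    over all fixed-margin treatment assignments)."""
    n = len(y)
    treated = [i for i in range(n) if d[i] == 1]
    m = len(treated)

    def diff_means(group):
        t = [y[i] for i in group]
        c = [y[i] for i in range(n) if i not in group]
        return sum(t) / len(t) - sum(c) / len(c)

    observed = abs(diff_means(treated))
    null_stats = [abs(diff_means(g)) for g in combinations(range(n), m)]
    return sum(s >= observed for s in null_stats) / len(null_stats)

def invert_test(y, d, grid, alpha=0.05):
    """For each hypothesized constant effect tau in grid, remove tau from
    treated outcomes and test the sharp null on the adjusted data.
    Returns the range of non-rejected taus and, as a simple
    Hodges-Lehmann-type point estimate, the tau with the largest p-value.
    Assumes the grid contains at least one non-rejected value."""
    accepted, pvals = [], []
    for tau in grid:
        y_adj = [yi - tau * di for yi, di in zip(y, d)]  # adjusted potential outcomes
        p = sharp_null_pvalue(y_adj, d)
        pvals.append(p)
        if p > alpha:
            accepted.append(tau)
    point = grid[pvals.index(max(pvals))]
    return (min(accepted), max(accepted)), point
```

For example, `invert_test(y, d, grid)` with a grid spanning the plausible range of effects returns the interval of constant effects that cannot be rejected at the 5% level, together with a point estimate.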
3 Window selection
If there exists a window
Imposing Assumption 1 throughout, we propose a method to select
Our window selection procedure is inspired by this common empirical practice. In particular, we assume that there exists a covariate for each unit, denoted
The first step to formalize this approach is to assume that the treatment effect on the covariate x is zero inside the window where Assumption 1 holds. We collect the covariates in
Assumption 4: Zero treatment effect for covariate. For all i with
Assumption 4 states that the sharp null hypothesis holds for
The second necessary step to justify our procedure for selecting
Assumption 5: Association outside W0 between covariate and score. For all i with
(a)
(b)For all
Assumption 5 is key to obtain a valid window selector, since it requires a form of non-random selection among units outside
Assumptions 1, 4 and 5 justify a simple procedure to find
The procedure depends crucially on sequential testing in nested windows: if the sharp null hypothesis is rejected for a given window, then this hypothesis will also be rejected in any window that contains it (with a test of sufficiently high power). Thus, the procedure searches windows of different sizes until it finds the largest possible window such that the sharp null hypothesis cannot be rejected for any window contained in it. This procedure can be implemented as follows.
Window selection procedure based on predetermined covariates. Select a test statistic of interest, denoted
Step 1: Define
Step 2: Conduct a test of no effect using
Step 3: If the null hypothesis is rejected, increase
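The three steps can be sketched in code as follows (hypothetical data structures, a difference-in-means balance test, and an assumed threshold of alpha = 0.15; this is not the authors' implementation):

```python
# Sketch of the covariate-based window selection procedure: walk symmetric
# windows [-w, w] from smallest to largest and stop at the first window
# where some predetermined covariate fails the balance test.
from itertools import combinations

def perm_pvalue(x, d):
    """Exact two-sided randomization p-value for the sharp null of no
    effect on covariate x, using difference-in-means."""
    n, m = len(x), sum(d)

    def diff_means(group):
        t = [x[i] for i in group]
        c = [x[i] for i in range(n) if i not in group]
        return sum(t) / len(t) - sum(c) / len(c)

    obs = abs(diff_means([i for i in range(n) if d[i] == 1]))
    stats = [abs(diff_means(g)) for g in combinations(range(n), m)]
    return sum(s >= obs for s in stats) / len(stats)

def select_window(score, covariates, d, half_widths, alpha=0.15):
    """half_widths : increasing list of w values, window = [-w, w]
    covariates  : dict name -> list of predetermined covariate values
    Returns the largest w such that every smaller tested window also
    passes the balance test on all covariates."""
    chosen = None
    for w in half_widths:
        inside = [i for i, s in enumerate(score) if -w <= s <= w]
        d_w = [d[i] for i in inside]
        if not (0 < sum(d_w) < len(d_w)):
            continue  # need units on both sides of the cutoff to test balance
        # Minimum p-value across covariates: a small minimum signals
        # imbalance somewhere in the covariate set.
        p_min = min(perm_pvalue([x[i] for i in inside], d_w)
                    for x in covariates.values())
        if p_min <= alpha:
            break  # imbalance detected: keep the last window that passed
        chosen = w
    return chosen
```

For example, `select_window(margin, {"covariate_name": values}, d, half_widths)` returns the largest symmetric half-width whose covariates all pass the balance test.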
An important feature of this approach is that, unlike conventional hypothesis testing, we are particularly concerned about the possibility of failing to reject the null hypothesis when it is false (Type II error). Usually, researchers are concerned about controlling Type I error to avoid rejecting the null hypothesis too often when it is true, and thus prefer testing procedures that are not too “liberal”. In our context, however, rejecting the null hypothesis is used as evidence that the local randomization Assumption 1 does not hold, and our ultimate goal is to learn whether the data support the existence of a neighborhood around the cutoff where our null hypothesis fails to be rejected. In this sense, the roles of Type I and Type II error are interchanged in our context.[6] This has important implications for the practical implementation of our approach, which we discuss next.
3.1 Implementation
Implementing the procedure proposed above requires three choices: (i) a test statistic, (ii) the minimum sample sizes (
3.1.1 Choice of test statistic
This choice is important because different test statistics will have power against different alternative hypotheses and, as discussed above, we prefer tests with low Type II error. In our procedure, the test of the sharp null hypothesis of no treatment effect can employ different test statistics, such as difference-in-means, Wilcoxon rank sum or Kolmogorov–Smirnov, because the null randomization distribution of each is known. Lehmann [19] and Rosenbaum [14, 15] provide a discussion and comparison of alternative test statistics. In our application, we employ the difference-in-means test statistic.
3.1.2 Choice of minimum sample size
The main goal of setting a minimum sample size is to prevent the procedure from having too few observations when conducting the hypothesis test in the smallest possible window. These constants should be large enough so that the test statistic employed has “good” power properties to detect departures from the null hypothesis. We recommend setting
3.1.3 Choice of testing procedure and α
First, our procedure performs hypothesis tests in a sequence of nested windows and thus involves multiple hypothesis testing (see Efron [20] for a recent review). This implies that, even when the null hypothesis is true, it will be rejected in some windows purely by chance (e.g., if the hypotheses were independent, the expected number of rejections would be roughly the significance level times the number of windows considered). From a family-wise error rate perspective, multiple testing implies that our window selector will reject more windows than it should, because the unadjusted p-values will be too small. But since we are more concerned about failing to reject a false null hypothesis (Type II error) than about rejecting a true one (Type I error), this implies that our procedure will be more conservative, selecting a smaller window than the true window (if any) where the local randomization assumption is likely to hold. For this reason, we recommend that researchers do not adjust p-values for multiple testing.[7] Second, we must choose a significance level
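Under the independence approximation used in the text, the expected number of spurious rejections is easy to compute; the significance level and number of windows below are hypothetical:

```python
# Back-of-the-envelope multiple-testing arithmetic: with K independent
# tests of true null hypotheses at level alpha, about alpha * K reject by
# chance alone, and the probability of at least one spurious rejection
# grows quickly with K. (Hypothetical alpha and K.)
alpha, K = 0.15, 20
expected_rejections = alpha * K
prob_any_rejection = 1 - (1 - alpha) ** K
print(expected_rejections, round(prob_any_rejection, 3))
```

In this sense, leaving p-values unadjusted pushes the selector toward rejecting too many windows, which errs on the side of a smaller, more credible window.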
In the upcoming sections, we illustrate how our methodological framework works in practice with a study of party advantages in U.S. Senate elections.
4 Regression discontinuity and the party incumbency advantage
Political scientists have long studied the question of whether the incumbent status of previously elected legislators translates into an electoral or incumbency advantage. This advantage is believed to stem from a variety of factors, including name recognition, the ability to perform casework and cultivate a personal vote, the ability to deter high-quality challengers, the implementation of pro-incumbent redistricting plans, and the availability of the incumbency cue amidst declining party attachments. Although the literature is vast, it has focused overwhelmingly on the incumbency advantage of members of the U.S. House of Representatives.[8]
Estimating the incumbency advantage is complicated by several factors. One is that high-quality politicians tend to obtain higher vote shares than their low-quality counterparts, making them more likely both to become incumbents in the first place and to obtain high vote shares in future elections. Another is that incumbents tend to retire strategically when they anticipate a poor performance in the upcoming election, making “open seats” (races where no incumbent is running) a dubious baseline for comparison. Any empirical strategy that ignores these methodological issues will likely overestimate the size of the incumbency advantage.
Recently, Lee [9] proposed using a regression discontinuity design based on the discontinuous relationship between the incumbency status of a party in a given election and its vote share in the previous election: in a two-party system, a party enjoys incumbency status when it obtains 50% of the vote or more in the previous election, but loses incumbency status to the opposing party otherwise. In this RD design, the score is the vote share obtained by a party at election t, the cutoff is 50%, and the treatment (incumbent status) is assigned deterministically based on whether the vote share at t exceeds the cutoff. The outcome of interest is the party’s vote share in the following election, at
4.1 RD design in U.S. Senate elections: two estimands of party advantage
Our application of the RD design to U.S. Senate elections focuses on two specific estimands that capture local electoral advantages and disadvantages at the party level. The first estimand, which we call the incumbent-party advantage, focuses on the effect of the Democratic party winning a Senate seat on its vote share in the following election for that seat. The other estimand, which we call the opposite-party advantage following Alesina et al. [27], is unrelated to the traditional concept of the incumbency advantage and reveals the disadvantages faced by the party that tries to win the second seat in a state’s Senate delegation. Establishing whether the opposite-party advantage exists has been of central importance to theories of split-party Senate delegations, and there are different explanations of why it may arise.[9]
Both estimands, formally defined in terms of potential outcomes below, are derived from applying an RD design to the staggered structure of Senate elections, which we now describe briefly. Term length in the U.S. Senate is 6 years and there are 100 seats. These Senate seats are divided into three classes of roughly equal size (Class I, Class II and Class III), and every 2 years only the seats in one class are up for election. As a result, the terms are staggered: in every general election, which occurs every 2 years, only one third of Senate seats are up for election. Each state elects two senators in different classes to serve a 6-year term in popular statewide elections. Since its two senators belong to different classes, each state has Senate elections separated by alternating 2-year and 4-year intervals. Moreover, in any pair of consecutive elections, each election is for a different senate seat – that is, for a seat in a different class.[10]
Following Butler and Butler [16], we apply the RD design in the U.S. Senate analogously to its previous applications in the U.S. House, comparing states where the Democratic party barely won election t to states where the Democratic party barely lost. But in the Senate, the staggered structure of terms adds a layer of variability that allows us to both study party advantages and validate our design in more depth than would be possible in a non-staggered legislature such as the House. Using t,
Election | Seat A | Seat B | Design and outcomes |
t | Election held. Candidate C from party P wins | No election held | – |
t+1 | No election held | Election held. (Candidate C is not a contestant in this race) | Design II: Effect of P winning Seat A at t on P’s vote share for Seat B at t+1 |
t+2 | Election held. Candidate C may or may not be P’s candidate | No election held | Design I: Effect of P winning Seat A at t on P’s vote share for Seat A at t+2 |
The first design (Design I) focuses on the effect of party P’s barely winning at t on its vote share at
Using the notation introduced in Section 2, we consider two estimands defined by Designs I and II. We define the treatment indicator as
5 Results: RD-based party advantages in U.S. Senate elections
We analyze U.S. Senate elections between 1914 and 2010. This is the longest possible period to study popular U.S. Senate elections, as before 1914 Senate members were elected indirectly by state legislatures. We combine several data sources. We collected election returns for the period 1914–1990 from The Interuniversity Consortium for Political and Social Research (ICPSR) Study 7757, and for the period 1990–2010 from the CQ Voting and Elections Collection. We obtained population estimates at the state level from the U.S. Census Bureau. We also used ICPSR Study 3371 and data from the Senate Historical Office to establish whether each individual senator served the full 6 years of his or her term, and exclude all elections in which a subsequent vacancy occurs. We exclude vacancy cases because, in most states, when a Senate seat is left vacant the governor can appoint a replacement to serve the remaining time in the term or until special elections are held, and in most states appointed senators need not be of the same party as the incumbents they replace, leaving the “treatment assignment” of the previous election undefined.[12]
5.1 Selecting the window
We selected our window using the method based on predetermined covariates presented in Section 3. The largest window we considered was
Figure 1 summarizes graphically the results of our window selector. For every symmetric window considered (x-axis), we plot the minimum p-value found in that window (y-axis). The x-axis is the absolute value of our running variable, the Democratic margin of victory at election t, which is equivalent to the upper limit of each window considered (since we only consider symmetric windows) and ranges from 0 to 100. For example, the point 20 on the x-axis corresponds to the
Table 2 shows the minimum p-values for the first five consecutive windows we considered and also for the windows
Our window selection procedure suggests that Assumption 1 is plausible in the window
Window | Minimum p-value | Covariate with minimum p-value |
0.2639 | Dem Senate Vote t–2 |
0.4260 | Open Seat t | |
0.2682 | Open Seat t | |
0.0842 | Open Seat t | |
0.0400 | Open Seat t | |
0.0958 | Midterm t | |
0.0291 | Midterm t | |
0.0008 | Open Seat t | |
0.0000 | Dem Senate Vote t–1 |
5.2 Inference within the selected window
We now show that, in both Design I and Design II, the results obtained by conventional methods are corroborated by our randomization-based approach. Randomization-based results within the window imply a sizable advantage when a party’s same seat is up for election (Design I) that is very similar to results based on conventional methods. Randomization results on outcomes when the state’s other seat is up for reelection (Design II) show a null effect, also in accordance with conventional methods. However, as we discuss below, the null opposite-party advantage results from Design II are sensitive to our window choice, and a significant opposite-party advantage appears in the smallest window contained within our chosen window.
Our randomization-based results include a Hodges–Lehmann estimate, a treatment effect confidence interval obtained by inverting hypothesis tests based on a constant treatment effect model, a quantile treatment effect confidence interval, and a sharp null hypothesis p-value calculated as described in the window selection section above. Table 3 contrasts the party advantage estimates and tests obtained using our randomization-based framework, reported in column (3), to those obtained from two classical approaches: a 4th-order parametric fit as in Lee [9] reported in column (1), and a nonparametric local-linear regression with a triangular kernel as suggested by Imbens and Lemieux [2], using a mean-squared-error (MSE) optimal bandwidth implementation described in Calonico et al. [8], reported in column (2). For both approaches, we show conventional confidence intervals; for the local-linear regression results, we also show the robust confidence intervals developed by Calonico et al. [8], since the MSE optimal bandwidth is too large for conventional confidence intervals to be valid.[14] Panel A presents results for Design I on the incumbent-party advantage, in which the outcome is the Democratic vote share in election
The point estimates in the first row of Panel A show an estimated incumbent-party effect of around 7 to 9 percentage points for standard RD methods and 9 percentage points for the randomization-based approach. These estimates are highly significant (p-values for all three approaches fall well below conventional levels) and point to a substantial advantage to the incumbent party when the party’s seat is up for re-election. In other words, our randomization-based approach shows that the results obtained with standard methods are remarkably robust: a local or global approximation that uses hundreds of observations far away from the cutoff yields an incumbent-party advantage that is roughly equivalent to the one estimated with the 37 races (22 treated, 15 control) decided by three quarters of a percentage point or less. This robustness is illustrated in the top panel of Figure 2. Figure 2(a) displays the fit of the Democratic Vote Share at
In our data-driven window, estimates of the opposite-party advantage also appear robust to the method of estimation employed. In Panel B, estimates on Democratic Vote Share at
These results are illustrated in the bottom row of Figure 2, where Figure 2(c) and 2(d) are analogous to Figure 2(a) and 2(b), respectively. The effect of winning an election by 0.75% appears roughly equivalent to the effect estimated by standard methods. In our randomization-based window, the mean of the control group is slightly larger than the mean of the treatment group, but as shown in Table 3 we do not find statistically significant evidence of an opposite-party advantage.
Taken together, our results provide interesting evidence about party-level electoral advantages in the U.S. Senate. First, our results show that there is a strong and robust incumbent-party effect, with the party that barely wins a Senate seat at t receiving on average seven to nine additional percentage points in the following election for that seat. Second, our randomization-based approach confirms the previous finding of Butler and Butler [16], according to which there is no opposite-party advantage in the U.S. Senate. As we show below, however, and in contrast to the incumbent-party advantage results, the opposite-party advantage result is sensitive to our window choice and becomes large and significant as predicted by theory inside a smaller window.
Conventional approaches | Randomization-based approach | |
Parametric | Nonparametric | ||
(1) | (2) | (3) | |
A. Design I (outcome = Dem Vote Share at t+2) | |||
Point estimate | 9.41 | 7.43 | 9.32 |
p-value | 0.0000 | 0.0000 | 0.0004 |
95% CI | [6.16, 12.65] | [4.49, 10.36] | [4.60, 14.78] |
95% CI robust | – | [4.07, 10.98] | – |
0.25-QTE 95% CI | – | – | [–2.00, 21.12] |
0.75-QTE 95% CI | – | – | [3.68, 18.94] |
Bandwidth/Window | – | 16.79 | [–0.75, 0.75] |
Sample size treated | 702 | 310 | 22 |
Sample size control | 595 | 343 | 15 |
B. Design II (outcome = Dem Vote Share at t+1) | |||
Point estimate | 0.64 | 0.35 | –0.79 |
p-value | 0.79 | 0.82 | 0.62 |
95% CI | [–3.16, 4.44] | [–2.69, 3.39] | [–8.25, 5.03] |
95% CI robust | – | [–2.83, 4.13] | – |
0.25-QTE 95% CI | – | – | [–8.75, 9.96] |
0.75-QTE 95% CI | – | – | [–11.15, 11.31] |
Bandwidth/Window | – | 23.27 | [–0.75, 0.75] |
Sample size treated | 731 | 397 | 23 |
Sample size control | 610 | 428 | 15 |
5.3 Sensitivity of results to window choice and test statistics
We study the sensitivity of our results to two choices: the window size and the test statistic used to conduct our tests. First, we replicate the randomization-based analysis presented above for different windows, both larger and smaller than our chosen
Second, we perform the test of the sharp null using different test statistics. Under Assumption 1, there is no relationship between the outcome and the score on either side of the threshold within the window, so in addition to the difference in means we consider test statistics based on linear and quadratic regressions of the outcome on treatment and the score.
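The mechanics of such a test can be illustrated with a short sketch (our own illustrative code, not the authors'; all function and variable names are ours). Within the window, treatment labels are permuted across units and the chosen statistic is recomputed to build its exact null distribution:

```python
import numpy as np

def rd_permutation_pvalue(y, r, cutoff=0.0, stat="diffmeans",
                          n_perm=10_000, seed=0):
    """Finite-sample p-value for the sharp null of no treatment effect
    within a window around the cutoff. Treatment labels are permuted
    across units; `stat` selects the statistic: difference in means, or
    the treatment coefficient from a linear/quadratic regression of the
    outcome on treatment and a polynomial in the score."""
    rng = np.random.default_rng(seed)
    d = (r >= cutoff).astype(float)

    def t_stat(d_perm):
        if stat == "diffmeans":
            return y[d_perm == 1].mean() - y[d_perm == 0].mean()
        deg = 1 if stat == "linear" else 2
        # intercept, treatment indicator, and score polynomial
        X = np.column_stack([np.ones_like(r), d_perm] +
                            [r**k for k in range(1, deg + 1)])
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        return beta[1]  # treatment coefficient

    obs = t_stat(d)
    n_treated = int(d.sum())
    exceed = 0
    for _ in range(n_perm):
        d_perm = np.zeros_like(d)
        d_perm[rng.choice(len(d), n_treated, replace=False)] = 1.0
        exceed += abs(t_stat(d_perm)) >= abs(obs)
    return obs, exceed / n_perm
```

With few observations near the cutoff, all permutations can be enumerated instead of sampled, making the p-value exact rather than simulated.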
Table 4 presents the results from our sensitivity analysis. Panel A shows results for Democratic Vote Share at t+2 (Design I), and Panel B shows results for Democratic Vote Share at t+1 (Design II).
There are important differences between our two outcomes. The results in Design I (Panel A) are robust to the choice of the test statistic in the originally chosen window and remain stable in both smaller and larger windows. The results in Design II (Panel B), by contrast, are sensitive to both choices: in the smaller window, the point estimate turns negative and large, and the difference-in-means p-value falls below 0.05.
A. Design I (outcome = Dem Vote Share at t+2)

| | Smaller window [–0.50, 0.50] | Chosen window [–0.75, 0.75] | Larger window [–1.00, 1.00] | Larger window [–2.00, 2.00] |
| --- | --- | --- | --- | --- |
| Point estimate | 10.16 | 9.32 | 9.61 | 8.90 |
| p-value, diff-in-means | 0.0037 | 0.0004 | 0.0000 | 0.0000 |
| p-value, linear | 0.0001 | 0.0000 | 0.0000 | 0.0000 |
| p-value, quadratic | 0.0089 | 0.0000 | 0.0000 | 0.0000 |
| Treatment effect CI | [3.62, 17.14] | [4.60, 14.78] | [5.85, 15.17] | [6.38, 13.98] |
| 0.25-QTE CI | [–2.75, 19.42] | [–2.00, 21.12] | [4.13, 21.25] | [4.88, 18.57] |
| 0.75-QTE CI | [1.93, 17.87] | [3.68, 18.94] | [1.78, 17.53] | [0.42, 13.69] |
| Sample size, treated | 14 | 22 | 25 | 47 |
| Sample size, control | 9 | 15 | 18 | 49 |

B. Design II (outcome = Dem Vote Share at t+1)

| | Smaller window [–0.50, 0.50] | Chosen window [–0.75, 0.75] | Larger window [–1.00, 1.00] | Larger window [–2.00, 2.00] |
| --- | --- | --- | --- | --- |
| Point estimate | –8.17 | –0.79 | 2.32 | 0.56 |
| p-value, diff-in-means | 0.0479 | 0.6228 | 0.5093 | 0.7252 |
| p-value, linear | 0.6455 | 0.0000 | 0.0000 | 0.2876 |
| p-value, quadratic | 0.0116 | 0.7297 | 0.4835 | 0.0599 |
| Treatment effect CI | [–16.66, –0.08] | [–8.25, 5.03] | [–4.89, 9.66] | [–3.87, 5.60] |
| 0.25-QTE CI | [–13.82, –0.16] | [–8.75, 9.96] | [–8.63, 14.65] | [–4.14, 4.85] |
| 0.75-QTE CI | [–25.92, 12.63] | [–11.15, 11.31] | [–10.72, 16.23] | [–8.26, 12.63] |
| Sample size, treated | 15 | 23 | 27 | 50 |
| Sample size, control | 9 | 15 | 18 | 49 |
To investigate this issue further, Figure 3 plots the empirical cumulative distribution functions (ECDF) of our two outcomes in two different windows: the small [–0.50, 0.50] window, where the opposite-party advantage appears, and a larger window. For Democratic Vote Share at t+1, the separation between the treated and control distributions visible in the small window disappears in the larger one, consistent with the estimates in Table 4.
In contrast, for Democratic Vote Share at t+2, the treated and control distributions are clearly separated in both windows, in line with the robust incumbent-party advantage results.
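The ECDF comparison underlying such plots can be computed directly; below is a minimal sketch (function names are ours), including a Kolmogorov–Smirnov-type summary of the separation between the two curves:

```python
import numpy as np

def ecdf(values):
    """Return the sorted values and the empirical CDF evaluated at them."""
    x = np.sort(values)
    return x, np.arange(1, len(x) + 1) / len(x)

def max_ecdf_gap(y_treated, y_control):
    """Maximum vertical distance between the treated and control ECDFs,
    a simple one-number summary of how separated the two curves are."""
    grid = np.union1d(y_treated, y_control)
    F_t = np.searchsorted(np.sort(y_treated), grid, side="right") / len(y_treated)
    F_c = np.searchsorted(np.sort(y_control), grid, side="right") / len(y_control)
    return np.max(np.abs(F_t - F_c))
```

Computing this gap in each candidate window gives a quick numerical counterpart to the visual comparison: a gap that shrinks as the window widens indicates an effect confined to the closest races.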
All in all, the sensitivity and robustness analysis in this section shows that the incumbent-party advantage results are robust, while the opposite-party advantage results are more fragile, suggesting avenues for future research.
6 Extensions, applications and discussion
We introduced a framework to analyze regression discontinuity designs employing a “local” randomization approach and proposed using randomization inference techniques to conduct finite-sample exact inference. In this section, we discuss five natural extensions focusing on fuzzy RD designs, discrete-valued and multiple running variables, matching techniques and sensitivity analysis. In addition, we discuss a connection between our approach and the conventional large-sample RD approach.
6.1 Fuzzy RD with possibly weak instruments
In the sharp RD design, treatment assignment is equal to treatment received: every unit whose score falls above the cutoff receives the treatment, and every unit whose score falls below it does not. In fuzzy RD designs, by contrast, crossing the threshold changes the probability of receiving treatment discontinuously, but not necessarily from zero to one, so treatment assigned and treatment received may differ.
Let D_i(1) and D_i(0) denote unit i's potential treatment statuses when its score falls above and below the threshold, respectively.
Assumption 1′: Local randomized experiment. There exists a neighborhood around the cutoff such that, for all units with scores in this neighborhood:
(a)
(b)
This assumption permits testing the null hypothesis of no effect exactly as described above, although the interpretation of the test differs: it can now only be considered a test of no effect of treatment among those units whose potential treatment status changes with their placement relative to the threshold (the compliers).
Assumption 2′: Local SUTVA (LSUTVA). For all i with scores in the neighborhood:
(a)If
(b)If
Assumption 6: Local exclusion restriction. For all i with scores in the neighborhood, potential outcomes are unaffected by placement relative to the threshold except through its effect on treatment received.
Assumption 6 means that potential responses depend on placement with respect to the threshold only through its effect on treatment status. Under Assumptions 1′, 2′, and 6, the randomization-based inference procedures described above carry over to the fuzzy RD design, with treatment assignment playing the role of an instrument for treatment received.
Fuzzy RD designs are local versions of the usual instrumental variables (IV) model and thus concerns about weak instruments may arise in this context as well [34]. Our randomization inference framework, however, circumvents this concern because it enables us to conduct exact finite-sample inference, as discussed in Imbens and Rosenbaum [10] for the usual IV setting. Therefore, our framework also offers an alternative, robust inference approach for fuzzy RD designs under possibly weak instruments.
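A test-inversion sketch of this idea, in the spirit of Imbens and Rosenbaum's exact IV inference (our own illustrative code, not from the paper; names are ours): for each candidate constant effect tau, remove the hypothesized effect from the outcome and test that the adjusted outcome is unrelated to the assignment indicator by permutation.

```python
import numpy as np

def fuzzy_rd_ci(y, d, z, taus, n_perm=5000, alpha=0.05, seed=0):
    """Randomization-based confidence set for a constant treatment effect
    in a fuzzy RD window. y: outcomes; d: treatment received;
    z: 1 if the score falls above the cutoff (assignment). For each
    candidate tau, the adjusted outcome y - tau*d should be unrelated to
    z under the null, which is tested by permuting z; the confidence set
    collects all non-rejected values of tau."""
    rng = np.random.default_rng(seed)
    n_assigned = int(z.sum())
    kept = []
    for tau in taus:
        a = y - tau * d                       # remove hypothesized effect
        obs = a[z == 1].mean() - a[z == 0].mean()
        exceed = 0
        for _ in range(n_perm):
            zp = np.zeros_like(z)
            zp[rng.choice(len(z), n_assigned, replace=False)] = 1
            exceed += abs(a[zp == 1].mean() - a[zp == 0].mean()) >= abs(obs)
        if exceed / n_perm > alpha:           # tau not rejected: keep it
            kept.append(tau)
    return kept
```

Because the test is exact for every tau regardless of how weakly z predicts d, the resulting confidence set retains its coverage under weak instruments; it simply becomes wider (possibly unbounded) as the first stage weakens.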
6.2 Discrete and multiple running variables
Another feature of our framework is that it can handle RD settings where the running variable is not univariate and continuous. Our results provide an alternative inference approach when the running variable is discrete or has mass points in its support (see, for example, Lee and Card [35]). While conventional nonparametric smoothing techniques are usually unable to handle this case without appropriate technical modifications, our randomization inference approach applies immediately and offers researchers a fully data-driven method for inference when the running variable is not continuously distributed. Similarly, our approach extends naturally to settings with multiple running variables (see, e.g., Keele and Titiunik [36] and references therein). For example, in geographic RD designs, which involve two running variables, Keele et al. [37] discuss how the methodological framework introduced herein can be used to conduct inference employing geographic RD variation.
6.3 Matching and parametric modeling
Conventional approaches to RD employ continuity of the running variable and large-sample approximations, and typically do not emphasize the role of covariates and parametric modeling, relying instead on nonparametric smoothing techniques local to the discontinuity. In practice, however, researchers often incorporate covariates and employ parametric models in a "small" neighborhood around the cutoff when conducting inference. Our framework gives a formal justification (i.e., "local randomization") and an alternative inference approach (i.e., randomization inference) for this common empirical practice. For example, our approach can be used to justify (finite-sample exact) inference in RD contexts using panel or longitudinal data, specifying nonlinear models, or relying on flexible covariate-matching techniques. For a recent example of such an approach, see Keele et al. [37].
6.4 Sensitivity analysis and related techniques
In the context of randomization-based inference, a useful tool to assess the plausibility of the results is a sensitivity analysis that considers how the results vary under deviations from the randomization assumption. Rosenbaum [14, 15] provides details of such an approach when the treatment is assumed to be randomly assigned conditional on covariates. Under a randomization-type assumption, the probability of receiving treatment is equal for treated and control units; a sensitivity analysis proposes a model for the odds of receiving treatment, allows the probability of receiving treatment to differ between groups, and recalculates the p-values, confidence intervals or point estimates of interest. The analysis asks whether small departures from the randomization-type assumption would alter the conclusions of the study. If, for example, small differences in the probability of receiving treatment between treatment and control units lead to markedly different conclusions (i.e., if the null hypothesis of no effect is initially rejected but then ceases to be rejected), then we conclude that the results are sensitive and appropriately temper our confidence in them. This kind of analysis can be directly applied in our context inside the chosen window around the cutoff.
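Rosenbaum's formal method bounds p-values analytically for rank statistics; the following is only a crude simulation sketch of the underlying idea (our own code, not the paper's method): tilt each unit's odds of treatment by a factor up to Γ in a direction unfavorable to rejection, and recompute a one-sided difference-in-means permutation p-value.

```python
import numpy as np

def sensitivity_pvalues(y, d, gammas, n_draws=5000, seed=0):
    """Crude simulation sketch of a sensitivity analysis. Instead of
    equal assignment probabilities, each unit's odds of treatment may be
    biased by a factor up to Gamma. For each Gamma we tilt assignment
    probabilities so that high-outcome units are more likely treated
    (unfavorable to rejecting the null of no effect) and recompute the
    one-sided p-value. Gamma = 1 reproduces pure randomization."""
    rng = np.random.default_rng(seed)
    n, n_treated = len(y), int(d.sum())
    obs = y[d == 1].mean() - y[d == 0].mean()
    ranks = np.argsort(np.argsort(y)) + 1    # 1 = smallest outcome
    out = {}
    for g in gammas:
        # odds of treatment multiplied by g for above-median outcomes
        w = np.where(ranks > n / 2, g, 1.0)
        p = w / w.sum()
        exceed = 0
        for _ in range(n_draws):
            idx = rng.choice(n, n_treated, replace=False, p=p)
            dp = np.zeros(n)
            dp[idx] = 1.0
            exceed += (y[dp == 1].mean() - y[dp == 0].mean()) >= obs
        out[g] = exceed / n_draws
    return out
```

If the p-value stays small for Γ well above 1, the conclusion is insensitive to moderate hidden biases in assignment; if it crosses the significance threshold at a Γ barely above 1, the result is fragile.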
6.5 Connection to standard RD setup
Our finite-sample RD inference framework may be regarded as an alternative approximation to the conventional RD identifying conditions in Hahn et al. [5]. This section defines a large-sample identification framework similar to the conventional one and discusses its connection to the finite-sample Assumption 1.
In the conventional RD setup, individuals have random potential outcomes under treatment and control, and identification relies on conditions of the following form.
Assumption 7: Conventional RD assumption. For all
(a)
(b)
(c)
These conditions are very similar to those in Hahn et al. [5] and other (large-sample type) approaches to RD. The main difference is that we require continuity of the potential outcome functions, as opposed to just continuity of the conditional expectation or distribution of potential outcomes. Continuity of the potential outcome functions rules out knife-edge cases where confounding differences in potential outcomes at the threshold (that is, discontinuities in individual potential outcome functions) average out across units and therefore leave the conditional expectation or distribution continuous.
The conventional RD approach approximates the conditional distribution of outcomes near the threshold as locally linear and relies on large-sample asymptotics for inference. Our approach instead proposes a local constant approximation and uses finite-sample inference techniques. The local linear approximation may be more accurate than the local constant one farther from the threshold, but there the large-sample approximations may be poor. The local constant approximation is likely to be appropriate only very near the threshold, but the resulting inference remains valid in small samples. The following result suggests that our finite-sample condition in Assumption 1 can be seen as an approximation obtained from the more conventional RD identifying conditions given in Assumption 7, with an approximation error that is controlled by the window width.
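The contrast between the two approximations can be summarized schematically (our notation, not the paper's): for a window of half-width w around the cutoff r̄, a first-order Taylor argument gives

```latex
\underbrace{E[\,Y_i \mid R_i = r\,] \approx \alpha + \tau D_i}_{\text{local constant: bias } O(w)}
\qquad\text{vs.}\qquad
\underbrace{E[\,Y_i \mid R_i = r\,] \approx \alpha + \tau D_i + \beta\,(r - \bar r)}_{\text{local linear: bias } O(w^{2})},
\qquad |r - \bar r| \le w.
```

Shrinking w reduces both biases but also the available sample size, which is precisely the regime where finite-sample (rather than large-sample) inference becomes attractive.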
Result 1: connection between RD frameworks. Suppose Assumption 7 holds. Then:
(i)
(ii)
Part (i) of this result says that the running variable is approximately independent of potential outcomes near the threshold or, in the finite-sample framework where potential outcomes are fixed, that each unit's running variable has approximately the same distribution (under i.i.d. sampling). This corresponds to part (a) of Assumption 1 (Local Randomization) and gives a formal connection between the usual RD framework and our randomization-inference framework. Similarly, part (ii) implies that, near the threshold, potential outcomes depend approximately on treatment status only, which corresponds to part (b) of Assumption 1.
7 Conclusion
Motivated by the interpretation of regression discontinuity designs as local experiments, we proposed a randomization inference framework to conduct exact finite-sample inference in this design. Our approach is especially useful when only a few observations are available in the neighborhood of the cutoff where local randomization is plausible. Our randomization-based methodology can be used both for validating (and even selecting) this window around the RD threshold and performing statistical inference about the effects in this window. Our analysis of party-level advantages in U.S. Senate elections illustrated our methodology and showed that a randomization-based analysis can lead to different conclusions from standard RD methods based on large-sample approximations.
We envision our approach as complementary to existing parametric and nonparametric methods for the analysis of RD designs. Employing our proposed methodological approach, scholars can provide evidence about the plausibility of the as-good-as-random interpretation of their RD designs, and also conduct exact finite-sample inference employing only those few observations very close to the RD cutoff. If even in a small window around the cutoff the sharp null hypothesis of no effect is rejected for predetermined covariates, scholars should not rely on the local randomization interpretation of their designs, and hence should pay special attention to the plausibility of the continuity assumptions imposed by the standard approach.
Acknowledgments
We thank the co-Editor, Kosuke Imai, three anonymous referees, Peter Aronow, Jake Bowers, Devin Caughey, Andrew Feher, Don Green, Luke Keele, Jasjeet Sekhon, and participants at the 2010 Political Methodology Meeting at the University of Iowa and at the 2012 Political Methodology Seminar at Princeton University for valuable comments and suggestions. Previous versions of this manuscript were circulated under the titles "Randomization Inference in the Regression Discontinuity Design" and "Randomization Inference in the Regression Discontinuity Design to Study the Incumbency Advantage in the U.S. Senate" (first draft: July 2010). Cattaneo and Titiunik gratefully acknowledge financial support from the National Science Foundation (SES 1357561).
References
1. Thistlethwaite DL, Campbell DT. Regression-discontinuity analysis: an alternative to the ex-post facto experiment. J Educ Psychol 1960;51:309–17. doi:10.1037/h0044319
2. Imbens G, Lemieux T. Regression discontinuity designs: a guide to practice. J Econometrics 2008;142:615–35. doi:10.1016/j.jeconom.2007.05.001
3. Lee DS, Lemieux T. Regression discontinuity designs in economics. J Econ Lit 2010;48:281–355. doi:10.1257/jel.48.2.281
4. Dinardo J, Lee DS. Program evaluation and research designs. In: Ashenfelter O, Card D, editors. Handbook of labor economics, vol. 4A. Amsterdam, Netherlands: Elsevier Science B.V., 2011:463–536.
5. Hahn J, Todd P, van der Klaauw W. Identification and estimation of treatment effects with a regression-discontinuity design. Econometrica 2001;69:201–09. doi:10.1111/1468-0262.00183
6. Porter J. Estimation in the regression discontinuity model. Working paper, University of Wisconsin, 2003.
7. Imbens GW, Kalyanaraman K. Optimal bandwidth choice for the regression discontinuity estimator. Rev Econ Stud 2012;79:933–59. doi:10.1093/restud/rdr043
8. Calonico S, Cattaneo MD, Titiunik R. Robust nonparametric confidence intervals for regression-discontinuity designs. Econometrica 2014. doi:10.3982/ECTA11757
9. Lee DS. Randomized experiments from non-random selection in U.S. House elections. J Econometrics 2008;142:675–97. doi:10.1016/j.jeconom.2007.05.004
10. Imbens GW, Rosenbaum P. Robust, accurate confidence intervals with a weak instrument: quarter of birth and education. J R Stat Soc Ser A 2005;168:109–26. doi:10.1111/j.1467-985X.2004.00339.x
11. Ho DE, Imai K. Randomization inference with natural experiments: an analysis of ballot effects in the 2003 election. J Am Stat Assoc 2006;101:888–900. doi:10.1198/016214505000001258
12. Barrios T, Diamond R, Imbens GW, Kolesar M. Clustering, spatial correlations and randomization inference. J Am Stat Assoc 2012;107:578–91. doi:10.1080/01621459.2012.682524
13. Hansen BB, Bowers J. Attributing effects to a cluster randomized get-out-the-vote campaign. J Am Stat Assoc 2009;104:873–85. doi:10.1198/jasa.2009.ap06589
14. Rosenbaum PR. Observational studies, 2nd ed. New York: Springer, 2002. doi:10.1007/978-1-4757-3692-2
15. Rosenbaum PR. Design of observational studies. New York: Springer, 2010. doi:10.1007/978-1-4419-1213-8
16. Butler D, Butler M. Splitting the difference? Causal inference and theories of split-party delegations. Pol Anal 2006;14:439–55. doi:10.1093/pan/mpj010
17. Holland PW. Statistics and causal inference. J Am Stat Assoc 1986;81:945–60. doi:10.1080/01621459.1986.10478354
18. Wellek S. Testing statistical hypotheses of equivalence and noninferiority, 2nd ed. Boca Raton, FL: Chapman & Hall/CRC, 2010. doi:10.1201/EBK1439808184
19. Lehmann EL. Nonparametrics: statistical methods based on ranks. New York: Springer, 2006.
20. Efron B. Large-scale inference. Cambridge, UK: Cambridge University Press, 2010.
21. Craiu RV, Sun L. Choosing the lesser evil: trade-off between false discovery rate and non-discovery rate. Stat Sin 2008;18:861–79.
22. Erikson RS. The advantage of incumbency in congressional elections. Polity 1971;3:395–405. doi:10.2307/3234117
23. Gelman A, King G. Estimating incumbency advantage without bias. Am J Pol Sci 1990;34:1142–64. doi:10.2307/2111475
24. Ansolabehere S, Snyder JM. The incumbency advantage in U.S. elections: an analysis of state and federal offices, 1942–2000. Election Law J: Rules, Pol Policy 2002;1:315–38. doi:10.1089/153312902760137578
25. Erikson R, Titiunik R. Using regression discontinuity to uncover the personal incumbency advantage. Working paper, University of Michigan, 2014.
26. Caughey D, Sekhon JS. Elections and the regression-discontinuity design: lessons from close U.S. House races, 1942–2008. Pol Anal 2011;19:385–408. doi:10.1093/pan/mpr032
27. Alesina A, Fiorina M, Rosenthal H. Why are there so many divided Senate delegations? National Bureau of Economic Research, Working Paper 3663, 1991. doi:10.3386/w3663
28. Jung G-R, Kenny LW, Lott JR. An explanation for why senators from the same state vote differently so frequently. J Public Econ 1994;54:65–96. doi:10.1016/0047-2727(94)90071-X
29. Segura GM, Nicholson SP. Sequential choices and partisan transitions in U.S. Senate delegations: 1972–1988. J Polit 1995;57:86–100. doi:10.2307/2960272
30. McCrary J. Manipulation of the running variable in the regression discontinuity design: a density test. J Econometrics 2008;142:698–714. doi:10.1016/j.jeconom.2007.05.005
31. Calonico S, Cattaneo MD, Titiunik R. Robust data-driven inference in the regression-discontinuity design. Stata J 2014. doi:10.1177/1536867X1401400413
32. Calonico S, Cattaneo MD, Titiunik R. rdrobust: an R package for robust inference in regression-discontinuity designs. Working paper, University of Michigan, 2014. doi:10.32614/RJ-2015-004
33. Frandsen B, Frölich M, Melly B. Quantile treatment effects in the regression discontinuity design. J Econometrics 2012;168:382–95. doi:10.1016/j.jeconom.2012.02.004
34. Marmer V, Feir D, Lemieux T. Weak identification in fuzzy regression discontinuity designs. Working paper, University of British Columbia, 2014.
35. Lee DS, Card D. Regression discontinuity inference with specification error. J Econometrics 2008;142:655–74. doi:10.1016/j.jeconom.2007.05.003
36. Keele L, Titiunik R. Geographic boundaries as regression discontinuities. Pol Anal 2014.
37. Keele L, Titiunik R, Zubizarreta J. Enhancing a geographic regression discontinuity design through matching to estimate the effect of ballot initiatives on voter turnout. J R Stat Soc Ser A 2014. doi:10.1111/rssa.12056
©2015 by De Gruyter