Published in: Journal of Quantitative Economics 1/2022

Open Access 12.09.2022 | Original Article

Causal Inference of Social Experiments Using Orthogonal Designs

Authors: James J. Heckman, Rodrigo Pinto

Published in: Journal of Quantitative Economics | Special Issue 1/2022


Abstract

Orthogonal arrays are a powerful class of experimental designs that has been widely used to determine efficient arrangements of treatment factors in randomized controlled trials. Despite its popularity, the method is seldom used in social sciences. Social experiments must cope with randomization compromises such as noncompliance that often prevent the use of elaborate designs. We present a novel application of orthogonal designs that addresses the particular challenges arising in social experiments. We characterize the identification of counterfactual variables as a finite mixture problem in which choice incentives, rather than treatment factors, are randomly assigned. We show that the causal inference generated by an orthogonal array of incentives greatly outperforms a traditional design.
Notes

Supplementary Information

The online version contains supplementary material available at https://doi.org/10.1007/s40953-022-00307-w.
James J. Heckman and Rodrigo Pinto contributed equally to this work.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Introduction

This paper investigates the problem of making causal inferences in social experiments under noncompliance. We develop two themes motivated by C.R. Rao’s fundamental contributions to the characterization of distributions and the study of experiments. We use instrumental variables to characterize the identification of causal parameters as the solution to a mixing distribution problem. We then explore orthogonal array designs to correct for the selection bias generated by noncompliance.
Statisticians widely use Rao’s research on orthogonal arrays to design efficient arrangements of treatment factors in randomized controlled trials (RCTs). See, e.g., Stinson (2004). Despite its popularity, Rao’s research has not been broadly applied to evaluate treatment effects in social sciences. Social experiments are commonly plagued by randomization compromises, such as noncompliance, that often prevent the use of elaborate designs. This paper uses recently developed econometric tools to repurpose Rao’s original ideas into a novel framework where orthogonal arrays of incentives play a central role in solving compliance problems in social experiments.
In his M.A. thesis at Calcutta University, C. R. Rao (1943) introduced a powerful class of experimental designs called orthogonal arrays. This design employs combinatorial arrangements of factors (or treatments) for each randomization arm. Rao developed the theory of orthogonal arrays in a series of seminal papers (C. R. Rao 1946a, b, 1947, 1949).
The following matrix is an example of an orthogonal array:
$$\begin{aligned} \varvec{A}= \left[ \begin{array}{ccc} 0 &amp; 0 &amp; 0 \\ 1 &amp; 1 &amp; 0 \\ 1 &amp; 0 &amp; 1 \\ 0 &amp; 1 &amp; 1 \end{array}\right] \end{aligned}$$
(1)
Matrix \(\varvec{A}\) is a 2-level orthogonal array because it uses only two elements, 0 and 1. Any two columns of the matrix display all the possible combinations of zeros and ones, that is, (0, 0), (0, 1), (1, 0), and (1, 1). The matrix has four “runs” (rows) corresponding to treatment conditions and three “factors” (columns) corresponding to treatments. The matrix is classified as OA (4, 3, 2, 2), where the third entry is the number of levels and the fourth entry is the strength, which is the number of columns across which we are guaranteed to see all the possible combinations of zeros and ones.
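As an illustration outside the original text, the strength-2 property of matrix \(\varvec{A}\) can be checked mechanically. The following minimal sketch (assuming Python with numpy; the function name `has_strength` is ours) verifies that every pair of columns contains each 0/1 combination equally often, while column triples do not:

```python
import itertools

import numpy as np

# The 2-level orthogonal array OA(4, 3, 2, 2) of Eq. (1):
# 4 runs (rows), 3 factors (columns), 2 levels, strength 2.
A = np.array([
    [0, 0, 0],
    [1, 1, 0],
    [1, 0, 1],
    [0, 1, 1],
])

def has_strength(arr, t, levels=2):
    """Return True if every choice of t columns contains each of the
    levels**t level combinations equally often."""
    n_runs, n_factors = arr.shape
    target = n_runs // levels**t  # required count of each combination
    for cols in itertools.combinations(range(n_factors), t):
        rows = [tuple(r) for r in arr[:, list(cols)]]
        for combo in itertools.product(range(levels), repeat=t):
            if rows.count(combo) != target:
                return False
    return True

print(has_strength(A, 2))  # True: every column pair shows all four 0/1 pairs
print(has_strength(A, 3))  # False: the strength of this array is exactly 2
```

The same check generalizes to arrays with more levels or higher strength by changing the `levels` and `t` arguments.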
Orthogonal arrays such as OA (4, 3, 2, 2) are widely used to design experiments that determine the optimum mix of factors (or treatments) that maximize production yield. In these experiments, the researcher can choose the combination of inputs in each randomization arm.
A fundamental difference between RCTs in the natural and social sciences is that social scientists often cannot force compliance with intended treatments. In a natural science experiment, the experimenter can determine the treatment of each randomization unit. In a social experiment, the randomization units are economic agents. The experimenter can attempt to persuade agents but can seldom impose an intended treatment status on them. The final treatment status depends on the agent’s decision to comply or not comply with the initial treatment assignment.
Noncompliance violates the principle of randomization that secures the identification of causal effects in perfectly implemented RCTs. Agents that choose to deviate from their assigned treatment may differ from those who do not. The compliance decision introduces the danger of an unobserved confounding variable that may cause both the treatment choice and the outcomes of interest. Noncompliance prevents the use of sophisticated designs, making it especially difficult to reap the benefits of Rao’s orthogonal array design.
We present a novel approach to Rao’s orthogonal array design to aid the nonparametric identification of causal effects in RCTs with noncompliance. We draw on research by Heckman and Pinto (2018) and Pinto (2021a) and use a choice-theoretic instrumental variable (IV) model. The identification of causal parameters hinges on methods that control for unobserved characteristics of agents. We use discrete instruments to generate a finite partition of unobserved variables. This partition enables us to characterize the identification of causal parameters as a problem of identifying a mixture of unobserved distributions, and to determine the necessary and sufficient conditions for identifying counterfactual outcomes. We use this framework to investigate how the orthogonal design of choice incentives outperforms the traditional approach to social experiments.
Section Causal Model with Choice and Compliance presents a choice-theoretic causal model using instrumental variables. Section Using IV to Control for Unobserved Variables explains how to nonparametrically control for an agent’s unobservable characteristics using discrete instruments. Section Identification as a Mixture Problem describes the identification of causal effects as a problem of identifying a finite mixture of unobserved distributions. Section Using Rao’s Orthogonal Design to Address Identification Problems Arising from Noncompliance in Social Experiments explains how to use Rao’s orthogonal design to identify and estimate causal parameters. Section Conclusion concludes.

Causal Model with Choice and Compliance

In social experiments, the treatment status is typically determined by agents’ decisions to comply with the treatment choice. This generates the problem of selection bias, which makes it difficult to identify causal effects. Economists have long used instrumental variables to solve the problem of selection bias and to identify causal effects in choice models. This paper examines the case of multivalued-choice models with categorical instrumental variables and heterogeneous agents.

Decision-Theoretic Foundation

The economic literature offers several theoretical foundations to model an economic agent \(\omega \)’s treatment choice t among the available treatments in a choice set \(\mathcal {T}\).
The classical microeconomic theory assumes a rational agent who maximizes utility over available choices. Agents, however, need not be rational to generate predictable choice behavior (Thaler 2016). As noted by Becker (1962), the key features of choice theory are a notion of preferences based on the agent’s information set and some choice constraints, such as a budget set, that shape the agent’s behavior, rational or not.
We do not assume the full rationality of agents, but we allow for purposive actions under different information and constraint sets. We adopt a flexible choice equation consistent with a broad array of decision mechanisms. We denote the preferences of an agent \(\omega \) over the choice set \(\mathcal {T}\) by an unobserved random vector \(\varvec{V}_{\omega }\) of arbitrary but finite dimension. Choice constraints are indexed by the elements z in a finite set \(\mathcal {Z}\). We keep the information sets of agents implicit, so that the treatment choice of agent \(\omega \) given a restriction \(z \in \mathcal {Z}\) is expressed as \(T_{\omega }(z) = f_T(z,\varvec{V}_{\omega })\).
We map the choice behavior onto a standard IV model where treatment values \(t \in \mathcal {T}\) and restriction indexes \(z \in \mathcal {Z}\) become potential values in the support of the random variables T and Z, respectively. We use \(\varvec{X}\) for the random vector of baseline variables that occur prior to treatment choice. All variables are defined on the probability space \((\Omega ,\mathcal {F},P)\), and \(Z_{\omega },T_{\omega },\varvec{V}_{\omega },\varvec{X}_{\omega }\) denote the realized values of random variables \(Z,T,\varvec{V},\varvec{X}\) for an agent \(\omega \in \Omega \).

The Instrumental Variable Model

The IV model has been a standard analytical framework in economics since Reiersöl (1945). In the economic context, the IV model consists of four observed variables: (1) an instrument Z taking \(N_Z\) discrete values in the support \({{\,\mathrm{supp}\,}}(Z) = \{z_1,\dotsc ,z_{N_Z}\}\); (2) a treatment choice T taking \(N_T\) discrete values in \({{\,\mathrm{supp}\,}}(T) = \{t_1,\dotsc ,t_{N_T}\}\); (3) a real-valued outcome Y in \(\mathbb {R}\); and (4) a pre-treatment random vector \(\varvec{X}\) of finite dimension taking values in \(\mathbb {R}^{|X|}\). Notationally, we use \(D_t = \mathbf {1}[T=t], t\in {{\,\mathrm{supp}\,}}(T)\), and \(D_z = \mathbf {1}[Z=z], z\in {{\,\mathrm{supp}\,}}(Z)\), as indicators of treatment and instrument values, respectively.
Observed variables are related according to two policy-invariant equations that determine causal relationships among the variables:
$$\begin{aligned}&{\text {Choice Equation:}}&\quad T = f_T(Z,\varvec{V},\varvec{X}), \end{aligned}$$
(2)
$$\begin{aligned}&{\text {Outcome Equation:}}&\quad Y = f_Y(T,\varvec{V},\varvec{X},\epsilon _Y), \end{aligned}$$
(3)
where \(\epsilon _Y\) is an unobserved error term in \(\mathbb {R}\). As mentioned, the choice Eq. (2) is general and might be motivated by several choice mechanisms, including utility maximization (see, e.g., McFadden 1981). The unobserved random vector \(\varvec{V}\) subsumes not only the agent’s preferences but all the unobserved (by the analyst) variables that affect both the choice T and outcome Y. Vector \(\varvec{V}\) is a confounder, and it is the source of selection bias. Choice probability \(P(T=t \mid Z=z,\varvec{X})\) is the propensity score of choosing t given z and \(\varvec{X}.\)
The two main assumptions of the IV model are:
$$\begin{aligned}&{\text {Independence:}}&Z \perp\kern-5.5pt\perp (\varvec{V},\epsilon _Y)\mid \varvec{X}, \end{aligned}$$
(4)
$$\begin{aligned}&{\text {IV Relevance:}}\quad P(T=t\mid Z=z,\varvec{X}){\text { is a positive and non-degenerate}} \nonumber \\&{\text { function of { z}, for all }} (t,z) \in {{\,\mathrm{supp}\,}}(T)\times {{\,\mathrm{supp}\,}}(Z). \end{aligned}$$
(5)
Independence condition (4) states that the instrument Z is statistically independent of the confounder \(\varvec{V}\) and the error term \(\epsilon _Y\) conditioned on baseline variables \(\varvec{X}\). Given that \(\varvec{V}\) is arbitrary, we can, without loss of generality, assume that \(\varvec{V}\) and \(\epsilon _Y\) are statistically independent; that is, \(\varvec{V} \perp\kern-5.5pt\perp \epsilon _Y \mid \varvec{X}\). The independence condition implies that the instrument affects the outcome only through its impact on the treatment T.
IV relevance (5) guarantees that there exist agents who will choose t under any instrumental value z. The condition rules out the possibility that distinct instrumental values have an identical impact on the treatment. We also assume, as a regularity condition, that the second moment of the outcome is finite: \(E(Y^2)<\infty \). To simplify notation, we henceforth suppress the background variables \(\varvec{X}\). Our analysis can be interpreted as conditioned on such variables.

Counterfactuals

Counterfactual choice is defined by fixing Z in the choice Eq. (2) to a value \(z \in {{\,\mathrm{supp}\,}}(Z)\); that is, \(T(z) = f_T(z,\varvec{V})\). The counterfactual outcome is defined by fixing T in (3) to a value \(t \in {{\,\mathrm{supp}\,}}(T)\); that is, \(Y(t) = f_Y(t,\varvec{V},\epsilon _Y)\). The observed choice T and outcome Y can be described as switching regressions (Quandt 1958, 1972) by the following equations:
$$\begin{aligned} T&= \sum _{z \in {{\,\mathrm{supp}\,}}(Z)} T(z)\cdot D_z \,\, \equiv \,\, T(Z), \end{aligned}$$
(6)
$$\begin{aligned} Y&= \sum _{t \in {{\,\mathrm{supp}\,}}(T)} Y(t)\cdot D_t \,\, \equiv \,\, Y(T). \end{aligned}$$
(7)
Equation (6) describes choice T as the counterfactual choice T(z) multiplied by the indicator \(D_z\) that takes value one if \(Z=z\) and zero otherwise. Equation (7) describes the outcome Y in terms of the counterfactual outcomes Y(t) multiplied by the choice indicator \(D_t\).
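The switching representations (6) and (7) can be sketched numerically. The following example (not part of the original article) uses a hypothetical binary setting for Z and T, with all counterfactual values generated at random for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5

# Hypothetical binary setting: supp(Z) = supp(T) = {0, 1}.
Z = rng.integers(0, 2, n)            # randomly assigned instrument
T0 = np.zeros(n, dtype=int)          # counterfactual choice T(z=0)
T1 = rng.integers(0, 2, n)           # counterfactual choice T(z=1)
Y0 = rng.normal(0, 1, n)             # counterfactual outcome Y(t=0)
Y1 = Y0 + 1.0                        # counterfactual outcome Y(t=1)

# Switching regressions (6) and (7): the indicators D_z and D_t
# select which counterfactual is realized for each agent.
T = T0 * (Z == 0) + T1 * (Z == 1)    # T = sum_z T(z) * D_z
Y = Y0 * (T == 0) + Y1 * (T == 1)    # Y = sum_t Y(t) * D_t

print(np.array_equal(T, np.where(Z == 0, T0, T1)))  # True by construction
print(np.array_equal(Y, np.where(T == 0, Y0, Y1)))  # True by construction
```

Only one counterfactual per agent is ever observed; the arrays `T0`, `T1`, `Y0`, `Y1` exist here only because the data-generating process is simulated.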
The independence condition (4) generates two useful relations regarding counterfactuals:
$$\begin{aligned} &{\text{Exogeneity:}}\quad Z \perp\kern-5.5pt\perp (T(z),Y(t)) \text { for all } (z,t) \in {{\,\mathrm{supp}\,}}(Z)\times {{\,\mathrm{supp}\,}}(T), \end{aligned}$$
(8)
$$\begin{aligned} &{\text{Matching:}}\quad Y(t) \perp\kern-5.5pt\perp T\mid \varvec{V} \text { for all } t \in {{\,\mathrm{supp}\,}}(T). \end{aligned}$$
(9)
The exogeneity condition (8) is commonly used to describe IV models. It states that the instrument Z is independent of the counterfactuals. The matching property (9) states that controlling for the confounder \(\varvec{V}\) renders the outcome counterfactuals Y(t) statistically independent of the treatment choice T.

Causal Inference

Causal analysis seeks to make inferences about counterfactual outcomes Y(t). The causal effect of switching the treatment from t to \(t'\) for agent \(\omega \) is given by \(Y_\omega (t') - Y_\omega (t)\). A fundamental problem in causal inference is that, in any cross-section, we only observe a single outcome for each agent \(\omega \). Causal inference copes with this problem by focusing on the evaluation of average causal effects, specifically, the causal effect over a sub-population \(\Omega ' \subseteq \Omega \) of the agents:
$$\begin{aligned} E\big (Y(t')-Y(t)\mid \omega \in \Omega '\big )=\dfrac{\int _{\omega \in \Omega '} \big [Y_\omega (t')-Y_\omega (t)\big ]dP}{P(\omega \in \Omega ')}. \end{aligned}$$
(10)
If \(\Omega ' = \Omega \) in (10), we obtain the average treatment effect of \(t'\) versus t on the outcome \({\text{ATE}} = E(Y(t')-Y(t))\).

Controlling for Unobservables

The identification of causal effects hinges on our ability to control for the confounder \(\varvec{V}\). By conditioning on \(\varvec{V}\), we are able to relate counterfactual outcome \(E(Y(t)\mid \varvec{V})\) and conditional outcome \(E(Y\mid T=t,\varvec{V})\):
$$\begin{aligned}&E\big (Y(t)\mid \varvec{V}\big )=E\big (Y(t)\mid T=t,\varvec{V}\big ) \nonumber \\&=E\bigg (\sum _{t \in {{\,\mathrm{supp}\,}}(T)} Y(t)\cdot D_t\mid D_t=1,\varvec{V}\bigg ) = E\big (Y\mid T=t,\varvec{V}\big ), \end{aligned}$$
(11)
where the first equality is due to matching property (9) and the second equality is due to (7). If \(\varvec{V}\) were observed, we would be able to identify the counterfactual expectation \(E(Y(t) \mid T=t,\varvec{V})\) by the conditional expectation \(E(Y \mid T=t,\varvec{V})\). In addition, if \(\varvec{V}\) were observed, we would be able to identify its probability distribution. The counterfactual mean E(Y(t)) could be evaluated by integrating the conditional expectation \(E(Y \mid T=t,\varvec{V})\) over the unconditional distribution of \(\varvec{V}\):
$$\begin{aligned}&E(Y(t))=\int _{\varvec{v}} E(Y(t)\mid \varvec{V}=\varvec{v})dF_{\varvec{V}}(\varvec{v}) \nonumber \\&=\int _{\varvec{v}} E(Y\mid T=t,\varvec{V}=\varvec{v})dF_{\varvec{V}}(\varvec{v}), \end{aligned}$$
(12)
where the second equality is due to (9), and \(dF_{\varvec{V}}(\varvec{v})\) denotes the probability density of the confounder \(\varvec{V}\) at point \(\varvec{v}\).
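Equation (12) can be sketched with a simulation in which the confounder is discrete and observable to the simulator (all parameter values below are hypothetical). Averaging \(E(Y \mid T=1, V=v)\) over the unconditional distribution of V recovers the counterfactual mean, while the unadjusted conditional mean does not:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 400_000

# Hypothetical discrete confounder V in {0, 1} that shifts both the
# treatment probability and the outcome level.
V = rng.integers(0, 2, n)
T = (rng.random(n) < np.where(V == 1, 0.8, 0.2)).astype(int)
Y1 = 2.0 + V + rng.normal(0, 1, n)           # counterfactual Y(1); E(Y(1)) = 2.5
Y = np.where(T == 1, Y1, V + rng.normal(0, 1, n))

# Eq. (12): integrate E(Y | T = 1, V = v) over the *unconditional*
# distribution of V.
adjusted = sum(Y[(T == 1) & (V == v)].mean() * (V == v).mean() for v in (0, 1))
naive = Y[T == 1].mean()                     # weights by P(V = v | T = 1) instead

print(adjusted)  # approximately 2.5: recovers E(Y(1))
print(naive)     # approximately 2.8: overweights the V = 1 group
```

The gap between the two estimates previews the identification problem discussed next: without observing V, only the naive quantity is available.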

The Identification Problem

Unfortunately, when \(\varvec{V}\) is not observed, the conditional expectation of the outcome \(E(Y\mid T=t)\) does not identify the counterfactual mean E(Y(t)):
$$\begin{aligned}&E\big (Y\mid T=t\big )=\int _{\varvec{v}} E\big (Y\mid T=t,\varvec{V}=\varvec{v}\big )dF_{\varvec{V}\mid T=t}(\varvec{v}) \nonumber \\&=\int _{\varvec{v}} E\big (Y(t)\mid \varvec{V}=\varvec{v}\big )dF_{\varvec{V}\mid T=t}(\varvec{v}). \end{aligned}$$
(13)
Equation (13) clarifies how the outcome expectation \(E(Y \mid T=t)\) differs from the counterfactual mean E(Y(t)). Outcome expectation \(E(Y \mid T=t)\) is the weighted average of the counterfactual outcome \(E(Y(t) \mid \varvec{V}=\varvec{v})\) over the conditional probability of \(\varvec{V}\) given \(T=t\). On the other hand, the counterfactual mean E(Y(t)) is the weighted average of the counterfactual outcome \(E(Y(t) \mid \varvec{V}=\varvec{v})\) over the unconditional probability distribution of \(\varvec{V}\). This mismatch prevents the identification of causal effects and can promote misleading conclusions. For instance, the difference-in-means estimator for the binary outcome \(T \in \{0,1\}\) evaluates the following parameter:
$$\begin{aligned}&E\big (Y\mid T=1\big )-E\big (Y\mid T=0\big ) \nonumber \\&= \int _{\varvec{v}} E\big (Y(1)\mid \varvec{V}=\varvec{v}\big ) dF_{\varvec{V}\mid T=1}(\varvec{v}) - \int _{\varvec{v}} E\big (Y(0)\mid \varvec{V}=\varvec{v}\big ) dF_{\varvec{V}\mid T=0}(\varvec{v}). \end{aligned}$$
(14)
An identification problem arises because agent self-selection induces a correlation between choice T and the unobserved variables in \(\varvec{V}\). Large values of the difference in means in (14) could arise from the difference between the distribution of \(\varvec{V}\) conditioned on the treatment choices instead of the impact of the treatment on the outcome.
RCTs are supposed to solve the problem of selection bias by randomly assigning the treatments. The randomization secures statistical independence between the treatment T and the unobserved characteristics of the agents, namely, the confounder \(\varvec{V}\). The independence relationship \(\varvec{V} \perp\kern-5.5pt\perp T\) implies that the distribution of \(\varvec{V}\) conditional on T equals the unconditional distribution of \(\varvec{V}\); therefore, the outcome difference-in-means identifies the average treatment effect.
Noncompliance in RCTs potentially compromises the independence relationship between agents’ unobserved variables \(\varvec{V}\) and their final treatment assignment T. Effectively, noncompliance transforms the intended RCT experiment into an IV model where the randomization arms determine the instrumental variable.
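The selection bias at issue can be illustrated with a hypothetical data-generating process (ours, not the paper's) in which a confounder V raises both the probability of choosing treatment and the baseline outcome; the difference-in-means estimator of (14) then overstates the true average treatment effect:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 200_000

# Hypothetical DGP: confounder V drives both self-selection into
# treatment and the outcome level. The true ATE is 1 by construction.
V = rng.normal(0, 1, n)
T = (V + rng.normal(0, 1, n) > 0).astype(int)    # self-selected treatment
Y1 = 1.0 + V + rng.normal(0, 1, n)               # counterfactual Y(1)
Y0 = 0.0 + V + rng.normal(0, 1, n)               # counterfactual Y(0)
Y = np.where(T == 1, Y1, Y0)

diff_in_means = Y[T == 1].mean() - Y[T == 0].mean()
true_ate = (Y1 - Y0).mean()

print(true_ate)       # close to the true ATE of 1
print(diff_in_means)  # substantially larger: the distribution of V
                      # differs across the self-selected treatment groups
```

Randomly assigning `T` instead of letting it depend on `V` would make the two quantities agree, which is exactly the property that noncompliance destroys.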

Using IV to Control for Unobserved Variables

Identification strategies in IV models use instruments Z to control for the unobserved confounder \(\varvec{V}\) (Heckman and Pinto 2015). One approach assumes parametric models that impose functional restrictions on the choice Eq. (2) and the outcome Eq. (3). An example of this approach is Two-Stage Least Squares (Theil 1958, 1971).
Heckman and Pinto (2018) propose a nonparametric approach that exploits the choice behavior induced by the instrument Z. They use counterfactual choices to determine a partition of \({{\,\mathrm{supp}\,}}(\varvec{V})\) that renders T statistically independent of the counterfactual outcomes Y(t). This independence property enables them to characterize the observed data as a mixture of unobserved counterfactuals over the partition sets of \({{\,\mathrm{supp}\,}}(\varvec{V})\). We use this characterization to determine the necessary and sufficient conditions to point-identify counterfactual outcomes. Additional notation is necessary to introduce their results.

The Response Vector

We control for the unobservables \(\varvec{V}\) using a partition of their support generated by the choice variation induced by the instrument. A central concept in our analysis is the response vector. This is the \(N_Z\)-dimensional random vector of counterfactual choices T(z) across all the instrumental values \(z_1,...,z_{N_Z}\):
$$\begin{aligned} \varvec{S}=\Big [T(z_1) \,,\, \ldots \,,\, T\big (z_{N_Z}\big )\Big ]'. \end{aligned}$$
(15)
The support of the response vector is given by \({{\,\mathrm{supp}\,}}(\varvec{S}) =\{\varvec{s}_1,\ldots ,\varvec{s}_{N_S}\}\), and each element \(\varvec{s} \in {{\,\mathrm{supp}\,}}(\varvec{S})\) is called a response-type. The response vector for an agent \(\omega \) is given by \(\varvec{S}_{\omega } =[T_{\omega }(z_1) \,,\, \ldots \,,\, T_{\omega }(z_{N_Z})]'\). It lists the treatment choices that agent \(\omega \) would make when facing each instrumental value.
Response vector \(\varvec{S}\) has been used by several authors in distinct fields, starting with Robins and Greenland (1992) and Balke and Pearl (1993), who studied bounds for causal effects for the binary choice model. Angrist et al. (1996) use response-types to study the identification of a binary choice model.
Response vectors are called “principal strata” by Frangakis and Rubin (2002) and can be understood as the control functions of Heckman and Robb (1985) and Powell (1994). Our approach differs from these interpretations. We use the response vector \(\varvec{S}\) as a criterion to control for the unobserved confounding variable \(\varvec{V}\).
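In the binary case with two instrumental values, the response-types can be enumerated directly. The sketch below (illustrative only) lists the \(N_T^{N_Z} = 4\) possible vectors \([T(z_1), T(z_2)]\) with the labels used by Angrist et al. (1996):

```python
import itertools

# Hypothetical binary example: supp(Z) = {z1, z2}, supp(T) = {0, 1}.
# A response-type is the vector [T(z1), T(z2)] of counterfactual choices.
labels = {
    (0, 0): "never-taker",    # chooses t = 0 regardless of the instrument
    (0, 1): "complier",       # follows the incentive of the instrument
    (1, 0): "defier",         # acts against the incentive
    (1, 1): "always-taker",   # chooses t = 1 regardless of the instrument
}

response_types = list(itertools.product([0, 1], repeat=2))
for s in response_types:
    print(s, labels[s])       # the N_S = N_T ** N_Z = 4 response-types
```

With more treatments or more instrumental values, the same enumeration grows as \(N_T^{N_Z}\), which is why restrictions on response-types matter for identification.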
Equation (16) expresses the response vector \(\varvec{S}\) as a function of \(\varvec{V}\), while Eq. (17) expresses choice T as a function of the response vector \(\varvec{S}\) and the instrument Z. Figure 1 displays these causal relationships graphically as directed acyclic graphs (DAGs).
$$\begin{aligned}\varvec{S}&=\Big [f_T\big (z_1,\varvec{V}\big ),\ldots ,f_T(z_{N_Z},\varvec{V})\Big ]'= h(\varvec{V}), \end{aligned}$$
(16)
$$\begin{aligned}T&= \Big [D_{z_1},\ldots ,D_{z_{N_Z}}\Big ]\cdot \varvec{S}=g(Z,\varvec{S}). \end{aligned}$$
(17)
Equation (18) lists three useful properties of the response vector \(\varvec{S}\):
$$\begin{aligned} \text {(i) } \varvec{S} \perp\kern-5.5pt\perp Z,\quad \text {(ii) } Y(t)\perp\kern-5.5pt\perp T\mid \varvec{S}, \quad \text {(iii) } Y\perp\kern-5.5pt\perp T\mid (\varvec{S},Z). \end{aligned}$$
(18)
Property (i) states that the response vector is independent of the IV. This independence relationship stems from \(\varvec{V} \perp\kern-5.5pt\perp Z\) in (4) and from the fact that \(\varvec{S}\) is a function of \(\varvec{V}\). Property (ii) states a matching condition where \(\varvec{S}\) plays the role of a balancing score for \(\varvec{V}\). The relationship stems from \((Y(t),\varvec{V}) \perp\kern-5.5pt\perp Z\) and from the fact that \(\varvec{S}\) is a function of \(\varvec{V}\), while T is a function of Z and \(\varvec{V}\). Indeed, conditioned on \(\varvec{S}\), T depends only on Z, which is independent of Y(t). The last property (iii) is due to the fact that T is deterministic given Z and \(\varvec{S}\).
The properties of the response vector in (18) enable us to describe a coarse partition of \({{\,\mathrm{supp}\,}}(\varvec{V})\) that renders the treatment statistically independent of counterfactual outcomes. According to (16), each \(\varvec{v} \in {{\,\mathrm{supp}\,}}(\varvec{V})\) corresponds to one and only one response-type \(\varvec{s}\in {{\,\mathrm{supp}\,}}(\varvec{S})\) such that \(h(\varvec{v})= \varvec{s}\). Thus, for each response-type \(\varvec{s}_n \in {{\,\mathrm{supp}\,}}(\varvec{S})\), we can define a subset \(\mathcal {V}_{n} \subset {{\,\mathrm{supp}\,}}(\varvec{V})\) as:
$$\begin{aligned} \mathcal {V}_n=\Big \{\varvec{v}\in {{\,\mathrm{supp}\,}}(\varvec{V})\text { such that } \big [\,\, f_T(z_1,\varvec{v}),\ldots ,f_T(z_{N_Z},\varvec{v})\big ]'=\varvec{s}_n\Big \}. \end{aligned}$$
(19)
Sets \(\mathcal {V}_{1},\ldots ,\mathcal {V}_{N_S}\) constitute a disjoint partition of \({{\,\mathrm{supp}\,}}(\varvec{V})\) and their union spans the full set; that is,
$$\begin{aligned} {{\,\mathrm{supp}\,}}(\varvec{V}) = \bigcup _{n=1}^{N_S} \mathcal {V}_{n} \text { such that } \mathcal {V}_{n}\bigcap \mathcal {V}_{n'} = \varnothing \text { for all } n \ne n'. \end{aligned}$$
(20)
Note that the events \(\varvec{S} =\varvec{s}_n\) and \(\varvec{V} \in \mathcal {V}_{n}\) are equivalent. The matching property (ii) in (18) states that \(Y(t)\perp\kern-5.5pt\perp T \mid (\varvec{S}=\varvec{s}_n)\), so
$$\begin{aligned} T \perp\kern-5.5pt\perp Y(t)\mid (\varvec{V}\in \mathcal {V}_{n}) \quad \text {for each }\, n\in \{1,\ldots ,N_S\}. \end{aligned}$$
(21)
Equations (19)–(21) imply that the treatment T can be understood as being randomly assigned when we condition on the subset of agents \(\omega \) that share the same response-type \(\varvec{s}\). If response-types were observed, we could use (ii) in (18) to identify the expected value of counterfactual outcomes by taking the expected values of the observed outcome conditioned on the treatment choice and the response-types.
A significant challenge is that the response-types that determine the partition of the support of \(\varvec{V}\) are not observed. Nevertheless, the partition substantially simplifies the identification problem. It reframes the identification of counterfactuals as a problem of identifying a finite mixture of unobserved distributions.

Identification as a Mixture Problem

We gain a deeper understanding by reframing the identification problem as a particular case of the identification of unobserved mixture distributions (B. L. S. P. Rao 1992). The general mixture model is given by:
$$\begin{aligned} F(Y)= \int F_\theta (Y)dG(\theta ), \end{aligned}$$
(22)
where F(Y) stands for the cumulative distribution function (cdf) of an observed outcome Y, \((F_\theta (Y))_{\theta \in \Theta }\) is a collection of cdf’s indexed by a random variable \(\theta \) that takes values in the (possibly infinite) set \(\Theta \), and G denotes the cdf of \(\theta \). F(Y) is a mixture distribution, the cdf’s \((F_\theta (Y))_{\theta \in \Theta }\) are component distributions, G is the mixing distribution, and \(\theta \) is the unobserved latent (or mixing) variable. B. L. S. P. Rao (1992) notes that if the mixing distribution G is finite, then a necessary and sufficient condition for its identification is that the family of cdf’s \((F_\theta (Y))_{\theta \in \Theta }\) be linearly independent as functions of Y. We use the mixture model (22) as a starting point.
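The linear-independence condition can be sketched numerically for a discrete outcome: when the component distributions, stacked as columns of a matrix, are linearly independent, the finite mixing distribution is the unique solution of a linear system. All values below are hypothetical:

```python
import numpy as np

# Hypothetical finite mixture with a discrete outcome Y in {0, 1, 2}.
# Component pmfs (the discrete analogue of the F_theta) as columns:
F = np.array([[0.7, 0.1],
              [0.2, 0.3],
              [0.1, 0.6]])        # linearly independent columns

w_true = np.array([0.4, 0.6])     # mixing distribution G
f_obs = F @ w_true                # observed mixture distribution F(Y)

# With linearly independent columns, the mixing weights are the unique
# least-squares solution of F w = f_obs.
w_hat, *_ = np.linalg.lstsq(F, f_obs, rcond=None)
print(np.allclose(w_hat, w_true))  # True: the mixture is identified
```

If the columns of `F` were collinear, infinitely many weight vectors would reproduce `f_obs`, which is the failure of identification that the condition rules out.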
As mentioned, the identification of causal parameters hinges on controlling for unobserved variables \(\varvec{V}\). Natural candidates for the values of \(\theta \) in (22) are the elements \(\varvec{v} \in {{\,\mathrm{supp}\,}}(\varvec{V})\). We replace the cdf’s in (22) by the expectation of \(\kappa (Y)\), where \(\kappa :{{\,\mathrm{supp}\,}}(Y)\rightarrow \mathbb {R}\) is an arbitrary real-valued function:
$$\begin{aligned} E(\kappa (Y))&= \int _{\varvec{v}} E(\kappa (Y)\mid \varvec{V}=\varvec{v})dF_{\varvec{V}}(\varvec{v}) \end{aligned}$$
(23)
$$\begin{aligned}&=\sum _{n=1}^{N_S} E(\kappa (Y)\mid \varvec{V}\in \mathcal {V}_{n})P(\varvec{V}\in \mathcal {V}_{n}). \end{aligned}$$
(24)
Equation (23) describes the expected outcome using the mixture model in (22), where \(\theta \) stands for the elements \(\varvec{v} \in {{\,\mathrm{supp}\,}}(\varvec{V})\). Equation (24) uses the partition of \({{\,\mathrm{supp}\,}}(\varvec{V})\) in (19) to generate a discrete mixing distribution across the partition sets of the support of \(\varvec{V}\). Condition (21) in Section Using IV to Control for Unobserved Variables enables us to express the conditional expectation \(E(\kappa (Y)\mid T=t)\) in terms of the conditional counterfactuals \(E(\kappa (Y(t))\mid \varvec{V})\):
$$\begin{aligned} E\big (\kappa (Y)\mid T=t\big )=\sum _{n=1}^{N_S} E\Big (\kappa \big (Y(t)\big )\mid \varvec{V}\in \mathcal {V}_n\Big ) P\big (\varvec{V}\in \mathcal {V}_n\mid T=t\big ). \end{aligned}$$
(25)
Equation (25) relates a single conditional outcome expectation to several outcome counterfactuals for each choice value \(t\in {{\,\mathrm{supp}\,}}(T)\). The equation alone does not provide sufficient information on observed data to secure the identification of the counterfactual outcomes. The instrumental variable Z generates additional variation in observed quantities (the left-hand side of (26)) without increasing the number of unobserved counterfactuals (the right-hand side of (26)):
$$\begin{aligned}E\big (\kappa (Y)\mid T=t,Z=z\big ) &= \sum _{n=1}^{N_S} E\Big (\kappa \big (Y(t)\big )\mid Z=z,\varvec{V}\in \mathcal {V}_{n}\Big )P\big (\varvec{V}\in \mathcal {V}_{n}\mid T=t,Z=z\big )\end{aligned} $$
(26)
$$\begin{aligned}&= \sum _{n=1}^{N_S} E(\kappa (Y(t))\mid Z=z,\varvec{S} =\varvec{s}_n)P(\varvec{S} =\varvec{s}_n\mid T=t,Z=z) \end{aligned}$$
(27)
$$\begin{aligned}&=\sum _{n=1}^{N_S} E\Big (\kappa \big (Y(t)\big )\mid \varvec{S} =\varvec{s}_n\Big ) \dfrac{P\big (T\!=\!t\mid Z\!=\!z,\varvec{S}\!=\!\varvec{s}_n\big )P\big (\varvec{S} \!=\!\varvec{s}_n\mid Z\!=\!z\big )}{P\big (T=t\mid Z=z\big )} \end{aligned}$$
(28)
$$\begin{aligned}&= \sum _{n=1}^{N_S} \mathbf {1}\big [T=t\mid Z=z,\varvec{S} =\varvec{s}_n\big ]E\Big (\kappa \big (Y(t)\big )\mid \varvec{S}=\varvec{s}_n\Big ) \frac{P(\varvec{S} =\varvec{s}_n)}{P\big (T=t\mid Z=z\big )}. \end{aligned}$$
(29)
Equation (26) rewrites (25) conditioning on instrument Z. Equation (27) uses the fact that \(Z \perp\kern-5.5pt\perp \varvec{S}\) and that \(\varvec{V}\in \mathcal {V}_{n}\) and \(\varvec{S} =\varvec{s}_n\) are equivalent events. Equation (28) uses Bayes rule to rewrite the conditional expectation \(P(\varvec{S} =\varvec{s}_n \mid T=t,Z=z)\). Equation (29) employs \(Z \perp\kern-5.5pt\perp \varvec{S}\) again and invokes the fact that T is deterministic when conditioned on \(\varvec{S}\) and Z. The response vector \(\varvec{S}\) enables us to connect observed data with a mixture of counterfactual outcomes conditioned on response-types. This produces our main equation:
$$\begin{aligned}&E\big (\kappa (Y)\mid T=t,Z=z\big )P\big (T=t\mid Z=z\big ) \nonumber \\&= \sum _{n=1}^{N_S} \mathbf {1} \big [T=t\mid Z=z,\varvec{S} =\varvec{s}_n\big ]E\big (\kappa (Y(t))\mid \varvec{S} =\varvec{s}_n\big )P(\varvec{S}=\varvec{s}_n). \end{aligned}$$
(30)
If \(\kappa (Y)=Y\), (30) generates an equality relating the expected values of observed outcomes with expected counterfactual outcomes. Setting \(\kappa (Y)= \mathbf {1}[Y\le y]\) relates the cdf of the observed outcome to the unobserved cdf’s of the counterfactual outcomes. Setting \(\kappa (Y)\) to 1 in (30) generates the propensity score equality:
$$\begin{aligned} P\big (T=t\mid Z=z\big )=\sum _{n=1}^{N_S} \mathbf {1} \big [T=t\mid \varvec{S}=\varvec{s}_n,Z=z\big ]P(\varvec{S}=\varvec{s}_n). \end{aligned}$$
(31)
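The propensity score equality (31) can be illustrated with a hypothetical binary example in which defiers are excluded and the response-type probabilities are assumed values of our choosing:

```python
import numpy as np

# Hypothetical binary example. Response-types ordered as
# never-taker (0,0), complier (0,1), always-taker (1,1);
# defiers are excluded for illustration.
R = np.array([[0, 0, 1],      # T given Z = z1, for each response-type
              [0, 1, 1]])     # T given Z = z2, for each response-type
P_S = np.array([0.3, 0.5, 0.2])   # assumed response-type probabilities

# Eq. (31): P(T = 1 | Z = z) sums the probabilities of the
# response-types that choose t = 1 at that instrumental value.
B1 = (R == 1).astype(int)     # indicator 1[T = 1 | Z = z_i, S = s_n]
P_Z1 = B1 @ P_S

print(P_Z1)  # always-takers only at z1; compliers + always-takers at z2
```

Here the propensity score rises from 0.2 at \(z_1\) to 0.7 at \(z_2\), and the difference 0.5 is exactly the complier probability, which is the familiar way (31) is exploited in the binary IV model.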
Replacing \(\kappa (Y)\) by any variable X such that \(X\perp\kern-5.5pt\perp T \mid \varvec{S}\) generates an equation that relates baseline variables with response-types:
$$\begin{aligned}&E\big (X\mid T=t,Z\big )P\big (T=t\mid Z\big ) \nonumber \\&=\sum _{n=1}^{N_S} \mathbf {1} \big [T=t\mid \varvec{S}=\varvec{s}_n,Z\big ] E\big (X\mid \varvec{S}=\varvec{s}_n\big )P(\varvec{S}=\varvec{s}_n). \end{aligned}$$
(32)

Identification Criteria

We now investigate the necessary and sufficient conditions for identifying counterfactual outcomes and response-type probabilities. To do so, we express our main Eq. (30) as a system of linear equations.
Observed parameters are stacked in vectors \(\varvec{P}_{Z}(t)\) and \(\varvec{Q}_{Z}(t)\) below:
$$\begin{aligned}&\varvec{P}_{Z}(t)=\big [P(T=t\mid Z=z_1),\ldots ,P(T=t\mid Z = z_{N_Z})\big ]', \end{aligned}$$
(33)
$$\begin{aligned}&\varvec{Q}_{Z}(t)=\Big [E\big (\kappa (Y)\mid T=t,Z=z_1\big ),\ldots , E\big (\kappa (Y)\mid T=t,Z=z_{N_Z}\big )\Big ]', \end{aligned}$$
(34)
where \(\varvec{P}_{Z}(t)\) is the vector of observed propensity scores, and \(\varvec{Q}_{Z}(t)\) is the vector of outcome expectations. The unobserved parameters are stacked in the vectors \(\varvec{P}_{S}\) and \(\varvec{Q}_{S}(t)\) below:
$$\begin{aligned}&\varvec{P}_S=\big [P(\varvec{S}=\varvec{s}_1),\dotsc ,P(\varvec{S}=\varvec{s}_{N_S})\big ]', \end{aligned}$$
(35)
$$\begin{aligned}&\varvec{Q}_{S}(t)=\Big [E\Big (\kappa \big (Y(t)\big )\mid \varvec{S}=\varvec{s}_1\Big ),\dotsc , E\Big (\kappa \big (Y(t)\big )\mid \varvec{S}=\varvec{s}_{N_S}\Big )\Big ]', \end{aligned}$$
(36)
where \(\varvec{P}_{S}\) is the vector of response-type probabilities, and \(\varvec{Q}_{S}(t)\) is the vector of counterfactual outcomes conditioned on response-types.
Response matrix \(\varvec{R}\) stacks the response-types in \({{\,\mathrm{supp}\,}}(\varvec{S})\) as columns:
$$\begin{aligned} \varvec{R}=[\varvec{s}_1,\ldots ,\varvec{s}_{N_S}]. \end{aligned}$$
(37)
Matrix \(\varvec{R}\) has dimension \(N_Z \times N_S\). The entry in the ith row and nth column of \(\varvec{R}\) is denoted by \(\varvec{R}[i,n] = (T \mid Z=z_i,\varvec{S}=\varvec{s}_n), i \in \{1,\ldots ,N_Z\}, n \in \{1,\ldots ,N_S\}\). We use \(\varvec{R}[i,\cdot ]\) to denote the ith row of \(\varvec{R}\) and \(\varvec{R}[\cdot ,n]\) to denote the nth column of \(\varvec{R}\). IV relevance condition (5) prevents identical rows in \(\varvec{R}\).
We use \(\varvec{B}_t = \mathbf {1}[\varvec{R} =t]\) to denote a binary matrix of the same dimension as \(\varvec{R}\) that takes value 1 if the respective element in \(\varvec{R}\) is equal to t and zero otherwise. An entry of \(\varvec{B}_t\) is given by \(\varvec{B}_t[i,n] = \varvec{1}[T=t \mid Z=z_i,\varvec{S}=\varvec{s}_n]\). Let \(\varvec{B}_{T}\) be a binary matrix of dimension \((N_Z\cdot N_T) \times N_S\) generated by stacking \(\varvec{B}_t\) as t ranges over \({{\,\mathrm{supp}\,}}(T)\): \(\varvec{B}_{T} = [ \varvec{B}_{t_1}',\ldots , \varvec{B}_{t_{N_T}}']'\), and let \(\varvec{P}_Z\) be the \((N_Z\cdot N_T) \times 1\) vector that stacks the propensity scores \(\varvec{P}_Z(t)\) across the treatment values: \(\varvec{P}_Z= [ \varvec{P}_Z(t_1)',\ldots , \varvec{P}_Z(t_{N_T})']'\). In this notation, Eqs. (30) and (31) can be written in matrix form by the following equations:
$$\begin{aligned}&\varvec{P}_Z=\varvec{B}_T\varvec{P}_S, \end{aligned}$$
(38)
$$\begin{aligned}&\varvec{Q}_Z(t)\odot \varvec{P}_Z(t)=\varvec{B}_t\big (\varvec{Q}_S(t)\odot \varvec{P}_S\big ), \end{aligned}$$
(39)
where \(\odot \) denotes the Hadamard (element-wise) multiplication.
The response matrix \(\varvec{R}\) and the binary matrices \(\varvec{B}_t\), \(t\in {{\,\mathrm{supp}\,}}(T)\), are deterministic, as T is known given Z and \(\varvec{S}\). If \(\varvec{B}_t\) and \(\varvec{B_T}\) were invertible, \(\varvec{Q}_S(t)\) and \(\varvec{P}_S\) would be identified. However, such inverses do not always exist. In their place, we can use generalized inverses. Let \(\varvec{B}^+_T\) and \(\varvec{B}_t^+\) be the Moore-Penrose pseudo-inverses10 of \(\varvec{B}_T\) and \(\varvec{B}_t, \, t\in {{\,\mathrm{supp}\,}}(T)\). Under this notation, we can state the following result:
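The pseudo-inverse and the associated null-space projection are straightforward to compute numerically. The sketch below (a generic illustration; the matrix is hypothetical) uses `numpy.linalg.pinv` and verifies that \(K = I - B^{+}B\) behaves as an orthogonal projection:

```python
import numpy as np

# A hypothetical binary matrix B_t with N_S = 4 columns and deficient
# column-rank, so that K = I - B^+ B is nonzero.
B = np.array([[1., 0., 0., 1.],
              [1., 0., 1., 0.]])
B_pinv = np.linalg.pinv(B)            # Moore-Penrose pseudo-inverse
K = np.eye(4) - B_pinv @ B            # projection onto the null space of B

# Orthogonal-projection properties: idempotent, symmetric, annihilated by B.
assert np.allclose(K @ K, K)
assert np.allclose(K, K.T)
assert np.allclose(B @ K, 0.0)
```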
Theorem T-1
The general solution for the system of linear equations in (38) and (39) is given by:
$$\begin{aligned} \varvec{P}_S=\varvec{B}_{T}^{+}\varvec{P}_Z+\varvec{K}_T\varvec{\lambda } \end{aligned}$$
(40)
and
$$\begin{aligned} \varvec{Q}_S(t)\odot \varvec{P}_S=\varvec{B}^{+}_t\big (\varvec{Q}_Z(t)\odot \varvec{P}_Z(t)\big )+\varvec{K}_t \varvec{\tilde{\lambda }} \end{aligned}$$
(41)
such that
$$\begin{aligned} \varvec{K}_T=\varvec{I}_{N_S}-\varvec{B}_T^+\varvec{B}_T \quad \text {and}\quad \varvec{K}_t=\varvec{I}_{N_S}-\varvec{B}^+_t\varvec{B}_t,\quad t\in {{\,\mathrm{supp}\,}}(T), \end{aligned}$$
(42)
where \(\varvec{I}_{N_S}\) denotes an identity matrix of dimension \(N_S\), and \(\varvec{\lambda }\) and \(\varvec{\tilde{\lambda }}\) are arbitrary \(N_S\)-dimensional vectors (with the same dimension as \(\varvec{P}_S\)).
Proof
See Appendix A.1. \(\square \)
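A numerical check of the general solution (40): under a rank-deficient design (here a hypothetical binary-treatment model that includes a defier type, so \(\varvec{K}_T \ne 0\)), every member of the solution family reproduces the observed propensity scores, which illustrates why \(\varvec{P}_S\) is not point-identified:

```python
import numpy as np

# Hypothetical response matrix with four types:
# always-taker, never-taker, complier, defier.
R = np.array([[1, 0, 0, 1],
              [1, 0, 1, 0]])
B_T = np.vstack([(R == t).astype(float) for t in (0, 1)])   # 4 x 4, rank 3

P_S_true = np.array([0.1, 0.2, 0.3, 0.4])   # assumed true probabilities
P_Z = B_T @ P_S_true                         # observed moments, via Eq. (38)

B_T_pinv = np.linalg.pinv(B_T)
K_T = np.eye(4) - B_T_pinv @ B_T

# Every P_S = B_T^+ P_Z + K_T lam reproduces P_Z (ignoring probability
# constraints): the data cannot distinguish members of this family.
rng = np.random.default_rng(0)
for _ in range(5):
    lam = rng.normal(size=4)
    candidate = B_T_pinv @ P_Z + K_T @ lam
    assert np.allclose(B_T @ candidate, P_Z)
print(np.linalg.matrix_rank(B_T))   # 3 < N_S = 4, so K_T != 0
```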
Matrices \(\varvec{K}_{T}\) and \(\varvec{K}_t\) are orthogonal projection matrices that depend only on matrices \(\varvec{B}_{T}\) and \(\varvec{B}_t, t\in {{\,\mathrm{supp}\,}}(T)\). Theorem T-1 is useful to provide the general conditions for identification of response probabilities and counterfactual means:
Corollary C-1
In the IV model (4)–(5), if there exists a real-valued \(N_S\)-dimensional vector \(\varvec{\lambda }\) such that \(\varvec{\lambda }'\varvec{K}_{T}= \varvec{0}\), then \(\varvec{\lambda }'\varvec{P}_{S}\) is identified. In addition, if there exists a real-valued \(N_S\)-dimensional vector \(\varvec{\tilde{\lambda }}\) such that \(\varvec{\tilde{\lambda }}'\varvec{K}_t= \varvec{0}\), then \(\varvec{\tilde{\lambda }}'\varvec{Q}_{S}(t)\) is identified.
Proof
See Heckman and Pinto (2018) or Appendix A.2. \(\square \)
Corollary C-1 shows that the nonparametric identification of counterfactuals depends only on properties of the response matrix \(\varvec{R}\). If \(\varvec{B}_{T}\) had full column-rank, then \(\varvec{B}_{T}^+\varvec{B}_{T} =\varvec{I}_{N_S}\) and \(\varvec{K}_{T} = \varvec{0}\). In this case, each response-type probability is identified. Indeed, \(\varvec{\lambda }'\varvec{P}_{S}\) is identified for any real vector \(\varvec{\lambda }\) of dimension \(N_S\) including those that indicate each of the response-type probabilities.11
Binary matrix \(\varvec{B}_{T}\) contains each \(\varvec{B}_t, t\in {{\,\mathrm{supp}\,}}(T)\). Thus, the conditions for identifying response-type probabilities are weaker than those for identifying counterfactual outcomes. In particular, a full-rank \(\varvec{B}_{T}\) does not imply that matrices \(\varvec{B}_t\), \(t\in {{\,\mathrm{supp}\,}}(T)\), are full-rank. Therefore, the identification of the response-type probabilities does not automatically identify corresponding mean counterfactual outcomes. Corollary C-2 formalizes this discussion.
Corollary C-2
The following relationships hold for the IV model (4)–(5):
$$\begin{aligned}\text {Vector }\varvec{P}_S \text { is point-identified }&\Leftrightarrow {{\,\mathrm{rank}\,}}(\varvec{B}_{T})=N_S, \end{aligned}$$
(43)
$$\begin{aligned}\text {Vector }\varvec{Q}_S(t)\text { is point-identified }&\Leftrightarrow {{\,\mathrm{rank}\,}}(\varvec{B}_{t})=N_S. \end{aligned}$$
(44)
Also, if (44) holds, then \(E(\kappa (Y(t)))\) is identified by \(\iota ' \varvec{B}^+_{t}\big (\varvec{Q}_Z(t)\odot \varvec{P}_Z(t)\big )\), where \(\iota \) is an \(N_S\)-dimensional vector of ones.
Proof
See Heckman and Pinto (2018) or Appendix A.3. \(\square \)
Versions of Corollary C-2 are found in the literature on the identifiability of finite mixtures (see, e.g., Yakowitz and Spragins 1968 and B. L. S. P. Rao 1992). Given binary matrices \(\varvec{B}_{T}\) and \(\varvec{B}_t\), \(t\in \{1,\ldots , N_T\}\), the problem of identifying \(\varvec{P}_{S}\) and \(\varvec{Q}_S(t)\) is equivalent to the problem of identifying finite mixtures of distributions where \(\varvec{B}_{T}\) and \(\varvec{B}_t\) play the roles of kernels of mixtures. Mixture components are the corresponding counterfactual outcomes conditional on the response-types, and mixture probabilities are the response-type probabilities.
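The rank criteria of Corollary C-2 reproduce a familiar result in the binary LATE setting (a sketch under monotonicity, not the paper's example): all response-type probabilities are identified, but the counterfactual means of never-takers under treatment, and of always-takers under control, are not:

```python
import numpy as np

# Response matrix under monotonicity: always-taker, never-taker, complier.
R = np.array([[1, 0, 0],    # choices under z1
              [1, 0, 1]])   # choices under z2
B = {t: (R == t).astype(float) for t in (0, 1)}
B_T = np.vstack([B[0], B[1]])

print(np.linalg.matrix_rank(B_T))   # 3 = N_S: all type probabilities identified
print(np.linalg.matrix_rank(B[1]))  # 2 < 3: never-takers' E(Y(1)) unidentified
print(np.linalg.matrix_rank(B[0]))  # 2 < 3: always-takers' E(Y(0)) unidentified
```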

Understanding the Identification Challenge

Identification criteria (43) and (44) show that the identification of causal parameters depends solely on the properties of the response matrix \(\varvec{R}\). In particular, the identification of the counterfactual outcomes in \(\varvec{Q}_S(t)\) depends on the column-rank of the binary matrix \(\varvec{B}_{t}\). If the column-rank of \(\varvec{B}_{t}\) is \(N_S\) (full column-rank), then \(\varvec{B}_{t}^+\varvec{B}_{t} =\varvec{I}_{N_S}\) and \(\varvec{K}_{t} = \varvec{0}\). In this case \(\varvec{\xi }'\varvec{Q}_S(t)\) and \(\varvec{\xi }'\varvec{P}_S\) are identified for any real vector \(\varvec{\xi }\) of dimension \(N_S\), including all unit vectors with a value of 1 in the nth entry and 0 elsewhere. In summary, all counterfactual outcomes in \(\varvec{Q}_S(t)\) would be identified.
Identification criteria (43) and (44) pose a major identification problem. The column rank of any binary matrix \(\varvec{B}_{t}\) is less than or equal to its row-dimension \(N_Z\). On the other hand, the dimension of \(\varvec{Q}_S(t)\) is the number of response-types \(N_S\) that usually far exceeds the number of IV-values \(N_Z\). For instance, under no restrictions, the total number of potential response-types is \(N_T^{N_Z}\). Thus, a requirement for generating any identification result on counterfactual outcomes is to reduce the number of response-types that the choice model admits.
A common approach to decreasing the number of response-types is to impose functional restrictions on the choice equation. Heckman and Pinto (2018) and Pinto (2021a)12 adopt a different approach that relies on economic choice theory. They combine choice incentives with revealed preference analysis to generate choice restrictions that systematically eliminate potential response-types.

Using Rao’s Orthogonal Design to Address Identification Problems Arising from Noncompliance in Social Experiments

We propose a novel application of Rao’s orthogonal design (C. R. Rao 1946a, b, 1947, 1949). Rao’s methodology is traditionally applied to investigate the effects of combinations of treatment factors. The method determines randomization groups exposed to an orthogonal arrangement of treatment factors.
Similar to Rao’s work, ours uses an RCT setting. Our method differs from Rao’s original methodology in two ways: (1) we consider the possibility of noncompliance; and (2) the orthogonal array design is not used to combine treatment factors but to determine choice incentives across a finite number of treatment alternatives.
We use revealed preference analysis to translate choice incentives into choice restrictions that eliminate response-types. This elimination process generates the response matrix \(\varvec{R}\), which contains all the necessary information to examine the nonparametric identification of causal parameters.

Examining Choice Incentives Determined by an Orthogonal Array

Noncompliance in social experiments effectively transforms the original RCT into an IV model where each instrumental value represents a randomization arm. It implicitly adds a choice probability, which we model explicitly in the IV framework. The experimenter cannot impose a treatment status upon participants but rather incentivizes them toward a treatment choice. In this setup, orthogonal arrays play the role of the incentive matrix of Pinto (2021a).13 Each factor stands for a treatment choice and each run stands for a randomization arm that incentivizes one or several treatment alternatives.
We illustrate the method using the orthogonal array OA(4, 3, 2, 2) discussed in the Introduction. This design can be understood as an RCT with four randomization arms \(Z\in \{z_1,z_2,z_3,z_4\}\) and three treatment statuses \(T \in \{t_1,t_2,t_3\}\), where \(z_1\) denotes the control group that offers no incentive toward any choice, \(z_2\) incentivizes participants toward choices \(t_1\) and \(t_2\), \(z_3\) incentivizes them toward choices \(t_1\) and \(t_3\), and \(z_4\) incentivizes them toward choices \(t_2\) and \(t_3\). This incentive pattern is described by an ordinal incentive matrix:
$$\begin{aligned} \text {Incentive Matrix } \varvec{L}= \begin{array}{cc} \begin{array}{ccc} t_1 & t_2 & t_3 \end{array} & \\ \left[ \begin{array}{ccc} 0 & 0 & 0 \\ 1 & 1 & 0 \\ 1 & 0 & 1 \\ 0 & 1 & 1 \end{array} \right] & \begin{array}{c} z_1 \\ z_2 \\ z_3 \\ z_4 \end{array} \end{array} \end{aligned}$$
(45)
Each column displays which choices are incentivized across all values of the instruments. The incentive matrix \(\varvec{L}\) in (45) is an orthogonal array of type OA (4, 3, 2, 2). The factors refer to treatment choices; the runs, to instrumental values.
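The strength-2 property of the incentive matrix in (45) can be checked directly: in every pair of columns of an OA(4, 3, 2, 2), each of the four binary level combinations must appear exactly once.

```python
from itertools import combinations, product

# Incentive matrix L of (45): rows are runs (z1..z4), columns are
# factors (t1..t3).
L = [[0, 0, 0],
     [1, 1, 0],
     [1, 0, 1],
     [0, 1, 1]]

# Strength 2: every pair of columns contains each of (0,0), (0,1),
# (1,0), (1,1) exactly once across the four runs.
for c1, c2 in combinations(range(3), 2):
    pairs = [(row[c1], row[c2]) for row in L]
    assert sorted(pairs) == sorted(product((0, 1), repeat=2))
print("L is an OA(4, 3, 2, 2)")
```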

Choice Restrictions

Classical revealed preference analysis can be used to translate choice incentives into choice restrictions. Pinto (2021a)14 shows that the Weak Axiom of Revealed Preference (WARP) and Normal Choice generate the choice rule described below:
$$\begin{aligned} \text {If } T_{\omega }(z)=t \text { and } \varvec{L}[z',t']-\varvec{L}[z,t']\le \varvec{L}[z',t]-\varvec{L}[z,t], \text { then } T_{\omega }(z') \ne t'. \end{aligned}$$
(46)
Choice rule (46) is intuitive. It states that if an agent \(\omega \) chooses t under z, and the change from z to \(z'\) increases the incentive toward t at least as much as the incentive toward \(t'\), then the same agent \(\omega \) does not choose \(t'\) under \(z'\).
Choice rules like (46) restrict \(\varvec{R}\). They enable analysts to translate any incentive matrix into a set of choice restrictions and generate a response matrix. A simple algorithm efficiently implements the task of moving from an incentive matrix to a response matrix. We now clarify this process.
Consider an agent \(\omega \) who chooses \(t_1\) when assigned to \(z_1\); that is, \(T_{\omega }(z_1) = t_1\). We seek to examine whether the agent would switch to \(t_2\) or \(t_3\) if assigned to \(z_2\), \(z_3\), or \(z_4\).
The first row of Table 1 compares the incentive gains for choosing \(t_1\) and \(t_2\) if the instrument were to change from \(z_1\) to \(z_2\). The incentives to choose either \(t_1\) or \(t_2\) increase, which satisfies the incentive requirement of choice rule (46). Therefore, we can state that an agent that chooses \(t_1\) under \(z_1\) does not choose \(t_2\) under \(z_2\). This choice restriction is summarized as \(T_{\omega }(z_1)=t_1 \Rightarrow T_{\omega }(z_2) \ne t_2\).
The second row compares the incentives to choose \(t_3\) for the same instrumental change (\(z_1\) to \(z_2\)). The incentive to choose \(t_1\) increases, while the incentive to choose \(t_3\) does not. Choice rule (46) applies and the agent does not switch to \(t_3\); that is, \(T_{\omega }(z_1)=t_1 \Rightarrow T_{\omega }(z_2) \ne t_3\).
The third and fourth rows of Table 1 compare the incentives for choosing \(t_1\) versus \(t_2\) (third row) and \(t_1\) versus \(t_3\) (fourth row) when the instrument changes from \(z_1\) to \(z_3\). The incentive to choose \(t_1\) increases, while the incentives to choose either \(t_2\) or \(t_3\) do not. Choice rule (46) holds and the agent does not choose \(t_2\) or \(t_3\); namely, \(T_{\omega }(z_1)=t_1 \Rightarrow T_{\omega }(z_3) \notin \{t_2,t_3\}\).
The last two rows investigate the instrumental change from \(z_1\) to \(z_4\). The incentives to choose \(t_2\) or \(t_3\) increase, while the incentive to choose \(t_1\) does not. The incentive requirement of choice rule (46) is not satisfied, and therefore, no choice restriction is generated.
Table 2 presents all the choice restrictions generated by applying choice rule (46) to each combination of treatment pairs \((t,t') \in \{t_1,t_2,t_3\}^2\) and to each pair of instrumental values \((z,z') \in \{z_1,z_2,z_3,z_4\}^2\).
Table 1
Applying Choice Rule (46) to \(T_{\omega }(z_1)=t_1\)
Counterfactual choice
Incentive condition
 
Choice restriction
\(T(z_1)=t_1\)
\(\varvec{L}[z_2,t_2]-\varvec{L}[z_1,t_2]=1\le 1=\varvec{L}[z_2,t_1]-\varvec{L}[z_1,t_1]\)
\(\Rightarrow \)
\(T(z_2)\ne t_2\)
\(T(z_1)=t_1\)
\(\varvec{L}[z_2,t_3]-\varvec{L}[z_1,t_3]=0\le 1=\varvec{L}[z_2,t_1]-\varvec{L}[z_1,t_1]\)
\(\Rightarrow \)
\(T(z_2)\ne t_3\)
\(T(z_1)=t_1\)
\(\varvec{L}[z_3,t_2]-\varvec{L}[z_1,t_2]=0\le 1=\varvec{L}[z_3,t_1]-\varvec{L}[z_1,t_1]\)
\(\Rightarrow \)
\(T(z_3)\ne t_2\)
\(T(z_1)=t_1\)
\(\varvec{L}[z_3,t_3]-\varvec{L}[z_1,t_3]=0\le 1=\varvec{L}[z_3,t_1]-\varvec{L}[z_1,t_1]\)
\(\Rightarrow \)
\(T(z_3)\ne t_3\)
\(T(z_1)=t_1\)
\(\varvec{L}[z_4,t_2]-\varvec{L}[z_1,t_2]=1\nleq 0=\varvec{L}[z_4,t_1]-\varvec{L}[z_1,t_1]\)
\(\Rightarrow \)
No Restriction
\(T(z_1)=t_1\)
\(\varvec{L}[z_4,t_3]-\varvec{L}[z_1,t_3]=1\nleq 0=\varvec{L}[z_4,t_1]-\varvec{L}[z_1,t_1]\)
\(\Rightarrow \)
No Restriction
This table presents the choice restrictions generated by applying choice rule (46) to the antecedent \(T_{\omega }(z_1)=t_1\), for each alternative choice \(t' \in \{t_2,t_3\}\) and each instrumental value \(z' \in \{z_2,z_3,z_4\}\) of the incentive matrix (45)
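The comparisons in Table 1 can be reproduced mechanically. The sketch below applies the incentive condition of choice rule (46) to the antecedent \(T(z_1)=t_1\), with instrument values and choices coded by their indices:

```python
# Incentive matrix L of (45): rows z1..z4 (indices 0..3), columns
# t1..t3 (indices 0..2).
L = [[0, 0, 0],
     [1, 1, 0],
     [1, 0, 1],
     [0, 1, 1]]

z, t = 0, 0                           # the agent chooses t1 under z1
for z_new in (1, 2, 3):               # counterfactual arms z2, z3, z4
    for t_new in (1, 2):              # alternative choices t2, t3
        gain_alt = L[z_new][t_new] - L[z][t_new]   # incentive gain toward t'
        gain_own = L[z_new][t] - L[z][t]           # incentive gain toward t
        if gain_alt <= gain_own:      # incentive condition of rule (46)
            print(f"T(z1)=t1  =>  T(z{z_new + 1}) != t{t_new + 1}")
        else:
            print(f"T(z1)=t1, z{z_new + 1}, t{t_new + 1}: no restriction")
```

Running this reproduces the six rows of Table 1: restrictions for \((z_2,t_2)\), \((z_2,t_3)\), \((z_3,t_2)\), \((z_3,t_3)\), and no restriction for the change to \(z_4\).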
Table 2
Choice Restrictions Generated by Incentive Matrix (45)
1
\(T_{\omega }(z_1)=t_1\)
\(\Rightarrow \)
\(T_{\omega }(z_2)\notin \{t_2,t_3\}\) and \(T_{\omega }(z_3)\notin \{t_2,t_3\}\)
2
\(T_{\omega }(z_2)=t_1\)
\(\Rightarrow \)
\(T_{\omega }(z_1)\ne t_2\) and \(T_{\omega }(z_3)\ne t_2\)
3
\( T_{\omega }(z_3)=t_1\)
\(\Rightarrow \)
\( T_{\omega }(z_1)\ne t_3\) and \( T_{\omega }(z_2)\ne t_3\)
4
\( T_{\omega }(z_4)=t_1\)
\(\Rightarrow \)
\( T_{\omega } (z_1) \notin \{t_2,t_3\}\) and \( T_{\omega } (z_2) \notin \{t_2,t_3\}\) and \( T_{\omega } (z_3) \notin \{t_2,t_3\}\)
5
\( T_{\omega }(z_1)=t_2\)
\(\Rightarrow \)
\( T_{\omega } (z_2) \notin \{t_1,t_3\}\) and \( T_{\omega } (z_4) \notin \{t_1,t_3\}\)
6
\( T_{\omega }(z_2)=t_2\)
\(\Rightarrow \)
\( T_{\omega }(z_1)\ne t_1\) and \( T_{\omega }(z_4)\ne t_1\)
7
\(T_{\omega }(z_3)=t_2\)
\(\Rightarrow \)
\(T_{\omega } (z_1) \notin \{t_1,t_3\}\) and \( T_{\omega } (z_2) \notin \{t_1,t_3\}\) and \( T_{\omega } (z_4) \notin \{t_1,t_3\}\)
8
\( T_{\omega }(z_4)=t_2\)
\(\Rightarrow \)
\( T_{\omega }(z_1)\ne t_3\) and \( T_{\omega }(z_2)\ne t_3\)
9
\( T_{\omega }(z_1)=t_3\)
\(\Rightarrow \)
\(T_{\omega } (z_3) \notin \{t_1,t_2\}\) and \( T_{\omega } (z_4) \notin \{t_1,t_2\}\)
10
\( T_{\omega }(z_2)=t_3\)
\(\Rightarrow \)
\( T_{\omega } (z_1) \notin \{t_1,t_2\}\) and \( T_{\omega } (z_3) \notin \{t_1,t_2\}\) and \( T_{\omega } (z_4) \notin \{t_1,t_2\}\)
11
\( T_{\omega }(z_3)=t_3\)
\(\Rightarrow \)
\( T_{\omega }(z_1)\ne t_1\) and \(T_{\omega }(z_4)\ne t_1\)
12
\(T_{\omega }(z_4)=t_3\)
\(\Rightarrow \)
\( T_{\omega }(z_1)\ne t_2\) and \(T_{\omega }(z_3)\ne t_2\)
This table presents all the choice restrictions generated by applying choice rule (46) to each combination of choices \((t,t') \in \{t_1,t_2,t_3\}^2\) and instrumental values \((z,z') \in \{z_1,z_2,z_3,z_4\}^2\) of the incentive matrix (45)

Generating the Response Matrix

The choice restrictions of Table 2 can be used to determine the set of admissible response-types that the response-vector \(\varvec{S} = [T(z_1),T(z_2),T(z_3),T(z_4)]'\) can take. The first panel of Table 2 examines the case where \(T(z)=t_1\) for \(z\in \{z_1,z_2,z_3,z_4\}\). The first restriction states that if \(T(z_1)=t_1\), then \(T(z_2)\) and \(T(z_3)\) lie outside \(\{t_2,t_3\}\); since there are only three alternatives, \(T(z_2)=T(z_3)=t_1\). Given \(T(z_1)=t_1\), there are only three possible response-types that comply with this choice restriction: \(\varvec{s}_1 = [t_1,t_1,t_1,t_1]'\), \(\varvec{s}_2 = [t_1,t_1,t_1,t_2]'\), and \(\varvec{s}_3 = [t_1,t_1,t_1,t_3]'\). The second and third choice restrictions of Table 2 are subsumed by the first restriction. The fourth choice restriction implies that the only admissible response-type for which \(T(z_4)=t_1\) is \(\varvec{s}_1 = [t_1,t_1,t_1,t_1]'\).
The second panel of Table 2 examines the case where \(T(z)=t_2\) for \(z\in \{z_1,z_2,z_3,z_4\}\). The third panel examines the case where \(T(z)=t_3\) for \(z\in \{z_1,z_2,z_3,z_4\}\). We apply the elimination analysis of the first panel to the second and third panels. There are only nine admissible response-types that comply with each of the 12 choice restrictions of Table 2. Those are displayed in the response matrix below:15
$$\begin{aligned} \text {Response Matrix } \varvec{R}= \begin{array}{cc} \begin{array}{ccccccccc} s_1 & s_2 & s_3 & s_4 & s_5 & s_6 & s_7 & s_8 & s_9 \end{array} & \\ \left[ \begin{array}{ccccccccc} t_1 & t_1 & t_1 & t_2 & t_2 & t_2 & t_3 & t_3 & t_3 \\ t_1 & t_1 & t_1 & t_2 & t_2 & t_2 & t_1 & t_2 & t_3 \\ t_1 & t_1 & t_1 & t_1 & t_2 & t_3 & t_3 & t_3 & t_3 \\ t_1 & t_2 & t_3 & t_2 & t_2 & t_2 & t_3 & t_3 & t_3 \end{array} \right] & \begin{array}{c} z_1 \\ z_2 \\ z_3 \\ z_4 \end{array} \end{array} \end{aligned}$$
(47)
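The elimination process sketched above is easy to automate: enumerate all \(3^4 = 81\) candidate response vectors and keep those consistent with choice rule (46) under the incentive matrix (45). The survivors are exactly the nine columns of the response matrix (47).

```python
from itertools import product

# Incentive matrix L of (45); treatments coded 0, 1, 2 for t1, t2, t3.
L = [[0, 0, 0],
     [1, 1, 0],
     [1, 0, 1],
     [0, 1, 1]]

def admissible(s, L):
    """Check choice rule (46) for every ordered pair of instrument values."""
    n_z = len(L)
    for z in range(n_z):
        t = s[z]                     # choice under z
        for z_new in range(n_z):
            for t_new in range(3):
                if t_new == t:
                    continue
                # Rule (46): if the incentive gain toward t' is no larger
                # than the gain toward t, the agent cannot switch to t'.
                if (L[z_new][t_new] - L[z][t_new] <= L[z_new][t] - L[z][t]
                        and s[z_new] == t_new):
                    return False
    return True

types = [s for s in product(range(3), repeat=4) if admissible(s, L)]
print(len(types))   # 9, matching the columns of the response matrix (47)
```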

Identification and Estimation

Theorem T-2 uses the identification criteria in C-1 to recover all causal parameters that are identified.
Theorem T-2
The response matrix (47) enables the identification of the following causal parameters:
1
All response-type probabilities \(P(\varvec{S}=\varvec{s}_j)\); \(j=1,\ldots ,9\).
 
2
The expectation (and distribution) of the following counterfactual outcomes:
 
Response-Types
Treatment Choices
\(t_1\)
\(t_2\)
\(t_3\)
Always-Takers
\(E(Y(t_1)\mid \varvec{S}=\varvec{s}_1)\)
\(E(Y(t_2)\mid \varvec{S}=\varvec{s}_5)\)
\(E(Y(t_3)\mid \varvec{S}= \varvec{s}_9)\)
Switchers
\(E(Y(t_{1})\mid \varvec{S} = \varvec{s}_{4})\)
\(E(Y(t_{2})\mid \varvec{S}= \varvec{s}_{2})\)
\(E(Y(t_{3})\mid \varvec{S}= \varvec{s}_{3})\)
\(E(Y(t_{1})\mid \varvec{S} = \varvec{s}_{7})\)
\(E(Y(t_{2})\mid \varvec{S}= \varvec{s}_{8})\)
\(E(Y(t_{3})\mid \varvec{S}= \varvec{s}_{6})\)
Partially Identified
\(E(Y(t_{1})\mid \varvec{S} \in \{\varvec{s}_{2},\varvec{s}_{3}\})\)
\(E(Y(t_{2})\mid \varvec{S} \in \{\varvec{s}_{4},\varvec{s}_{6}\})\)
\(E(Y(t_{3})\mid \varvec{S} \in \{\varvec{s}_{7},\varvec{s}_{8}\})\)
Proof
See Appendix A.5. \(\square \)
The response matrix (47) enables the researcher to use well-known econometric methods to evaluate causal effects. For instance, the first row (\(z_1\)) and the last row (\(z_4\)) of the response matrix (47) differ for two response-types: \(\varvec{s}_2\) and \(\varvec{s}_3\) take the value \(t_1\) for \(z_1\) and the values \(t_2\) and \(t_3\) for \(z_4\), respectively. It is easy to show that the 2SLS estimator that uses the \(t_1\)-indicator \(D_{t_1} = \mathbf {1}[T=t_1]\) as the treatment and employs only the IV-values \(z_1\) and \(z_4\) evaluates the causal effect of choosing \(t_1\) versus not choosing \(t_1\) for response-types \(\varvec{s}_2\) and \(\varvec{s}_3\):
$$\begin{aligned} \dfrac{E\big (Y\mid Z=z_1\big )-E\big (Y\mid Z=z_4\big )}{P\big (T=t_1\mid Z=z_1\big )- P\big (T=t_1\mid Z=z_4\big )}= E\Big (Y(t_1) - Y(\bar{t}_1)\mid \varvec{S}\in \{\varvec{s}_2,\varvec{s}_3\}\Big ), \end{aligned}$$
(48)
where \(Y(\bar{t}_1)\) stands for the counterfactual outcome of not choosing \(t_1\):
$$\begin{aligned}&E (Y(\bar{t}_1)\mid \varvec{S} \in \{\varvec{s}_2,\varvec{s}_3\} ) \nonumber \\&= \dfrac{E\left(Y(t_2)\mid \varvec{S} =\varvec{s}_2\right)P(\varvec{S}=\varvec{s}_2) + E\big (Y(t_3)\mid \varvec{S} = \varvec{s}_3\big )P(\varvec{S}=\varvec{s}_3)}{P(\varvec{S}= \varvec{s}_2)+P(\varvec{S}=\varvec{s}_3)}. \end{aligned}$$
(49)
The same reasoning applies to the 2SLS that uses the indicator \(D_{t_3} = \mathbf {1}[T=t_3]\) for treatment and employs data from \(z_1\) and \(z_2\). This 2SLS estimator evaluates the causal effect of choosing \(t_3\) versus not choosing \(t_3\) for response-types \(\varvec{s}_7\) and \(\varvec{s}_8\); that is, \(E(Y(t_3) - Y(\bar{t}_3)\mid \varvec{S} \in \{\varvec{s}_7,\varvec{s}_8\})\). Finally, the 2SLS that uses the treatment indicator \(D_{t_2}\) and IV-values \(z_1\) and \(z_3\) evaluates the causal effect of choosing \(t_2\) versus not choosing \(t_2\) for response-types \(\varvec{s}_4\) and \(\varvec{s}_6\); namely, \(E(Y(t_2) - Y(\bar{t}_2)\mid \varvec{S} \in \{\varvec{s}_4,\varvec{s}_6\})\).
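The Wald identity (48) can also be verified numerically. The sketch below assigns hypothetical (uniform) response-type probabilities and randomly drawn counterfactual means to the nine types of (47), constructs the implied observed moments, and checks that the Wald ratio recovers the mixture on the right-hand side of (48)–(49):

```python
import numpy as np

# Response matrix (47); treatments coded 1, 2, 3 for t1, t2, t3.
R = np.array([[1, 1, 1, 2, 2, 2, 3, 3, 3],     # choices under z1
              [1, 1, 1, 2, 2, 2, 1, 2, 3],     # choices under z2
              [1, 1, 1, 1, 2, 3, 3, 3, 3],     # choices under z3
              [1, 2, 3, 2, 2, 2, 3, 3, 3]])    # choices under z4
P_S = np.full(9, 1 / 9)                         # assumed type probabilities
rng = np.random.default_rng(1)
Q = rng.normal(size=(3, 9))                     # assumed E(Y(t) | S = s_n)

def EY(z):    # E(Y | Z = z): each type contributes its chosen counterfactual
    return sum(P_S[n] * Q[R[z, n] - 1, n] for n in range(9))

def prop(t, z):   # P(T = t | Z = z)
    return P_S[R[z] == t].sum()

wald = (EY(0) - EY(3)) / (prop(1, 0) - prop(1, 3))   # left side of (48)

# Right-hand side of (48): E(Y(t1) - Y(t1-bar) | S in {s2, s3}).
p2, p3 = P_S[1], P_S[2]
y_t1 = (Q[0, 1] * p2 + Q[0, 2] * p3) / (p2 + p3)
y_not = (Q[1, 1] * p2 + Q[2, 2] * p3) / (p2 + p3)    # mixture in Eq. (49)
print(np.isclose(wald, y_t1 - y_not))                # True
```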

Benefits of Orthogonal Designs

The benefits of orthogonal designs become more apparent when we compare their results to those of more traditional designs. The plethora of identification results generated by orthogonal designs stands in sharp contrast to the paucity of identification results of standard designs. For instance, consider a conventional experimental design consisting of a control group with no incentives and three randomization groups, each dedicated solely to one treatment alternative. Specifically, \(z_1\) incentivizes participants toward \(t_1\), \(z_2\) incentivizes participants toward \(t_2\), \(z_3\) incentivizes participants toward \(t_3\), and \(z_4\) does not incentivize participants toward any choice. This incentive pattern is described by the incentive matrix \(\varvec{L}\) in (50).
$$\begin{aligned} \text {Traditional Design } \varvec{L}= \begin{array}{cc} \begin{array}{ccc} t_1 & t_2 & t_3 \end{array} & \\ \left[ \begin{array}{ccc} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{array} \right] & \begin{array}{c} z_1 \\ z_2 \\ z_3 \\ z_4 \end{array} \end{array} \end{aligned}$$
(50)
We apply the same approach used to examine the orthogonal design of incentive matrix (45) to the traditional design of incentive matrix (50). Table 3 presents all the choice restrictions generated by applying the choice rule (46) to each combination of treatment pairs \((t,t') \in \{t_1,t_2,t_3\}^2\) and to each pair of instrumental values \((z,z') \in \{z_1,z_2,z_3,z_4\}^2\) of incentive matrix (50).
Table 3
Choice Restrictions Generated by the Traditional Incentive Matrix (50)
1
\( T_{\omega }(z_1)=t_1\)
\(\Rightarrow \)
No Restriction
2
\( T_{\omega }(z_2)=t_1\)
\(\Rightarrow \)
\( T_{\omega } (z_1) \notin \{t_2,t_3\}\) and \( T_{\omega }(z_3)\ne t_2\) and \( T_{\omega } (z_4) \notin \{t_2,t_3\}\)
3
\( T_{\omega }(z_3)=t_1\)
\(\Rightarrow \)
\( T_{\omega } (z_1) \notin \{t_2,t_3\}\) and \( T_{\omega }(z_2)\ne t_3\) and \( T_{\omega } (z_4) \notin \{t_2,t_3\}\)
4
\( T_{\omega }(z_4)=t_1\)
\(\Rightarrow \)
\( T_{\omega } (z_1) \notin \{t_2,t_3\}\) and \( T_{\omega }(z_2)\ne t_3\) and \( T_{\omega }(z_3)\ne t_2\)
5
\( T_{\omega }(z_1)=t_2\)
\(\Rightarrow \)
\( T_{\omega } (z_2) \notin \{t_1,t_3\}\) and \( T_{\omega }(z_3)\ne t_1\) and \( T_{\omega } (z_4) \notin \{t_1,t_3\}\)
6
\( T_{\omega }(z_2)=t_2\)
\(\Rightarrow \)
No Restriction
7
\( T_{\omega }(z_3)=t_2\)
\(\Rightarrow \)
\( T_{\omega }(z_1)\ne t_3\) and \( T_{\omega } (z_2) \notin \{t_1,t_3\}\) and \( T_{\omega } (z_4) \notin \{t_1,t_3\}\)
8
\( T_{\omega }(z_4)=t_2\)
\(\Rightarrow \)
\( T_{\omega }(z_1)\ne t_3\) and \( T_{\omega } (z_2) \notin \{t_1,t_3\}\) and \( T_{\omega }(z_3)\ne t_1\)
9
\( T_{\omega }(z_1)=t_3\)
\(\Rightarrow \)
\( T_{\omega }(z_2)\ne t_1\) and \( T_{\omega } (z_3) \notin \{t_1,t_2\}\) and \( T_{\omega } (z_4) \notin \{t_1,t_2\}\)
10
\( T_{\omega }(z_2)=t_3\)
\(\Rightarrow \)
\( T_{\omega }(z_1)\ne t_2\) and \( T_{\omega } (z_3) \notin \{t_1,t_2\}\) and \( T_{\omega } (z_4) \notin \{t_1,t_2\}\)
11
\( T_{\omega }(z_3)=t_3\)
\(\Rightarrow \)
No Restriction
12
\( T_{\omega }(z_4)=t_3\)
\(\Rightarrow \)
\( T_{\omega }(z_1)\ne t_2\) and \( T_{\omega }(z_2)\ne t_1\) and \( T_{\omega } (z_3) \notin \{t_1,t_2\}\)
This table presents all the choice restrictions generated by applying choice rule (46) to each combination of choices \((t,t') \in \{t_1,t_2,t_3\}^2\) and instrumental values \((z,z') \in \{z_1,z_2,z_3,z_4\}^2\) of the incentive matrix (50)
The choice restrictions of Table 3 eliminate 69 out of the 81 possible response-types. The 12 admissible response-types that comply with all the choice restrictions in Table 3 are presented in the response matrix below:
$$\begin{aligned} \varvec{R}= \begin{array}{cc} \begin{array}{cccccccccccc} s_1 & s_2 & s_3 & s_4 & s_5 & s_6 & s_7 & s_8 & s_9 & s_{10} & s_{11} & s_{12} \end{array} & \\ \left[ \begin{array}{cccccccccccc} t_1 & t_1 & t_1 & t_1 & t_1 & t_1 & t_1 & t_1 & t_2 & t_2 & t_3 & t_3 \\ t_1 & t_1 & t_1 & t_2 & t_2 & t_2 & t_2 & t_3 & t_2 & t_2 & t_2 & t_3 \\ t_1 & t_3 & t_2 & t_2 & t_3 & t_3 & t_3 & t_3 & t_2 & t_3 & t_3 & t_3 \\ t_1 & t_1 & t_1 & t_2 & t_1 & t_2 & t_3 & t_3 & t_2 & t_2 & t_3 & t_3 \end{array} \right] & \begin{array}{c} z_1 \\ z_2 \\ z_3 \\ z_4 \end{array} \end{array} \end{aligned}$$
(51)
Table 4 presents all the response-type probabilities and counterfactual outcomes that are identified by the response matrix (51). Response matrix (51) does not generate a single point-identified response-type probability. The matrix does not generate any point-identified counterfactual outcomes either. By choosing an orthogonal design for the incentive matrix, we secure the identification of causal parameters. Using a traditional design, we do not.
Table 4
Causal Parameters Identified by Response Matrix (51)
1. The identified response-type probabilities are:
\(P(\varvec{S} \in \{\varvec{s}_{1},\varvec{s}_{2}\})\)
\(P(\varvec{S} \in \{\varvec{s}_{1},\varvec{s}_{3}\})\)
\(P(\varvec{S} \in \{\varvec{s}_{2},\varvec{s}_{5}\})\)
\(P(\varvec{S} \in \{\varvec{s}_{3},\varvec{s}_{5}\})\)
\(P(\varvec{S} \in \{\varvec{s}_{4},\varvec{s}_{6}\})\)
\(P(\varvec{S} \in \{\varvec{s}_{4},\varvec{s}_{9}\})\)
\(P(\varvec{S} \in \{\varvec{s}_{6},\varvec{s}_{10}\})\)
\(P(\varvec{S} \in \{\varvec{s}_{7},\varvec{s}_{8}\})\)
\(P(\varvec{S} \in \{\varvec{s}_{7},\varvec{s}_{11}\})\)
\(P(\varvec{S} \in \{\varvec{s}_{8},\varvec{s}_{12}\})\)
\(P(\varvec{S} \in \{\varvec{s}_{9},\varvec{s}_{10}\})\)
\(P(\varvec{S} \in \{\varvec{s}_{11},\varvec{s}_{12}\})\)
2. The expectation (and distribution) of the following counterfactual outcomes are identified:
\(t_1\)
\(t_2\)
\(t_3\)
 
\({{\,\mathrm{E}\,}}(Y(t_{1})\mid S \in \{s_{1},s_{2}\})\)
\({{\,\mathrm{E}\,}}(Y(t_{2})\mid S \in \{s_{4},s_{6}\})\)
\({{\,\mathrm{E}\,}}(Y(t_{3})\mid S \in \{s_{7},s_{8}\})\)
 
\({{\,\mathrm{E}\,}}(Y(t_{1})\mid S \in \{s_{1},s_{3}\})\)
\({{\,\mathrm{E}\,}}(Y(t_{2})\mid S \in \{s_{4},s_{9}\})\)
\({{\,\mathrm{E}\,}}(Y(t_{3})\mid S \in \{s_{7},s_{11}\})\)
 
\({{\,\mathrm{E}\,}}(Y(t_{1})\mid S \in \{s_{2},s_{5}\})\)
\({{\,\mathrm{E}\,}}(Y(t_{2})\mid S \in \{s_{6},s_{10}\})\)
\({{\,\mathrm{E}\,}}(Y(t_{3})\mid S \in \{s_{8},s_{12}\})\)
 
\({{\,\mathrm{E}\,}}(Y(t_{1})\mid S \in \{s_{3},s_{5}\})\)
\({{\,\mathrm{E}\,}}(Y(t_{2})\mid S \in \{s_{9},s_{10}\})\)
\({{\,\mathrm{E}\,}}(Y(t_{3})\mid S \in \{s_{11},s_{12}\})\)
 
\({{\,\mathrm{E}\,}}(Y(t_{1})\mid S \in \{s_{4},s_{6},s_{7},s_{8}\})\)
\({{\,\mathrm{E}\,}}(Y(t_{2})\mid S \in \{s_{3},s_{5},s_{7},s_{11}\})\)
\({{\,\mathrm{E}\,}}(Y(t_{3})\mid S \in \{s_{2},s_{5},s_{6},s_{10}\})\)
 
This table presents all the causal parameters that are identified by response matrix (51)
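The contrast between the two designs can be summarized by the rank criterion (43). The sketch below hardcodes the response matrices (47) and (51) and stacks the binary matrices \(\varvec{B}_t\):

```python
import numpy as np

# Response matrices (47) and (51); treatments coded 1, 2, 3 for t1, t2, t3.
R_orth = np.array([[1, 1, 1, 2, 2, 2, 3, 3, 3],
                   [1, 1, 1, 2, 2, 2, 1, 2, 3],
                   [1, 1, 1, 1, 2, 3, 3, 3, 3],
                   [1, 2, 3, 2, 2, 2, 3, 3, 3]])
R_trad = np.array([[1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 3, 3],
                   [1, 1, 1, 2, 2, 2, 2, 3, 2, 2, 2, 3],
                   [1, 3, 2, 2, 3, 3, 3, 3, 2, 3, 3, 3],
                   [1, 1, 1, 2, 1, 2, 3, 3, 2, 2, 3, 3]])

def stacked_B(R):
    """Stack B_t = 1[R = t] over t to form B_T, as in Eq. (38)."""
    return np.vstack([(R == t).astype(float) for t in (1, 2, 3)])

for name, R in (("orthogonal", R_orth), ("traditional", R_trad)):
    B_T = stacked_B(R)
    print(name, "N_S =", R.shape[1], "rank(B_T) =",
          np.linalg.matrix_rank(B_T))
```

The orthogonal design attains rank \(N_S = 9\), so every response-type probability is point-identified; the traditional design's \(\varvec{B}_T\) has rank strictly below its \(N_S = 12\), so only combinations of response-type probabilities, such as those in Table 4, are identified.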
Appendix B applies our analysis to the study of Latin squares. We refer to Pinto and Navjeevan (2022)16 for further discussion on how economic incentives shape choice restrictions in the IV model with multiple choices and heterogeneous agents.

Conclusion

This paper provides a novel application of Rao’s fundamental work on the design of experiments using orthogonal arrays. Rao’s seminal ideas are widely used to determine efficient arrangements of treatment factors in RCTs. His method is well suited for experiments where the analyst can reliably assign treatment factors to randomization units. Unfortunately, social scientists can seldom impose treatment statuses. Most social experiments are consequently plagued by noncompliance, which undermines the random assignment of treatment statuses.
We repurpose Rao’s original ideas to address the common challenges that noncompliance generates. We use a novel framework whereby orthogonal arrays denote a pattern of choice incentives. We combine the IV framework of Heckman and Pinto (2018) with the recently developed econometric tools in Pinto (2021a, b),17 and Pinto and Navjeevan (2022)18 to translate choice incentives into choice restrictions. These restrictions determine the set of economically justifiable counterfactual choices, which, in turn, enable the identification of causal parameters. We then show the benefits of using orthogonal arrays (rather than traditional approaches) for identifying causal parameters.
Our method applies broadly to IV models with multiple treatments, categorical instruments, and heterogeneous agents. We establish a tight link between the problem of unobserved mixtures of distributions and the identification of counterfactuals. Central to our analysis is the notion of a response matrix, which contains all the information needed to examine the nonparametric identification of model counterfactuals. We apply mixture-model methods to this matrix to prove the identification of causal parameters.
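The mixture logic can be illustrated in the simplest special case, the binary-instrument, binary-treatment model, whose response-types are never-takers, compliers, and always-takers. There the observed moments are mixtures over response-types and the complier mean difference is unmixed by the Wald ratio. The simulation below is a stylized sketch under invented population shares and outcome means, not the paper's general algorithm:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Response-types: 0 = never-taker, 1 = complier, 2 = always-taker
# (hypothetical population shares, for illustration only).
s = rng.choice(3, size=n, p=[0.3, 0.5, 0.2])
z = rng.integers(0, 2, size=n)                      # randomized binary instrument
t = np.where(s == 2, 1, np.where(s == 1, z, 0))     # choice implied by each type

# Type-specific mean outcomes indexed by (response-type, treatment); made up.
mu = {(0, 0): 1.0, (1, 0): 2.0, (1, 1): 5.0, (2, 1): 4.0}
y = np.array([mu[(si, ti)] for si, ti in zip(s, t)]) + rng.normal(0, 1, n)

# The Wald/IV ratio unmixes the complier mean difference:
wald = (y[z == 1].mean() - y[z == 0].mean()) / (t[z == 1].mean() - t[z == 0].mean())
print(wald)  # close to the true complier effect mu[(1, 1)] - mu[(1, 0)] = 3.0
```

The numerator mixes all three response-types, but never-takers and always-takers contribute identically under both instrument values, so their terms cancel and only the complier component survives.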

Acknowledgements

This research was supported by a MERIT award from the Eunice Kennedy Shriver National Institute of Child Health and Human Development under award number R37HD06572 and a grant from a private donor. A web appendix (https://cehd.uchicago.edu/causal-models-choice-treat-appx) contains proofs of propositions.

Declaration

Conflict of interest

The authors declare that they have no competing interests that influenced the research or writing of this manuscript.
Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Appendices

Supplementary Information

Below is the link to the electronic supplementary material.
Footnotes
1
Pinto, R. (2021a). Beyond intention to treat: Using the incentives in moving to opportunity to identify neighborhood effects [Unpublished manuscript]. Department of Economics, University of California, Los Angeles. https://www.rodrigopinto.net/_files/ugd/95d94d_90f491ec1afa45cf8ef1e9a77346c9a8.pdf.
 
2
Our analysis holds if outcome Y represents a vector-valued variable denoting multiple outcomes.
 
3
The indicator function \(\mathbf {1}[A]\) equals one if event A occurs and zero otherwise.
 
4
By policy-invariant, we mean functions whose maps remain invariant under manipulation of the arguments. This is the notion of autonomy developed by Frisch (1938) and Haavelmo (1944). For a recent discussion of these conditions, see Heckman and Pinto (2015) and Pinto and Heckman (2021).
 
5
Such error terms are often called “shocks” in structural equation models. \(f_T\) is a deterministic function that can be interpreted as a random function if we introduce a shock \(\epsilon _T\) of arbitrary dimension as one of its arguments.
 
6
Fixing is a causal operation that captures the notion of external (ceteris paribus) manipulation. It is a central concept in the study of causality and dates back to Haavelmo (1943). See Heckman and Pinto (2015) for a recent discussion of fixing and causality.
 
7
The response-types can be viewed as “types” in the sense of Keane and Wolpin (1997).
 
8
A balancing score for \(\varvec{V}\) is a function of \(\varvec{V}\) that preserves the matching condition \(Y(t) \perp\kern-4.5pt\perp T \mid \varvec{V}\) of Eq. (9). See Rosenbaum and Rubin (1983).
 
9
Formally, \((Y(t),\varvec{V}) \perp\kern-4.5pt\perp Z \Rightarrow \) \((Y(t),h(\varvec{V})) \perp\kern-4.5pt\perp Z \Rightarrow \) \((Y(t),\varvec{S}) \perp\kern-4.5pt\perp Z \Rightarrow Y(t) \perp\kern-4.5pt\perp Z \mid \varvec{S} \Rightarrow \) \(Y(t) \perp\kern-4.5pt\perp g(Z,\varvec{S}) \mid \varvec{S} \Rightarrow \) \(Y(t) \perp\kern-4.5pt\perp T \mid \varvec{S}.\)
 
10
The Moore-Penrose inverse of a matrix \(\varvec{A}\) is denoted by \(\varvec{A}^+\) and is defined by the following four properties: (1) \(\varvec{A}\varvec{A}^+\varvec{A} = \varvec{A};\) (2) \(\varvec{A}^+\varvec{A}\varvec{A}^+ = \varvec{A}^+;\) (3) \(\varvec{A}^+\varvec{A}\) is symmetric; (4) \(\varvec{A}\varvec{A}^+\) is symmetric. The Moore-Penrose inverse \(\varvec{A}^+\) of a real matrix \(\varvec{A}\) always exists and is unique (Magnus and Neudecker 1999).
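These four defining properties can be verified numerically with `numpy.linalg.pinv`; the following check is an illustration added here, not part of the original footnote:

```python
import numpy as np

A = np.array([[1., 2., 3.],
              [4., 5., 6.]])      # a rank-2 rectangular matrix
Ap = np.linalg.pinv(A)            # Moore-Penrose pseudoinverse A^+

# The four defining properties of the Moore-Penrose inverse:
assert np.allclose(A @ Ap @ A, A)        # (1) A A+ A = A
assert np.allclose(Ap @ A @ Ap, Ap)      # (2) A+ A A+ = A+
assert np.allclose((Ap @ A).T, Ap @ A)   # (3) A+ A is symmetric
assert np.allclose((A @ Ap).T, A @ Ap)   # (4) A A+ is symmetric
```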
 
11
See Section A.4 of the Appendix for bounds on the response-type probabilities and counterfactual outcomes.
 
12
Pinto, R. (2021a). Beyond intention to treat: Using the incentives in moving to opportunity to identify neighborhood effects [Unpublished manuscript]. Department of Economics, University of California, Los Angeles. https://​www.​rodrigopinto.​net/​_​files/​ugd/​95d94d_​90f491ec1afa45cf​8ef1e9a77346c9a8​.​pdf.
 
13
Pinto, R. (2021a). Beyond intention to treat: Using the incentives in moving to opportunity to identify neighborhood effects [Unpublished manuscript]. Department of Economics, University of California, Los Angeles. https://​www.​rodrigopinto.​net/​_​files/​ugd/​95d94d_​90f491ec1afa45cf​8ef1e9a77346c9a8​.​pdf.
 
14
Pinto, R. (2021a). Beyond intention to treat: Using the incentives in moving to opportunity to identify neighborhood effects [Unpublished manuscript]. Department of Economics, University of California, Los Angeles. https://www.rodrigopinto.net/_files/ugd/95d94d_90f491ec1afa45cf8ef1e9a77346c9a8.pdf.
 
15
Under no choice restrictions, each of the four counterfactual choices (\(T(z_1)\), \(T(z_2)\), \(T(z_3)\), and \(T(z_4)\)) can take any of the three treatment values (\(t_1\), \(t_2\), or \(t_3\)). Thus, the total number of potential response-types is 81. The choice restrictions in Table 2 are able to eliminate 72 out of the 81 possible response-types. The nine response-types that survive this elimination process are displayed in (47).
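The unrestricted count in this footnote is easy to reproduce: a response-type assigns one of three treatments to each of the four instrument values, giving \(3^4 = 81\) candidates. The enumeration below checks only that count; the specific choice restrictions of Table 2, which eliminate 72 of the 81 types, are not reproduced here:

```python
from itertools import product

treatments = ['t1', 't2', 't3']   # the three treatment values
# A response-type is a map (T(z1), T(z2), T(z3), T(z4)) from the four
# instrument values to a treatment choice.
response_types = list(product(treatments, repeat=4))
print(len(response_types))  # 81 candidate response-types before any restriction
# The choice restrictions of Table 2 would then filter this list
# down to the 9 economically justifiable response-types displayed in (47).
```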
 
16
Pinto, R., and Navjeevan, M. (2022). Ordered, unordered and minimal monotonicity criteria [Unpublished manuscript]. Department of Economics, University of California, Los Angeles. https://​www.​rodrigopinto.​net/​_​files/​ugd/​95d94d_​1405f5376ae449a9​b07f3bd3f37db161​.​pdf.
 
17
Pinto, R. (2021a). Beyond intention to treat: Using the incentives in moving to opportunity to identify neighborhood effects [Unpublished manuscript]. Department of Economics, University of California, Los Angeles. https://​www.​rodrigopinto.​net/​_​files/​ugd/​95d94d_​90f491ec1afa45cf​8ef1e9a77346c9a8​.​pdf. Pinto, R. (2021b). Economics of monotonicity conditions [Unpublished manuscript]. Department of Economics, University of California, Los Angeles.
 
18
Pinto, R., and Navjeevan, M. (2022). Ordered, unordered and minimal monotonicity criteria [Unpublished manuscript]. Department of Economics, University of California, Los Angeles. https://​www.​rodrigopinto.​net/​_​files/​ugd/​95d94d_​1405f5376ae449a9​b07f3bd3f37db161​.​pdf.
 
References
Angrist, J.D., G.W. Imbens, and D. Rubin. 1996. Identification of causal effects using instrumental variables. Journal of the American Statistical Association 91 (434): 444–455.
Balke, A.A., and J. Pearl. 1993. Nonparametric bounds on causal effects from partial compliance data (Tech. Rep. No. R-199). University of California, Los Angeles.
Becker, G.S. 1962. Irrational behavior and economic theory. Journal of Political Economy 70: 1–13.
Frangakis, C.E., and D. Rubin. 2002. Principal stratification in causal inference. Biometrics 58 (1): 21–29.
Frisch, R. 1938. Autonomy of economic relations: Statistical versus theoretical relations in economic macrodynamics. Paper given at League of Nations. Reprinted in D.F. Hendry and M.S. Morgan (1995), The Foundations of Econometric Analysis, Cambridge University Press.
Haavelmo, T. 1943. The statistical implications of a system of simultaneous equations. Econometrica 11 (1): 1–12.
Haavelmo, T. 1944. The probability approach in econometrics. Econometrica 12 (Supplement): iii–iv and 1–115.
Heckman, J.J., and R. Pinto. 2015. Causal analysis after Haavelmo. Econometric Theory 31 (1): 115–151.
Heckman, J.J., and R. Pinto. 2018. Unordered monotonicity. Econometrica 86 (1): 1–35.
Heckman, J.J., and R. Robb. 1985. Alternative methods for evaluating the impact of interventions: An overview. Journal of Econometrics 30 (1–2): 239–267.
Keane, M.P., and K.I. Wolpin. 1997. The career decisions of young men. Journal of Political Economy 105 (3): 473–522.
Magnus, J., and H. Neudecker. 1999. Matrix differential calculus with applications in statistics and econometrics, 2nd ed. New York: Wiley.
McFadden, D. 1981. Econometric models of probabilistic choice. In Structural analysis of discrete data with econometric applications, ed. C. Manski and D. McFadden, 198–272. Cambridge, MA: MIT Press.
Pinto, R., and J.J. Heckman. 2021. The econometric model for causal policy analysis. Forthcoming, Annual Review of Economics.
Powell, J.L. 1994. Estimation of semiparametric models. In Handbook of econometrics, vol. 4, ed. R. Engle and D. McFadden, 2443–2521. Amsterdam: Elsevier.
Quandt, R.E. 1958. The estimation of the parameters of a linear regression system obeying two separate regimes. Journal of the American Statistical Association 53 (284): 873–880.
Quandt, R.E. 1972. A new approach to estimating switching regressions. Journal of the American Statistical Association 67 (338): 306–310.
Rao, C.R. 1943. Researches in the theory of the design of experiments and distribution problems connected with bivariate and multivariate populations. Thesis submitted to Calcutta University in lieu of 7th and 8th practical papers of the master’s examination in statistics.
Rao, C.R. 1946a. Difference sets and combinatorial arrangements derivable from finite geometries. Proceedings of the Indian National Science Academy 12 (3): 123–135.
Rao, C.R. 1946b. Hypercubes of strength ‘d’ leading to confounded designs in factorial experiments. Bulletin of the Calcutta Mathematical Society 38: 67–78. Retrieved from https://ci.nii.ac.jp/naid/10010345773/en/.
Rao, B.L.S.P. 1992. Identifiability for mixtures of distributions. In Identifiability in stochastic models: Characterization of probability distributions, 183–228. Boston, MA: Academic Press.
Reiersöl, O. 1945. Confluence analysis by means of instrumental sets of variables. Arkiv för Matematik, Astronomi och Fysik 32A (4): 1–119.
Robins, J.M., and S. Greenland. 1992. Identifiability and exchangeability for direct and indirect effects. Epidemiology 3 (2): 143–155.
Rosenbaum, P.R., and D.B. Rubin. 1983. The central role of the propensity score in observational studies for causal effects. Biometrika 70 (1): 41–55.
Stinson, D. 2004. Combinatorial designs: Constructions and analysis. Berlin: Springer.
Thaler, R.H. 2016. Misbehaving: The making of behavioral economics. New York: W. W. Norton & Company.
Theil, H. 1958. Economic forecasts and policy (No. 15). Amsterdam: North Holland Publishing Company.
Theil, H. 1971. Principles of econometrics. New York: Wiley.
Yakowitz, S.J., and J.D. Spragins. 1968. On the identifiability of finite mixtures. Annals of Mathematical Statistics 39 (1): 209–214.
Metadata
Title
Causal Inference of Social Experiments Using Orthogonal Designs
Authors
James J. Heckman
Rodrigo Pinto
Publication date
12.09.2022
Publisher
Springer India
Published in
Journal of Quantitative Economics / Special Issue 1/2022
Print ISSN: 0971-1554
Electronic ISSN: 2364-1045
DOI
https://doi.org/10.1007/s40953-022-00307-w
