Categorical Data

Categorical outcome models are regression models for a dependent variable that is a discrete variable recording in which of two or more categories, usually mutually exclusive, an outcome of interest lies.

A. Colin Cameron

Competing Risks Model

A competing risks model is a model for multiple durations that start at the same point in time for a given subject, where the subject is observed until the first duration is completed and one also observes which of the multiple durations is completed first.

Gerard J. Van Den Berg

Computational Methods in Econometrics

In evaluating the importance and usefulness of particular econometric methods, it is customary to focus on the set of statistical properties that a method possesses — for example, unbiasedness, consistency, efficiency, asymptotic normality, and so on. It is crucial to stress, however, that meaningful comparisons cannot be completed without paying attention also to a method’s computational properties. Indeed the practical value of an econometric method can be assessed only by examining the inevitable interplay between the two classes of properties, since a method with excellent statistical properties may be computationally infeasible and vice versa. Computational methods in econometrics are evolving over time to reflect the current technological boundaries as defined by available computer hardware and software capabilities at a particular period, and hence are inextricably linked with determining what the state of the art is in econometric methodology.

Vassilis A. Hajivassiliou

Control Functions

The control function approach is an econometric method used to correct for biases that arise as a consequence of selection and/or endogeneity. It is the leading approach for dealing with selection bias in the correlated random coefficients model (see Heckman and Robb, 1985; 1986; Heckman and Vytlacil, 1998; Wooldridge, 1997; 2003; Heckman and Navarro, 2004), but it can be applied in more general semiparametric settings (see Newey, Powell and Vella, 1999; Altonji and Matzkin, 2005; Chesher, 2003; Imbens and Newey, 2006; Florens et al., 2007).

Salvador Navarro

Decision Theory in Econometrics

The decision-theoretic approach to statistics and econometrics explicitly specifies a set of models under consideration, a set of actions available to the analyst, and a loss function (or, equivalently, a utility function) that quantifies the value to the decision-maker of applying a particular action when a particular model holds. Decision rules, or procedures, map data into actions, and can be evaluated on the basis of their expected loss.

Keisuke Hirano

Difference-in-Difference Estimators

Difference-in-differences (DID) estimators are often used in empirical research in economics to evaluate the effects of public interventions and other treatments of interest in the absence of purely experimental data.
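As a rough illustration, the simplest two-group, two-period version of a DID estimate can be sketched as follows. The group means are made-up numbers, the function name `did_estimate` is hypothetical, and the calculation has a causal reading only under the usual assumption that both groups would have followed the same trend without the intervention.

```python
# Hypothetical mean outcomes by group and period (illustrative numbers only).
means = {
    ("treated", "pre"): 10.0,
    ("treated", "post"): 14.0,
    ("control", "pre"): 9.0,
    ("control", "post"): 11.0,
}


def did_estimate(m):
    """Change in the treated group minus change in the control group."""
    treated_change = m[("treated", "post")] - m[("treated", "pre")]
    control_change = m[("control", "post")] - m[("control", "pre")]
    return treated_change - control_change


effect = did_estimate(means)
print(effect)
```

The control group's change stands in for what would have happened to the treated group without the intervention, which is why the comparison differences twice.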

Alberto Abadie

Exchangeability

Definition: A sequence y = (y₁,…,y_n) of random variables (for n ≥ 1) is (finitely) exchangeable if the joint probability distribution p(y₁,…,y_n) of the elements of y is invariant under permutation of the indices (1,…,n), and a countably infinite sequence (y₁, y₂, …) is (infinitely) exchangeable if every finite subsequence is finitely exchangeable.
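The definition can be checked by brute force for short binary sequences. The sketch below is only illustrative: the two probability mass functions and the helper `is_exchangeable` are hypothetical examples, not taken from the article.

```python
from itertools import permutations, product


def is_exchangeable(pmf, n, tol=1e-12):
    """Check that pmf(y) is unchanged by every permutation of the indices."""
    for y in product((0, 1), repeat=n):
        for perm in permutations(range(n)):
            permuted = tuple(y[i] for i in perm)
            if abs(pmf(y) - pmf(permuted)) > tol:
                return False
    return True


def mixture_pmf(y):
    # Equal-weight mixture of i.i.d. Bernoulli(0.2) and i.i.d. Bernoulli(0.7).
    # The probability depends on y only through the number of ones.
    k, n = sum(y), len(y)
    return 0.5 * 0.2**k * 0.8 ** (n - k) + 0.5 * 0.7**k * 0.3 ** (n - k)


def ordered_pmf(y):
    # Independent coordinates, but the first has a different success
    # probability, so the position of a one matters.
    probs = (0.9,) + (0.1,) * (len(y) - 1)
    p = 1.0
    for yi, pi in zip(y, probs):
        p *= pi if yi == 1 else 1.0 - pi
    return p


print(is_exchangeable(mixture_pmf, 3))
print(is_exchangeable(ordered_pmf, 3))
```

The mixture example is exchangeable without being independent, while the second example is independent without being exchangeable.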

David Draper

Extreme Bounds Analysis

The analysis of economic data necessarily depends on assumptions that our weak data-sets do not allow us to test. We are forced to choose a limited number of variables in a multivariate analysis, to restrict the functional form, to limit the considered interdependence among observations to special forms, and to make special distributional assumptions. We make these assumptions, not because we believe them, but because we have to. Absent assumptions, our data-sets are utterly useless.

Edward E. Leamer

Field Experiments

Field experiments occupy an important middle ground between laboratory experiments and naturally occurring field data. The underlying idea behind most field experiments is to make use of randomization in an environment that captures important characteristics of the real world. Distinct from traditional empirical economics, field experiments provide an advantage by permitting the researcher to create exogenous variation in the variables of interest, allowing us to establish causality rather than mere correlation. In relation to a laboratory experiment, a field experiment potentially gives up some of the control that a laboratory experimenter may have over her environment in exchange for increased realism.

John A. List, David Reiley

Fixed Effects and Random Effects

One of the major benefits from using panel data as compared to cross-section data on individuals is that it enables us to control for individual heterogeneity. Not controlling for these unobserved individual specific effects leads to bias in the resulting estimates. Consider the panel data regression

$$y_{it} = \alpha + X'_{it}\beta + u_{it}, \quad i = 1,\ldots,N; \quad t = 1,\ldots,T \qquad (1)$$

with i denoting individuals and t denoting time. The panel data is balanced in that none of the observations is missing whether randomly or non-randomly due to attrition or sample selection. α is a scalar, β is K × 1 and X_it is the itth observation on K explanatory variables. Most panel data applications utilize a one-way error component model for the disturbances, with

$$u_{it} = \mu_i + \nu_{it} \qquad (2)$$

where μ_i denotes the unobservable individual specific effect and ν_it denotes the remainder disturbance. For example, in an earnings equation in labour economics, y_it will measure earnings of the head of the household, whereas X_it may contain a set of variables like experience, education, union membership, sex, or race. Note that μ_i is time-invariant and it accounts for any individual specific effect that is not included in the regression. In this case we could think of it as the individual’s unobserved ability. The remainder disturbance ν_it varies with individuals and time and can be thought of as the usual disturbance in the regression. If the μ_i’s are assumed to be fixed parameters to be estimated, we get the fixed effects (FE) model.
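A small simulation can illustrate why ignoring μ_i biases the estimates and how treating the μ_i as fixed removes the problem. This is a sketch under stated assumptions, not the article's own procedure: there is a single regressor, α is set to zero, μ_i is deliberately correlated with the regressor, and the FE estimate is computed by the standard within (demeaning) transformation. All names and parameter values are hypothetical.

```python
import random

random.seed(0)
N, T, beta = 200, 5, 2.0

# Simulate y_it = beta * x_it + mu_i + nu_it, with x_it correlated with mu_i.
rows = []  # one (i, x_it, y_it) tuple per observation, grouped by individual
for i in range(N):
    mu_i = random.gauss(0.0, 1.0)
    for t in range(T):
        x = mu_i + random.gauss(0.0, 1.0)
        y = beta * x + mu_i + random.gauss(0.0, 0.5)
        rows.append((i, x, y))


def slope(xs, ys):
    """OLS slope from a regression of ys on xs with an intercept."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


# Pooled OLS ignores mu_i, which is then absorbed into the disturbance.
pooled = slope([r[1] for r in rows], [r[2] for r in rows])

# Within transformation: subtract each individual's time average,
# which removes the time-invariant mu_i.
x_dm, y_dm = [], []
for i in range(N):
    unit = rows[i * T:(i + 1) * T]
    x_bar = sum(r[1] for r in unit) / T
    y_bar = sum(r[2] for r in unit) / T
    x_dm += [r[1] - x_bar for r in unit]
    y_dm += [r[2] - y_bar for r in unit]
within = slope(x_dm, y_dm)

print(round(pooled, 3), round(within, 3))
```

In this design the pooled slope is pushed away from the true value of 2 because the omitted μ_i moves with the regressor, while the within slope stays close to it.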

Badi H. Baltagi

Identification

In economic analysis, we often assume that there exists an underlying structure which has generated the observations of real-world data. However, statistical inference can relate only to characteristics of the distribution of the observed variables. Statistical models which are used to explain the behaviour of observed data typically involve parameters, and statistical inference aims at making statements about these parameters. For that purpose, it is important that different values of a parameter of interest can be characterized in terms of the data distribution. Otherwise, the problem of drawing inferences about this parameter is plagued by a fundamental indeterminacy and can be viewed as ‘ill-posed’.

Jean-Marie Dufour, Cheng Hsiao

Local Regression Models

Local regression models are regression models where the parameters are ‘localized’, that is, they are allowed to vary with some or all of the covariates in a general way. Suppose that (Y, X) are random variables and let

$$E(Y \mid X = x) = m(x) \qquad (1)$$

when it exists. The regression function m(x) is of primary interest because it describes how X affects Y. One may also be interested in derivatives of m or averages thereof or in derived quantities like the conditional variance var(Y|X = x) = E(Y²|X = x) − E²(Y|X = x). In cases of heavy-tailed distributions, the conditional expectation may not exist, in which case one may instead work with other location functionals like trimmed mean or median. The conditional expectation is particularly easy to deal with but a lot of what is done for the mean can also be done for the median or other quantities.
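One simple way to estimate m(x) at a point is a kernel-weighted average of the responses, the local constant (Nadaraya-Watson) estimator, in which the only localized parameter is an intercept. The sketch below is illustrative: the Gaussian kernel, the bandwidth, and the simulated data are assumptions, and the function names are hypothetical.

```python
import math
import random


def gaussian_kernel(u):
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def local_constant(x0, xs, ys, h):
    """Kernel-weighted average of ys, with weights concentrated near x0."""
    weights = [gaussian_kernel((x - x0) / h) for x in xs]
    total = sum(weights)
    return sum(w * y for w, y in zip(weights, ys)) / total


# Simulated data with m(x) = sin(2 * pi * x) and small additive noise.
rng = random.Random(0)
xs = [rng.uniform(0.0, 1.0) for _ in range(2000)]
ys = [math.sin(2.0 * math.pi * x) + rng.gauss(0.0, 0.1) for x in xs]

# Estimate m at x = 0.25, where the true value is sin(pi / 2) = 1.
m_hat = local_constant(0.25, xs, ys, h=0.03)
print(round(m_hat, 3))
```

A smaller bandwidth h tracks curvature more closely at the cost of a noisier estimate; a local linear fit would add a localized slope to the same weighting scheme.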

Oliver B. Linton

Logit Models of Individual Choice

The logit function is the inverse of the sigmoid logistic function. It maps the open interval (0,1) onto the real line and is written as

$$\operatorname{logit}(p) = \ln\left(p/(1-p)\right). \qquad (1)$$

Two traditions are involved in the modern theory of logit models of individual choices. The first one concerns curve fitting as exposed by Berkson (1944), who coined the term ‘logit’ after its close competitor ‘probit’ which is derived from the normal distribution. Both models are by far the most popular econometric methods used in applied work to estimate models for binary variables, even though the development of semiparametric and nonparametric alternatives since the mid-1970s has been intensive (Horowitz and Savin, 2001).
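The relationship between the logit and the logistic function can be checked numerically. This is a minimal sketch; the function names are chosen here for illustration.

```python
import math


def logit(p):
    """Log-odds of a probability p strictly between 0 and 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return math.log(p / (1.0 - p))


def logistic(x):
    """Sigmoid logistic function, mapping the real line into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


# Applying one function after the other recovers the starting probability.
for p in (0.1, 0.3, 0.5, 0.9):
    print(p, logit(p), logistic(logit(p)))
```

In a binary logit model the choice probability is the logistic function of a linear index, so the logit of that probability is the index itself.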

Thierry Magnac

Longitudinal Data Analysis

‘Longitudinal data’ (or ‘panel data’) refers to data-sets that contain time series observations of a number of individuals. In other words, it provides multiple observations for each individual in the sample. Compared with cross-sectional data, in which observations for a number of individuals are available only for a given time, or time-series data, in which a single entity is observed over time, panel data have the obvious advantages of more degrees of freedom and less collinearity among explanatory variables, and so provide the possibility of obtaining more accurate parameter estimates. More importantly, by blending inter-individual differences with intra-individual dynamics, panel data allow the investigation of more complicated behavioural hypotheses than those that can be addressed using cross-sectional or time-series data.

Cheng Hsiao

Matching Estimators

Matching is a widely used non-experimental method of evaluation that can be used to estimate the average effect of a treatment or programme intervention. The method compares the outcomes of programme participants with those of matched non-participants, where matches are chosen on the basis of similarity in observed characteristics. One of the main advantages of matching estimators is that they typically do not require specifying the functional form of the outcome equation and are therefore not susceptible to misspecification bias along that dimension. Traditional matching estimators pair each programme participant with a single matched non-participant (see, for example, Rosenbaum and Rubin, 1983), whereas more recently developed estimators pair programme participants with multiple non-participants and use weighted averaging to construct the matched outcomes.

Petra E. Todd

Maximum Score Methods

In a seminal paper, Manski (1975) introduces the maximum score estimator (MSE) of the structural parameters of a multinomial choice model and proves consistency without assuming knowledge of the distribution of the error terms in the model. As such, the MSE is the first instance of a semiparametric estimator of a limited dependent variable model in the econometrics literature.

Robert P. Sherman

Mixture Models

Suppose that ℱ = {F_θ: θ ∈ S} is a parametric family of distributions on a sample space X, and let Q denote a probability distribution defined on the parameter space S. The distribution

$$F_Q = \int F_\theta \, dQ(\theta) \qquad (1)$$

is a mixture distribution. An observation X drawn from F_Q can be thought of as being obtained in a two-step procedure: first, a random Θ is drawn from the distribution Q and then, conditional on Θ = θ, X is drawn from the distribution F_θ. Suppose we have a random sample X₁,…, X_n from F_Q. We can view this as a missing data problem in that the ‘full data’ consists of pairs (X₁,Θ₁),…, (X_n,Θ_n), with Θ_i ∼ Q and X_i|Θ_i = θ ∼ F_θ, but then only the first member X_i of each pair is observed; the labels Θ_i are hidden.
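The two-step description translates directly into a sampling routine. In this sketch the mixing distribution Q is assumed to be discrete with two support points and F_θ is assumed to be normal with mean θ and unit variance; these choices, and the function name, are illustrative only.

```python
import random


def draw_from_mixture(n, support, weights, rng):
    """Return the 'full data': n pairs (X_i, Theta_i) from the two-step scheme."""
    pairs = []
    for _ in range(n):
        # Step 1: draw the label Theta from the mixing distribution Q.
        theta = rng.choices(support, weights=weights, k=1)[0]
        # Step 2: draw X from F_theta, here assumed to be N(theta, 1).
        x = rng.gauss(theta, 1.0)
        pairs.append((x, theta))
    return pairs


rng = random.Random(0)
full_data = draw_from_mixture(5000, [0.0, 4.0], [0.3, 0.7], rng)

# Only the first member of each pair is observed; the labels are hidden.
observed = [x for x, _ in full_data]
print(sum(observed) / len(observed))
```

The sample mean of the observed values should be close to 0.3 × 0 + 0.7 × 4 = 2.8, the mean of the mixture, even though no individual label is known to the analyst.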

Bruce G. Lindsay, Michael Stewart

Natural Experiments and Quasi-Natural Experiments

The term ‘natural experiment’ has been used in many, often contradictory, ways. It is not unfair to say that the term is frequently employed to describe situations that are neither ‘natural’ nor ‘experiments’ or situations which are ‘natural, but not experiments’ or vice versa.

J. Dinardo

Nonlinear Panel Data Models

Panel or longitudinal data are becoming increasingly popular in applied work as they offer a number of advantages over pure cross-sectional or pure time-series data. A particularly useful feature is that they allow researchers to model unobserved heterogeneity at the level of the observational unit, where the latter may be an individual, a household, a firm or a country. Standard practice in the econometric literature is to model this heterogeneity as an individual-specific effect which enters additively in the model, typically assumed to be linear, that captures the statistical relationship between the dependent and the independent variables. The presence of these individual effects may cause problems in estimation. In particular, in short panels, that is, in panels where the time-series dimension is of smaller order than the cross-sectional dimension, their estimation in conjunction with the other parameters of interest usually yields inconsistent estimators for both. (Notable exceptions are the static linear and the Poisson count panel data models, where estimation of the individual effects along with the finite dimensional coefficient vector yields consistent estimators of the latter.) This is the well-known incidental parameters problem (Neyman and Scott, 1948). In linear regression models, this problem may be dealt with by taking transformations of the model, such as first differences or differences from time averages (‘within transformation’), which remove the individual effect from the equation under consideration. However, these transformations do not apply to nonlinear econometric models, that is, models which are nonlinear in the parameters of interest and which include models that arise frequently in applied work, such as discrete choice models, limited dependent variable models, and duration models, among others.

Ekaterini Kyriazidou

Nonparametric Structural Models

The interplay between economic theory and econometrics comes to its full force when analysing structural models. These models are used in industrial organization, marketing, public finance, labour economics and many other fields in economics. Structural econometric methods make use of the behavioural and equilibrium assumptions specified in economic models to define a mapping between the distribution of the observable variables and the primitive functions and distributions that are used in the model. Using these methods, one can infer elements of the model, such as utility and production functions, that are not directly observed. This allows one to predict behaviour and equilibria outcomes under new environments and to evaluate the welfare of individuals and profits of firms under alternative policies, among other benefits.

Rosa L. Matzkin

Partial Identification in Econometrics

Suppose that one wants to use sample data to draw conclusions about a population of interest. Econometricians have long found it useful to separately study identification problems and problems of statistical inference. Studies of identification characterize the conclusions that could be drawn if one were able to observe an unlimited number of realizations of the sampling process. Studies of statistical inference characterize the generally weaker conclusions that can be drawn given a sample of positive but finite size. Koopmans (1949, p. 132) put it this way in the article that introduced the term ‘identification’:

In our discussion we have used the phrase ‘a parameter that can be determined from a sufficient number of observations.’ We shall now define this concept more sharply, and give it the name identifiability of a parameter. Instead of reasoning, as before, from ‘a sufficiently large number of observations’ we shall base our discussion on a hypothetical knowledge of the probability distribution of the observations, as defined more fully below. It is clear that exact knowledge of this probability distribution cannot be derived from any finite number of observations. Such knowledge is the limit approachable but not attainable by extended observation. By hypothesizing nevertheless the full availability of such knowledge, we obtain a clear separation between problems of statistical inference arising from the variability of finite samples, and problems of identification in which we explore the limits to which inference even from an infinite number of observations is suspect.

Charles F. Manski

Partial Linear Model

A partially linear model requires the regression function to be a linear function of a subset of the variables and a nonparametric non-specified function of the rest of the variables. Suppose, for example, that one is interested in estimating the relationship between an outcome variable of interest y and a vector of variables (x, z). The economist is comfortable modelling the regression function as linear in x, but is hesitant in extending the linearity to z. One example, considered by Engle et al. (1986), is the effect of temperature on fuel consumption using a time series of cities. To do that, one can consider a regression of average fuel consumption in time t on average household characteristic and average temperature in time t. The analyst might be more comfortable with imposing linearity on the part of the regression function involving household characteristics but unwilling to require that fuel consumption varies linearly with temperature. This is natural since fuel consumption tends to be higher at extremes of the temperature scale, but lower at moderate temperatures. The regression function Engle et al. consider is:

$$y = x'\beta + g(z) + u \qquad (1)$$

where x denotes a vector of household/city characteristics, z is temperature, and u is a mean-zero random variable that is independent of (x, z). The function g(.) is unspecified except for smoothness assumptions. They term this the semiparametric regression model.

Elie Tamer

Propensity Score

Propensity score is an object often discussed in evaluation studies. It is defined as the conditional probability of treatment given covariates. It has attracted attention for its potential to control for the bias in the presence of high dimensional covariates.

Jinyong Hahn

Proportional Hazard Model

The estimation of duration models has been the subject of significant research in econometrics since the late 1970s. Cox (1972) proposed the use of proportional hazard models in biostatistics and they were soon adopted for use in economics. Since Lancaster (1979), it has been recognized among economists that it is important to account for unobserved heterogeneity in models for duration data. Failure to account for unobserved heterogeneity causes the estimated hazard rate to decrease more with the duration than the hazard rate of a randomly selected member of the population. Moreover, the estimated proportional effect of explanatory variables on the population hazard rate is smaller in absolute value than that on the hazard rate of the average population member and decreases with the duration. To account for unobserved heterogeneity Lancaster proposed a parametric mixed proportional hazard (MPH) model, a partial generalization of Cox’s proportional hazard model, that specifies the hazard rate as the product of a regression function that captures the effect of observed explanatory variables, a baseline hazard that captures variation in the hazard over the spell, and a random variable that accounts for the omitted heterogeneity. In particular, Lancaster (1979) introduced the mixed proportional hazard model in which the hazard is a function of a regressor X, unobserved heterogeneity v, and a function of time λ(t),

$$\theta(t \mid X, v) = v\, e^{X\beta_0} \lambda(t). \qquad (1)$$

Jerry A. Hausman, Tiemen M. Woutersen

Quantile Regression

The quantile regression is a semiparametric technique that has been gaining considerable popularity in economics (for example, Buchinsky, 1994). It was introduced by Koenker and Bassett (1978b) as an extension to ordinary quantiles in a location model. In this model, the conditional quantiles have linear forms. A well-known special case of quantile regression is the least absolute deviation (LAD) estimator of Koenker and Bassett (1978a), which fits medians to a linear function of covariates. In an important generalization of the quantile regression model, Powell (1984; 1986) introduced the censored quantile regression model. This model is an extension of the ‘Tobit’ model and is designed to handle situations in which some of the observations on the dependent variable are censored.

Moshe Buchinsky

Regression-Discontinuity Analysis

The regression discontinuity (RD) data design is a quasi-experimental evaluation design first introduced by Thistlethwaite and Campbell (1960) as an alternative approach to evaluating social programmes. The design is characterized by a treatment assignment or selection rule which involves the use of a known cut-off point with respect to a continuous variable, generating a discontinuity in the probability of treatment receipt at that point. Under certain comparability conditions, a comparison of average outcomes for observations just left and right of the cut-off can be used to estimate a meaningful causal impact. While interest in the design had previously been mainly limited to evaluation research methodologists (Cook and Campbell, 1979; Trochim, 1984), the design is currently experiencing a renaissance among econometricians and empirical economists (Hahn, Todd and van der Klaauw, 1999; 2001; Angrist and Krueger, 1999; Porter, 2003). Among the main econometric contributions have been the formal derivation of identification conditions for causal inference and the introduction of semiparametric estimation procedures for the design. At the same time, a large and rapidly growing number of empirical applications are providing new insights into the applicability of the design, which have led to the development of several sensitivity and validity tests.

Wilbert Van Der Klaauw

Roy Model

The Roy (1951) model of self-selection on outcomes is one of the most important models in economics. It is a framework for analysing comparative advantage. The original model analysed occupational choice with heterogeneous skill levels and has subsequently been applied in many other contexts. We first discuss the model. We then summarize what is known about identification of the model. We end by describing some applications based on the model and its extensions.

James J. Heckman, Christopher Taber

Rubin Causal Model

The Rubin Causal Model (RCM) is a formal mathematical framework for causal inference, first given that name by Holland (1986) for a series of previous articles developing the perspective (Rubin, 1974; 1975; 1976; 1977; 1978; 1979; 1980). There are two essential parts to the RCM, and a third optional one. The first part is the use of ‘potential outcomes’ to define causal effects in all situations — this part defines ‘the science’, which is the object of inference, and it requires the explicit consideration of the manipulations that define the treatments whose causal effects we wish to estimate. The second part is an explicit probabilistic model for the assignment of ‘treatments’ to ‘units’ as a function of all quantities that could be observed, including all potential outcomes; this model is called the ‘assignment mechanism’, and defines the structure of experiments designed to learn about the science from observed data or the acts of nature that lead to the observed data. The third possible part of the RCM framework is an optional distribution on the quantities being conditioned on in the assignment mechanism, including the potential outcomes, thereby allowing model-based Bayesian ‘posterior predictive’ (causal) inference. This part of the RCM focuses on the model-based analysis of observed data to draw inferences for causal effects, where the observed data are revealed by applying the assignment mechanism to the science. A full-length text that discusses estimation and inference for causal effects from this perspective is Imbens and Rubin (2006).

Guido W. Imbens, Donald B. Rubin

Selection Bias and Self-Selection

The problem of selection bias in economic and social statistics arises when a rule other than simple random sampling is used to sample the underlying population that is the object of interest. The distorted representation of a true population as a consequence of a sampling rule is the essence of the selection problem. Distorting selection rules may be the outcome of decisions of sample survey statisticians, self-selection decisions by the agents being studied, or both.

James J. Heckman

Semiparametric Estimation

Semiparametric estimation methods are used to obtain estimators of the parameters of interest, typically the coefficients of an underlying regression function, in an econometric model, without a complete parametric specification of the conditional distribution of the dependent variable given the explanatory variables (regressors). A structural econometric model relates an observable dependent variable y to some observable regressors x, some unknown parameters β, and some unobservable ‘error term’ ε, through some functional y = g(x, β, ε); in this context, a semiparametric estimation problem does not restrict the distribution of ε (given the regressors) to belong to a parametric family determined by a finite number of unknown parameters, but instead imposes only broad restrictions on the distribution of ε (for example, independence of ε and x, or symmetry of ε about zero given x) to obtain identification of β and construct consistent estimators of it.

James L. Powell

Simulation-Based Estimation

Simulation-based estimation is an application of the general Monte Carlo principle to statistical estimation: any mathematical expectation, when unavailable in closed form, can be approximated to any desired level of accuracy through a generation of (pseudo-) random numbers. Pseudo-random numbers are generated on a computer by means of a deterministic method. (For convenience, we henceforth delete the qualification ‘pseudo’.) Then a well-suited drawing of random numbers (or vectors) Z₁, Z₂,…, Z_H provides the Monte Carlo simulator (1/H) Σ_{h=1}^{H} g(Z_h) of E[g(Z)]. Of course, one may also want to resort to many simulators improving upon this naive one in terms of variance reduction, increased smoothness and reduced computational cost. A detailed discussion of simulation techniques is beyond the scope of this article. Nor are we going to study Monte Carlo experiments, which complement a given statistical procedure by the observation of its properties on simulated data. Rather, our focus of interest is to show how Monte Carlo integration may directly help to compute estimators that would be unfeasible without resorting to simulators.
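The naive simulator described above can be sketched in a few lines. The choice of g(z) = z² with Z standard normal is an assumption made here because the exact expectation, 1, is known and can be compared with the simulated value; the function name is hypothetical.

```python
import random


def monte_carlo_mean(g, draw, H):
    """Return (1/H) * sum over h = 1..H of g(Z_h) for H independent draws."""
    return sum(g(draw()) for _ in range(H)) / H


rng = random.Random(0)  # deterministic generator of pseudo-random numbers

# Approximate E[Z^2] for Z ~ N(0, 1); the exact value is 1.
estimate = monte_carlo_mean(lambda z: z * z, lambda: rng.gauss(0.0, 1.0), 100_000)
print(round(estimate, 3))
```

The simulation error of this naive average shrinks at rate 1/√H, which is what motivates the variance-reduction refinements mentioned in the text.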

Eric Renault

Social Interactions (Empirics)

The empirical economics literature on social interactions addresses the significance of the social context in economic decisions. Decisions of individuals who share a social milieu are likely to be interdependent. Recognizing the nature of such interdependence in a variety of conventional and unconventional settings and measuring empirically the role of social interactions poses complex econometric questions. Their resolution may be critical for a multitude of phenomena in economic and social life and of matters of public policy. Questions like why some countries are Catholic and others Protestant, why crime rates vary so much across cities in the same country, why fads exist and survive, and why there is residential segregation and neighbourhood tipping are all in principle issues that may be examined as social interactions phenomena.

Yannis M. Ioannides

Spatial Econometrics

Spatial econometrics is concerned with models for dependent observations indexed by points in a metric space or nodes in a graph. The key idea is that a set of locations can characterize the joint dependence between their corresponding observations. Locations provide a structure analogous to that provided by the time index in time series models. For example, near observations may be highly correlated but, as distance between observations grows, they approach independence. However, while time series are ordered in a single dimension, spatial processes are almost always indexed in more than one dimension and not ordered. Even small increases in the dimension of the indexing space permit large increases in the allowable patterns of interdependence between observations. The primary benefit of this modelling strategy is that complicated patterns of interdependence across sets of observations can be parsimoniously described in terms of relatively simple and estimable functions of objects like the distances between them.

Timothy G. Conley

Survey Data, Analysis of

When economists analyse survey data, they must confront characteristics of the data-generating process that may distinguish these data from other types, such as administrative records.

Jeff Dominitz, Arthur van Soest

Tobit Model

The Tobit model, or censored regression model, is useful to learn about the conditional distribution of a variable y* given a vector of regressors x, when y* is observed only if it is above or below some known threshold (censoring). In the original model of Tobin (1958), for example, the dependent variable was expenditures on durables, and values below zero are not observed.
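To make the censoring mechanism concrete, the sketch below simulates a latent y* that is recorded as zero whenever it falls below zero and evaluates the standard Tobit log-likelihood under normal errors. The single regressor, the parameter values, and the function names are assumptions made for illustration; the article itself does not specify them.

```python
import math
import random


def norm_logpdf(z):
    return -0.5 * z * z - 0.5 * math.log(2.0 * math.pi)


def norm_logcdf(z):
    # Phi(z) = 0.5 * erfc(-z / sqrt(2)); adequate for moderate values of z.
    return math.log(0.5 * math.erfc(-z / math.sqrt(2.0)))


def tobit_loglik(b0, b1, sigma, xs, ys):
    """Log-likelihood for y = max(0, b0 + b1 * x + sigma * e), e ~ N(0, 1)."""
    ll = 0.0
    for x, y in zip(xs, ys):
        mean = b0 + b1 * x
        if y > 0.0:
            # Uncensored observation: normal density of y.
            ll += norm_logpdf((y - mean) / sigma) - math.log(sigma)
        else:
            # Censored observation: probability that y* is at or below zero.
            ll += norm_logcdf(-mean / sigma)
    return ll


rng = random.Random(0)
xs = [rng.gauss(0.0, 1.0) for _ in range(1000)]
y_star = [0.5 + 1.0 * x + rng.gauss(0.0, 1.0) for x in xs]  # latent variable
ys = [max(0.0, v) for v in y_star]                          # censored at zero

ll_true = tobit_loglik(0.5, 1.0, 1.0, xs, ys)
ll_wrong = tobit_loglik(0.5, 0.0, 1.0, xs, ys)
print(ll_true > ll_wrong)
```

Maximizing this function over (b0, b1, sigma) would give the Tobit maximum likelihood estimates; here it is only evaluated at two parameter values to show that the data favour the values used in the simulation.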

Jean-Marc Robin

Treatment Effect

A ‘treatment effect’ is the average causal effect of a binary (0–1) variable on an outcome variable of scientific or policy interest. The term ‘treatment effect’ originates in a medical literature concerned with the causal effects of binary, yes-or-no ‘treatments’, such as an experimental drug or a new surgical procedure. But the term is now used much more generally. The causal effect of a subsidized training programme is probably the most widely analysed treatment effect in economics (see, for example, Ashenfelter, 1978, for one of the first examples, or Heckman and Robb, 1985 for an early survey). Given a data-set describing the labour market circumstances of trainees and a non-trainee comparison group, we can compare the earnings of those who did participate in the programme and those who did not. Any empirical study of treatment effects would typically start with such simple comparisons. We might also use regression methods or matching to control for demographic or background characteristics.

Joshua D. Angrist

Variance, Analysis of

Analysis of variance (ANOVA) represents a set of models that can be fit to data, and also a set of methods for summarizing an existing fitted model. We first consider ANOVA as it applies to classical linear models (the context for which it was originally devised; Fisher, 1925) and then discuss how ANOVA has been extended to generalized linear models and multilevel models. Analysis of variance is particularly effective for analysing highly structured experimental data (in agriculture, multiple treatments applied to different batches of animals or crops; in psychology, multi-factorial experiments manipulating several independent experimental conditions and applied to groups of people; industrial experiments in which multiple factors can be altered at different times and in different locations).

Andrew Gelman
