
About this Book

This book provides a self-contained presentation of the statistical tools required for evaluating public programs, as advocated by many governments, the World Bank, the European Union, and the Organisation for Economic Co-operation and Development. After introducing the methodological framework of program evaluation, the first chapters are devoted to the collection, elementary description and multivariate analysis of data as well as the estimation of welfare changes. The book then successively presents the tools of ex ante methods (financial analysis, budget planning, cost-benefit, cost-effectiveness and multi-criteria evaluation) and ex post methods (benchmarking, experimental and quasi-experimental evaluation). The step-by-step approach and the systematic use of numerical illustrations equip readers to handle the statistics of program evaluation.

It not only offers practitioners from public administrations, consultancy firms and nongovernmental organizations the basic tools and advanced techniques used in program assessment, but is also suitable for executive management training, upper undergraduate and graduate courses, as well as for self-study.

Table of Contents

Frontmatter

1. Statistical Tools for Program Evaluation: Introduction and Overview

Abstract
Program evaluation aims to grasp the impact of collective projects on citizens, that is, their economic, social and environmental consequences for individual and community welfare. The task is challenging, as it is not easy to put a value on items such as health, education or changes in the environment. This chapter provides an overview of the program evaluation methodology and an introduction to the present book. Program evaluation is defined as a process that consists in collecting, analyzing, and using information to assess the relevance of a policy, its effectiveness and its efficiency (Sect. 1.1). The approach usually starts with a context analysis, which first relies on descriptive and inferential statistical tools to point out the issues that must be addressed, then measures the welfare changes associated with the program (Sect. 1.2). Afterwards, an ex ante evaluation can be conducted to design solutions and to select a particular strategy among the competing ones (Sect. 1.3). Once the selected strategy has been implemented, ex post evaluation techniques assess the extent to which planned outcomes have been achieved as a result of the program, ceteris paribus (Sect. 1.4). The last section offers a brief description of how to use the book (Sect. 1.5).
Jean-Michel Josselin, Benoît Le Maux

Identifying the Context of the Program

Frontmatter

2. Sampling and Construction of Variables

Abstract
The construction of a reliable and relevant database is a key aspect of any statistical study. Not only will misleading information create bias and mistakes (sampling, coverage or measurement errors, etc.), but it may also seriously affect public decisions if the study is used to guide policy-makers. The sample should be sufficiently representative of the population of interest (Sect. 2.1). The time needed to collect and process the data is an important issue in this respect (Sect. 2.2). Since the purpose of a survey is to obtain sincere responses from the respondents, the design of the questionnaire also matters, in particular the sequencing, phrasing and format of questions (Sect. 2.3). The process of data collection should not be neglected either: the analyst must decide which sampling method is most relevant (e.g., non-probability vs. probability sampling) and assess the efficacy of data collection during the survey process (Sect. 2.4). Last, the coding of variables and the investigation of measurement and nonresponse errors constitute an essential step of data construction (Sect. 2.5).
Jean-Michel Josselin, Benoît Le Maux
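
As a rough illustration of the sampling designs mentioned in the abstract, the following R sketch draws a simple random sample and a proportional stratified sample from a simulated sampling frame; the frame, the variable names and the sample size are purely hypothetical and are not taken from the book's own examples.

    # Hypothetical sampling frame of 10,000 units in two regions
    set.seed(1)
    frame <- data.frame(id = 1:10000,
                        region = sample(c("North", "South"), 10000,
                                        replace = TRUE, prob = c(0.4, 0.6)))

    # Simple random sample of 500 units
    srs <- frame[sample(nrow(frame), 500), ]

    # Proportional stratified sample: draw within each region in proportion to its size
    strata <- split(frame, frame$region)
    strat  <- do.call(rbind, lapply(strata, function(s)
      s[sample(nrow(s), round(500 * nrow(s) / nrow(frame))), ]))
    table(strat$region) / nrow(strat)   # sample shares mirror the population shares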

3. Descriptive Statistics and Interval Estimation

Abstract
This chapter reviews the statistical methods used to describe a sample and make inferences about a larger population. Despite its apparent simplicity, one should not underestimate the importance of the task, especially in the context of public policies. Providing basic descriptive statistics to point out the issues that must be addressed is a preliminary and necessary step in program evaluation (Sect. 3.1). One-way and two-way tables summarize the data in a very efficient manner (Sect. 3.2). Bar graphs, pie charts, histograms, line graphs and radar charts can also be generated at the evaluator’s convenience (Sect. 3.3). To go further, numerical analysis rests on measures of central tendency (mode, median, and mean) and of dispersion (interquartile range, variance, standard deviation, coefficient of variation) (Sect. 3.4). The asymmetry of a distribution and its “tailedness” can be approximated by the skewness and kurtosis coefficients (Sect. 3.5). Last, in most cases, the description of a database is done in the context of a sample survey; any generalization to the population of interest thus involves the calculation of confidence intervals (Sect. 3.6). Several examples illustrate the methods.
Jean-Michel Josselin, Benoît Le Maux
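
A minimal R sketch of the descriptive measures and interval estimation covered by the chapter, applied to a simulated sample; the income figures are illustrative assumptions, not the book's own data.

    # Hypothetical sample of 200 household incomes
    set.seed(1)
    income <- rnorm(200, mean = 30000, sd = 8000)

    # Central tendency and dispersion
    mean(income); median(income); sd(income)
    quantile(income, c(0.25, 0.75))    # bounds of the interquartile range
    sd(income) / mean(income)          # coefficient of variation

    # 95% confidence interval for the population mean
    n  <- length(income)
    se <- sd(income) / sqrt(n)
    mean(income) + c(-1, 1) * qt(0.975, df = n - 1) * se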

4. Measuring and Visualizing Associations

Abstract
One goal of statistical studies is to highlight associations between pairs of variables. This is particularly useful when one wants to get a clear picture of a multi-dimensional data set and motivate a specific policy intervention (Sect. 4.1). Yet the choice of a method is not straightforward. Testing for correlation is the relevant approach to investigate a linear association between two numerical variables (Sect. 4.2). The chi-square test is an inferential test that uses data from a sample to draw conclusions about the relationship between two categorical variables (Sect. 4.3). When one variable is numerical and the other is categorical, the usual approach is to test for differences between means or to implement an analysis of variance (Sect. 4.4). When faced with more than two variables, it is also possible to provide a multidimensional representation of the problem using methods such as principal component analysis (Sect. 4.5) and multiple correspondence analysis (Sect. 4.6). The idea is to reduce the dimensionality of a data set by plotting all the observations on 2D graphs that describe how they cluster with respect to various characteristics. These groups can, for instance, serve to identify the beneficiaries of a particular intervention. Several examples in R-CRAN illustrate the different methods throughout the chapter.
Jean-Michel Josselin, Benoît Le Maux
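
The chapter relies on R-CRAN; the sketch below indicates, under illustrative assumptions, how the association tests and the dimensionality reduction it lists can be run in base R. The built-in mtcars data set and the chosen variables are stand-ins, not the book's examples.

    # Correlation test: linear association between two numerical variables
    cor.test(mtcars$mpg, mtcars$wt)

    # Chi-square test of independence between two categorical variables
    tab <- table(mtcars$cyl, mtcars$am)
    chisq.test(tab)

    # Analysis of variance: numerical outcome against a categorical factor
    summary(aov(mpg ~ factor(cyl), data = mtcars))

    # Principal component analysis on standardized variables, plotted in 2D
    pca <- prcomp(mtcars[, c("mpg", "wt", "hp", "disp")], scale. = TRUE)
    biplot(pca)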

5. Econometric Analysis

Abstract
Econometrics encompasses several multivariate tools for testing a theory or hypothesis, quantifying it, and providing indications about the evolution of an outcome of interest. This chapter provides the basic knowledge of how those tools work, with examples and the corresponding R-CRAN code. The first step is dedicated to simple and multiple linear regression models and their estimation by the method of ordinary least squares (Sects. 5.1 and 5.2). The classical assumptions underlying the method (e.g., linearity, normality of residuals, homoscedasticity, non-autocorrelation) are then set out (Sect. 5.3). An important step in conducting an econometric analysis is model specification, as it determines the validity of the regression analysis (Sect. 5.4). Another issue is the choice of the functional form that best fits the data, i.e., whether or not the variables are expressed in a non-linear form (Sect. 5.5). Several tests for detecting potential misspecifications (Jarque-Bera, Breusch-Pagan, and Durbin-Watson tests) are fully detailed (Sect. 5.6). Last, the selection of the final model relies on a meticulous examination of the regression outputs, e.g., whether the variables sufficiently explain the outcome of interest (Sect. 5.7). The methods are then extended to the case where the outcome is binary, with the so-called logit and probit models (Sect. 5.8).
Jean-Michel Josselin, Benoît Le Maux
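
A hedged sketch of an ordinary least squares regression, the misspecification tests named in the abstract, and a binary-outcome specification. It assumes the lmtest and tseries packages are installed and uses the built-in mtcars data; the book's own R-CRAN code and data may differ.

    library(lmtest)    # bptest(), dwtest()
    library(tseries)   # jarque.bera.test()

    model <- lm(mpg ~ wt + hp, data = mtcars)   # ordinary least squares
    summary(model)                              # coefficients, t-tests, R-squared

    jarque.bera.test(residuals(model))  # Jarque-Bera: normality of residuals
    bptest(model)                       # Breusch-Pagan: homoscedasticity
    dwtest(model)                       # Durbin-Watson: non-autocorrelation

    # Binary outcome: logit and probit models
    logit  <- glm(am ~ wt + hp, data = mtcars, family = binomial(link = "logit"))
    probit <- glm(am ~ wt + hp, data = mtcars, family = binomial(link = "probit"))
    summary(logit)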

6. Estimation of Welfare Changes

Abstract
Since public projects have consequences on individual lives, the estimation of welfare changes is an essential step in the evaluation process (Sect. 6.1). To this end, this chapter presents several tools for eliciting individual preferences. The first set of methods consists of stated preference techniques, whereby individuals declare their perceptions of the project and its consequences. These methods include contingent valuation and discrete choice experiments. The former consists in directly asking a sample of individuals about their willingness to pay for a program (Sect. 6.2). Discrete choice experiments, on the other hand, ask agents to compare a set of public goods or services; a multi-attribute utility function is estimated based on the idea that agents’ preferences for goods depend on the characteristics these goods contain (Sect. 6.3). The second set of methods comprises revealed preference techniques, where preferences are inferred from what is observed on existing markets. For instance, the hedonic pricing method values the implicit price of non-market goods, e.g., proximity of a school or air quality, from their impact on real estate market prices (Sect. 6.4). In the same vein, the travel cost method estimates the demand for recreational sites based on the costs incurred by people to visit them (Sect. 6.5). Last, the third set of methods is commonly used for the assessment of public health decisions. It aims to estimate directly the utility levels (e.g., QALYs) associated with particular health states (Sect. 6.6).
Jean-Michel Josselin, Benoît Le Maux
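
A minimal contingent-valuation sketch in R, under simplifying assumptions: respondents accept or refuse a randomly assigned bid, and mean willingness to pay is recovered from a logit model as minus the intercept divided by the bid coefficient (a linear utility assumption). The data are simulated and the variable names are hypothetical; this is not the book's own application.

    set.seed(42)
    n   <- 500
    bid <- sample(c(5, 10, 20, 40), n, replace = TRUE)  # randomly assigned bids
    wtp <- rnorm(n, mean = 18, sd = 10)                 # latent willingness to pay
    yes <- as.numeric(wtp > bid)                        # accept if WTP exceeds the bid

    cv <- glm(yes ~ bid, family = binomial(link = "logit"))
    mean_wtp <- -coef(cv)[1] / coef(cv)[2]              # estimated mean willingness to pay
    mean_wtp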

Ex ante Evaluation

Frontmatter

7. Financial Appraisal

Abstract
Financial analysis provides decision-makers with information about the costs and revenues of their investment projects (Sect. 7.1). The time value of money is in this respect a central concept: dollars at different points in time are compared whenever money is borrowed or invested (Sect. 7.2). More specifically, the methodology is concerned with sustainability and profitability. A strategy is said to be financially sustainable when it does not incur the risk of running out of cash during its lifetime (Sect. 7.3). Financial profitability is the ability of the project to achieve a satisfactory rate of return (Sect. 7.4). Another important consideration is whether cash flows are to be expressed in current or real terms (Sect. 7.5). Additional methods (e.g., accounting rate of return, break-even point or payback period) can also be used to compare investment strategies according to the extent to which the return on investment compensates the initial outlay, the volume of activity needed to cover the costs, or the time it takes for a strategy to earn back the money initially invested (Sect. 7.6). Last, the model’s assumptions can be investigated with a deterministic sensitivity analysis through the calculation of switching points and elasticities (Sect. 7.7). A spreadsheet example accompanies the reader throughout the sections.
Jean-Michel Josselin, Benoît Le Maux
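
The chapter itself works with a spreadsheet; as a language-neutral illustration of discounting, the short R sketch below computes a net present value and an internal rate of return for a hypothetical stream of cash flows (all figures are assumptions).

    # Year 0 outlay followed by five years of net revenues (illustrative values)
    cf <- c(-1000, 300, 320, 340, 360, 380)

    # Net present value at a given discount rate
    npv <- function(rate, cf) sum(cf / (1 + rate)^(seq_along(cf) - 1))
    npv(0.05, cf)                              # NPV at a 5% discount rate

    # Internal rate of return: the rate at which the NPV equals zero
    uniroot(npv, interval = c(0, 1), cf = cf)$root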

8. Budget Impact Analysis

Abstract
Budget impact analysis examines the extent to which the introduction of a new strategy into an existing program affects an agency’s budget. Not only does the method provide information about the costs generated by a new intervention or treatment, but it also assesses how the new strategy will affect the overall supply of services and the amount of resources devoted to them. The approach may serve, for instance, to evaluate the impact of a new drug on the health care system, or be part of a budget planning process in order to analyze multiple scenarios. The chapter first offers an introduction to the method (Sect. 8.1). The analytical framework is then presented in the case of a single supply, e.g., one school, one drug or, more generally, one service or strategy (Sect. 8.2). The setting is extended to several supplies and compares the current environment with a new one in which an additional strategy is introduced (Sect. 8.3). An example is worked out in an Excel spreadsheet with information about the distribution of populations, exit rates and costs among different strategies (Sect. 8.4). A deterministic sensitivity analysis builds on this example and uses Visual Basic tools to examine how the budget impact reacts to changes in the forecast assumptions (Sect. 8.5).
Jean-Michel Josselin, Benoît Le Maux
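
A stylized budget impact sketch in R (the chapter's own example is in Excel): it compares the annual budget of the current mix of strategies with the mix expected after a new strategy is introduced. Populations, costs and the strategy labels are all hypothetical.

    pop_current <- c(A = 6000, B = 4000)             # users per strategy today
    cost        <- c(A = 1200, B = 900,  C = 1500)   # annual cost per user
    pop_new     <- c(A = 4500, B = 3500, C = 2000)   # expected mix once strategy C is added

    budget_current <- sum(pop_current * cost[names(pop_current)])
    budget_new     <- sum(pop_new * cost[names(pop_new)])
    budget_new - budget_current                      # budget impact of introducing C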

9. Cost Benefit Analysis

Abstract
Cost benefit analysis compares policy strategies not only on the basis of their financial flows but also on the basis of their overall impact, be it economic, social, or environmental (Sect. 9.1). To do so, the approach relies on the concepts of Pareto optimality and Kaldor-Hicks compensation to assess whether public programs are globally beneficial or detrimental to society’s welfare (Sect. 9.2). All effects must be expressed in terms of their equivalent money values and discounted according to how society values the well-being of future generations (Sect. 9.3). Using conversion factors, the cost benefit analysis methodology also ensures that the prices of inputs and outputs used in the analysis reflect their true economic values (Sect. 9.4). Last, the analysis ends with a deterministic or probabilistic sensitivity analysis, which examines how the conclusions of the study change with variations in cash flows, assumptions, or the manner in which the evaluation is set up (Sects. 9.5 and 9.6). One may also employ a mean-variance analysis to compare the performance and risk of each competing strategy (Sect. 9.7). Applications and examples, provided in spreadsheets and R-CRAN, enable the reader to put the methodology into practice.
Jean-Michel Josselin, Benoît Le Maux
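
A hedged sketch of a probabilistic sensitivity analysis by Monte Carlo simulation in R: the net present value of social net benefits is simulated when yearly benefits and costs are uncertain. The distributions, discount rate and investment cost are illustrative assumptions, not the book's figures.

    set.seed(123)
    n_sim <- 10000
    rate  <- 0.04                 # assumed social discount rate
    years <- 1:20

    npv_sim <- replicate(n_sim, {
      benefit <- rnorm(length(years), mean = 150, sd = 30)   # yearly social benefits
      cost    <- rnorm(length(years), mean = 100, sd = 20)   # yearly social costs
      sum((benefit - cost) / (1 + rate)^years) - 500         # minus initial investment
    })

    mean(npv_sim)                    # expected net present value
    mean(npv_sim > 0)                # probability that the project is beneficial
    quantile(npv_sim, c(0.025, 0.975))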

10. Cost Effectiveness Analysis

Abstract
Cost effectiveness analysis compares the differential costs and outcomes of policy strategies competing for the implementation of a program. Outcomes are measured in natural or physical units instead of an equivalent money value (Sect. 10.1). Indicators of cost effectiveness are the incremental cost effectiveness ratio and its generalization as a function of collective willingness to pay, i.e., the incremental net benefit (Sect. 10.2). Beyond the pairwise comparison of strategies through these indicators, the efficiency frontier identifies policy options subject to simple and extended dominance and selects the efficient ones (Sect. 10.3). Cost and outcome data are usually obtained from decision analytic modeling, here Markov models (Sect. 10.4). Section 10.5 provides a numerical example in R-CRAN. Cost effectiveness analysis is notably used in public health economics, where the health outcome of a medical intervention is assessed in quality-adjusted life years (Sect. 10.6). Section 10.7 discusses the uncertainty surrounding decision analytic models and uses Monte Carlo simulations to address parameter uncertainty. Section 10.8 shows how to analyze simulation outputs on the differential cost-effectiveness plane and with cost effectiveness acceptability curves, with self-contained R-CRAN programs.
Jean-Michel Josselin, Benoît Le Maux
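
A minimal R sketch of the two indicators named in the abstract, the incremental cost effectiveness ratio and the incremental net benefit, for two hypothetical strategies; costs, QALYs and the willingness-to-pay threshold are illustrative assumptions.

    cost_A <- 12000; qaly_A <- 6.2    # comparator strategy
    cost_B <- 15000; qaly_B <- 6.8    # new strategy

    icer <- (cost_B - cost_A) / (qaly_B - qaly_A)  # incremental cost effectiveness ratio
    icer

    lambda <- 30000                                # assumed collective willingness to pay per QALY
    inb <- lambda * (qaly_B - qaly_A) - (cost_B - cost_A)  # incremental net benefit
    inb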

11. Multi-criteria Decision Analysis

Abstract
Multi-criteria decision analysis is devoted to the development of decision support tools for complex decisions, especially where other methods fail to consider more than one outcome of interest. The approach is very flexible, as outcomes can be quantified in non-monetary terms and expressed on ordinal or numerical scales (Sect. 11.1). The approach starts with the construction of a value tree and the identification of relevant criteria (Sect. 11.2). It then proceeds with gathering information about the performance of each assessed alternative against the whole set of criteria. Values are generally normalized between 0 and 1, thereby constituting what is termed a score matrix (Sect. 11.3). Numerical weights are also assigned to the criteria to reflect their relative importance (Sect. 11.4). Weights and scores are then combined to arrive at a ranking or sorting of the alternatives. Should a compensatory analysis be implemented, the approach relies on aggregation methods to build a composite indicator (Sect. 11.5). Should a non-compensatory analysis be carried out, the approach instead examines each dimension individually (Sect. 11.6). Furthermore, a sensitivity analysis of the weights and scores can be used to explore how changes in assumptions influence the results (Sect. 11.7).
Jean-Michel Josselin, Benoît Le Maux
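
A compensatory multi-criteria sketch in R: a normalized score matrix is aggregated with a weighted sum to produce a composite indicator and a ranking. The alternatives, criteria and weights are illustrative assumptions, not the book's example.

    # Score matrix: three alternatives rated on three criteria, normalized between 0 and 1
    scores <- matrix(c(0.8, 0.4, 0.6,
                       0.5, 0.9, 0.3,
                       0.6, 0.6, 0.9),
                     nrow = 3, byrow = TRUE,
                     dimnames = list(paste("Alt", 1:3),
                                     c("Cost", "Impact", "Feasibility")))
    weights <- c(Cost = 0.5, Impact = 0.3, Feasibility = 0.2)  # sum to 1

    composite <- scores %*% weights                  # weighted-sum composite indicator
    composite[order(composite, decreasing = TRUE), , drop = FALSE]   # ranking of alternatives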

Ex post Evaluation

Frontmatter

12. Project Follow-Up by Benchmarking

Abstract
Benchmarking is a follow-up evaluation tool that compares the cost structure of facilities with that of a given reference, the benchmark or yardstick. What is assessed is not a policy per se, but the facilities in charge of implementing it (Sect. 12.1). The method is applicable to any public service operating within a multiple-input, multiple-output setting and equipped with a cost accounting system (Sect. 12.2). As the demand for a set of services plays a determining role in explaining the average cost of a facility, the first step is to delineate the effects of the demand structure on cost (Sect. 12.3). Benchmarking also assesses whether an extra cost observed in one facility is due to price effects or to the allocation of inputs among services (Sect. 12.4). The stakeholders of the public project may also wish to obtain alternative or complementary information on the role each input plays within the production structure; a simple reorganization of the data makes this possible (Sect. 12.5). Last, the method can be used to motivate improvements in operations or to help a decision-maker understand where a facility’s performance stands in comparison with others (Sect. 12.6).
Jean-Michel Josselin, Benoît Le Maux
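
A stylized R sketch, not the book's exact decomposition: it compares a facility's observed average cost with the cost it would have under the benchmark's demand structure, isolating the part of the cost gap attributable to the demand mix. Unit costs and demand shares are hypothetical.

    unit_cost  <- c(svc1 = 40, svc2 = 70)      # facility's unit cost per service
    demand_fac <- c(svc1 = 0.7, svc2 = 0.3)    # facility's demand shares
    demand_ref <- c(svc1 = 0.5, svc2 = 0.5)    # benchmark demand shares

    avg_cost_fac <- sum(unit_cost * demand_fac)  # observed average cost
    avg_cost_ref <- sum(unit_cost * demand_ref)  # average cost with the benchmark demand mix
    avg_cost_fac - avg_cost_ref                  # cost gap due to the demand structure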

13. Randomized Controlled Experiments

Abstract
In many cases, the presence of confounding factors makes the identification of causal effects rather difficult. One solution to avoid potential bias is to run a randomized controlled experiment, either in the form of a clinical trial or a field experiment (Sect. 13.1). The basic tenet is to assign the subjects to a control group and a treatment group such that the two groups share similar characteristics on average (Sect. 13.2). The impact of an intervention is then obtained by comparing the average outcomes observed in both groups and testing whether the difference is significant (Sect. 13.3). An important issue is to assess the risks of type I and type II errors, i.e., the probabilities that the statistical test yields the wrong conclusion (Sect. 13.4). Controlling for those risks implies finding the minimum number of subjects to enroll in the experiment to achieve a given statistical power (Sect. 13.5). Another issue is to select an indicator (e.g., absolute risk reduction, relative risk ratio, odds ratio, number needed to treat) to summarize the number of successes and failures in each group (Sect. 13.6). The analysis can also be extended to a more general framework where the timing of event occurrence is explicitly accounted for, via the estimation of survival curves with the Kaplan-Meier approach (Sect. 13.7) and the implementation of the Mantel-Haenszel test (Sect. 13.8).
Jean-Michel Josselin, Benoît Le Maux
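
A hedged R sketch of three steps described in the abstract: a sample size calculation for a target power, a two-sample test on simulated outcomes, and Kaplan-Meier curves with the log-rank (Mantel-Haenszel) test. The effect sizes and data are illustrative assumptions, and the survival package with its built-in aml data set is assumed to be available; the book's own examples may differ.

    # Minimum number of subjects per group for 80% power at a 5% significance level
    power.t.test(delta = 0.5, sd = 1, sig.level = 0.05, power = 0.8)

    # Simulated treatment and control outcomes, then a difference-in-means test
    set.seed(7)
    treated <- rnorm(64, mean = 0.5)
    control <- rnorm(64, mean = 0.0)
    t.test(treated, control)

    # Timing of events: Kaplan-Meier estimation and the log-rank test
    library(survival)
    fit <- survfit(Surv(time, status) ~ x, data = aml)   # built-in example data
    plot(fit)
    survdiff(Surv(time, status) ~ x, data = aml)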

14. Quasi-experiments

Abstract
Quasi-experimental evaluation estimates the causal impact of an intervention based on observational data. The approach differs from a randomized controlled experiment in that the subjects are not randomly assigned to a treatment and a control group. Hence, the issue is to find a proper comparison group, the so-called counterfactual, that resembles the treatment group in everything but the receipt of the intervention (Sect. 14.1). Several methods exist. The difference-in-differences method uses panel data and calculates the effect of an intervention by comparing the change in outcome over time between the treated and non-treated groups (Sect. 14.2). Propensity score matching relies on the estimation of scores (the probability of participating in the treatment) to select and pair subjects with similar characteristics; the impact is then computed as the difference in means between the two selected groups (Sect. 14.3). Regression discontinuity design compares subjects in the vicinity of a cutoff point around which the intervention is dispensed, under the assumption that subjects lying close to either side of the threshold share similar characteristics (Sect. 14.4). Last, instrumental variable estimation addresses the problem of endogeneity in individual participation (Sect. 14.5). Examples with detailed R-CRAN programs are provided throughout the chapter to give the reader a complete description of these approaches.
Jean-Michel Josselin, Benoît Le Maux
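
A minimal difference-in-differences sketch in R on simulated data (all numbers are assumptions): the coefficient on the treated-by-post interaction estimates the effect of the intervention.

    set.seed(99)
    n <- 400
    treated <- rep(c(0, 1), each = n / 2)    # group indicator
    post    <- rep(c(0, 1), times = n / 2)   # period indicator
    outcome <- 2 + 1 * treated + 0.5 * post + 1.5 * treated * post + rnorm(n)

    did <- lm(outcome ~ treated * post)
    summary(did)   # the interaction term recovers the treatment effect (about 1.5 here)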