2022 | Book

# Ten Projects in Applied Statistics

Author: Peter McCullagh

Publisher: Springer International Publishing

Book Series: Springer Series in Statistics

The first half of the book is aimed at quantitative research workers in biology, medicine, ecology and genetics. The book as a whole is aimed at graduate students in statistics, biostatistics, and other quantitative disciplines. Ten detailed examples show how the author approaches real-world statistical problems in a principled way that allows for adequate compromise and flexibility. The need to accommodate correlations associated with space, time and other relationships is a recurring theme, so variance-components models feature prominently. Statistical pitfalls are illustrated via examples taken from the recent scientific literature. Chapter 11 sets the scene, not just for the second half of the book, but for the book as a whole. It begins by defining fundamental concepts such as baseline, observational unit, experimental unit, covariates and relationships, randomization, treatment assignment, and the role that these play in model formulation. Compatibility of the model with the randomization scheme is crucial. The effect of treatment is invariably modelled as a group action on probability distributions. Technical matters connected with space-time covariance functions, residual likelihood, likelihood ratios, and transformations are discussed in later chapters.

Abstract

The first project is a medical research problem that was brought to the statistical consulting program in the Department of Statistics by George Huang in the early 1990s. At the time, Dr Huang was a resident in the department of Surgery. He was studying the effect of high-pressure oxygen treatment on the healing of surgical wounds, anticipating that oxygen might have a beneficial effect. He experimented on rats.

From a statistical perspective, the experimental design is straightforward. After surgery, one subset of rats received the high-pressure oxygen treatment daily. The remaining rats served as controls. Control rats were put in the hyperbaric chamber daily, but at normal pressure and normal 21% oxygen level.

Although the design is completely randomized with two treatment levels, there are several complications. First, the plan called for every rat to be measured at five sites, so there are five times as many observational units as there are experimental units. Second, there is ample evidence that values at different sites on the same rat are not independent. Third, a non-negligible fraction of rats had to be excluded for medical reasons that are discussed in the text. Fourth, among the non-excluded rats, the fraction of missing values is around 15%, i.e., small but not entirely negligible. Fifth, sites near the shoulder seem to heal slightly faster and stronger than sites near the tail.

The chief goal is to estimate the treatment effect and to estimate its standard error taking into account the most important of these complications. As it happens, the effect is small, but it is in the direction opposite to that anticipated.
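The distinction between observational and experimental units can be illustrated with a deliberately crude sketch: average the sites within each rat, then compare group means using the rat as the unit. This is not the analysis used in the book; it ignores site effects and unequal numbers of observed sites, and the function name, record layout and numbers are all hypothetical.

```python
import math
import statistics

def treatment_effect_from_unit_means(records):
    """Estimate a treatment effect from per-rat means.

    records maps a rat id to (group, site_values); missing sites are None.
    Returns (difference in group means, standard error), with the standard
    error driven by the number of rats, not the number of sites.
    """
    means = {"treated": [], "control": []}
    for group, site_values in records.values():
        observed = [v for v in site_values if v is not None]
        if observed:  # skip rats with no usable measurements
            means[group].append(statistics.fmean(observed))
    effect = statistics.fmean(means["treated"]) - statistics.fmean(means["control"])
    se = math.sqrt(sum(statistics.variance(m) / len(m) for m in means.values()))
    return effect, se

# Hypothetical values for illustration only; not data from the book.
example = {
    "rat1": ("treated", [2.0, 2.0, None, 2.0, 2.0]),
    "rat2": ("treated", [4.0, 4.0, 4.0, 4.0, 4.0]),
    "rat3": ("control", [1.0, 1.0, 1.0, None, 1.0]),
    "rat4": ("control", [3.0, 3.0, 3.0, 3.0, 3.0]),
}
effect, se = treatment_effect_from_unit_means(example)
```

Because each rat contributes a single number, the standard error reflects the number of experimental units rather than the five-fold larger number of site measurements.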

Abstract

The second project is a small-scale classically designed experiment that is concerned with comparing efficiencies of chain saws. Duplicate saws of three brands are available for comparison. Logs of three species, pine, spruce and larch, are available as experimental material. Half of the logs are de-barked; the remainder are intact. Six teams are available, ranging in experience from novices to professionals. The response is the time taken by the team to complete a designated task with the saw provided.

The experimental design includes five factors (wood-species, bark, saw-id, saw-brand and team), but not all combinations of levels occur because saw-id determines saw-brand. It is technically a four-factor fractional factorial design embedded in a 6 × 6 Latin square.

The text first discusses the need for transformation before using linear decompositions. The analysis begins by examining marginal means, and the discussion then centers on ways to compute standard errors for species effects and standard errors for brand effects. The fact that there are only two duplicate saws per brand means that standard errors for brand effects are not computed in the Latin-square standard manner.

Abstract

The third project takes a look at the data from an experiment on mating preferences of fruit flies. The goal of that experiment was to understand whether assortative mating is induced by diet. In other words, do flies raised on one diet mate preferentially with flies raised on the same diet versus flies raised on a different diet? If so, what is the causal agent?

Details of the experimental design are complicated and fascinating. Two breeding populations of genetically identical fruit flies were raised separately for roughly forty generations on one of two diets, C or S. At certain stages, flies destined for experimentation were removed from the breeding populations and raised for one intermediate generation on the standard C diet before testing was done. Thus the testing for generation six was done on the virgin offspring, so generation six is really 6+1.

The experimental set-up for mating tests consisted of a number of mating wells, with four flies in each well, one male and one female of each ancestral dietary type. Over a one-hour period, each mating was noted, and the totals for each type were recorded. In other words, each observational unit is a mating, and each mating is one of four types, CC, CS, SC or SS. The data are presented in the form of a contingency table indexed by 18 generations and four mating types.

Homogamic matings, types CC and SS, outnumber heterogamic types by a 2:1 ratio, so assortative mating induced by diet is clearly established. However, the data also exhibit several features that come as statistical surprises. In particular, the statistical variation in the contingency-table counts is only one third of what is predicted by binomial or multinomial models.
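One common way to quantify under-dispersion of this kind is the Pearson dispersion index: the Pearson chi-squared statistic divided by its degrees of freedom. The sketch below assumes a simple binomial reference model with a common probability across generations; the function name and the counts are hypothetical and are not the fruit-fly data.

```python
def binomial_dispersion(counts):
    """Pearson dispersion index for (successes, total) pairs.

    Values near 1 are consistent with binomial variation; values well
    below 1 indicate under-dispersion.
    """
    total_success = sum(y for y, _ in counts)
    total_n = sum(n for _, n in counts)
    p_hat = total_success / total_n
    pearson = sum(
        (y - n * p_hat) ** 2 / (n * p_hat * (1 - p_hat)) for y, n in counts
    )
    return pearson / (len(counts) - 1)

# Hypothetical (homogamic, total) counts; not the fruit-fly data.
dispersion = binomial_dispersion([(6, 10), (4, 10)])
```

On this scale, the finding reported above corresponds to an index of roughly one third.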

The bulk of this project deals with the subsequent investigation to explore and understand the sources of under-dispersion. The female refractory effect turns out to be crucial; a male may mate a second time within the observation period if a receptive female is available, but a mated female does not mate a second time. Whereas the initial analysis treated each mating event as one observational unit with four types, subsequent analyses treat each well as one observational unit with nine possibilities for each well.

The more detailed contingency table supports the thesis of assortative mating. And it provides a partial rationale for the under-dispersion previously observed. But it also reveals a totally unanticipated lack of independence of mating behaviours for flies in distinct wells.

Abstract

This project deals with growth curves for arabidopsis plants of two strains. Over a period of 30 days, the height of each plant was recorded every few days. The challenge is to formulate a statistical model for the temporal trajectories of each plant, taking into account various obvious facts.

First, growth is a continuous process, but it is not linear in time. Growth curves for different plants exhibit similar temporal patterns. Is there a useful family of non-linear functions available? How do we account for temporal correlations?

These and other matters are discussed using inverse linear and inverse quadratic functions for the characteristic mean height as a function of time. Deviations from the mean curve are modelled by a single Brownian path. In addition to the measurement process, Brownian motion is also used as a model for plant-specific deviations that are continuous in time.

Parameter estimation is discussed, both for variance components and strain effects. A strong distinction is drawn between the fitted-value curve, and the predicted-value curve for the trajectory of a new plant. The fitted-value curve is inverse-quadratic for each strain, while the predicted-value curve for a new plant has an additional linear spline component.

Abstract

In a 2019 paper published in the Proceedings of the National Academy of Sciences of the U.S., Villa et al. make the following claim:

...we show that experimental evolution of parasite body size over 4 y (approximately 60 generations) leads to reproductive isolation in natural populations of feather lice on birds. When lice are transferred to pigeons of different sizes, they rapidly evolve differences in body size that are correlated with host size. These differences in size trigger mechanical mating isolation between lice that are locally adapted to the different sized hosts.

The goal of this project is to examine the experimental design and the analysis in order to judge whether the data support the authors’ claim.

As we might expect, the design is elaborate. Several aspects of the design and conduct of the experiment are discussed at length. After preparation of the host birds, the lice were arranged in 32 lineages, one lineage per bird. Birds were housed in aviaries, four birds of the same type per aviary. The birds are of two body types, either large or normal size.

The first complication arises from the fact that lice are assigned at random to clean birds at baseline. Randomization implies exchangeability, so it is necessary to arrange matters so that all models incorporate baseline exchangeability. Off-the-shelf factorial models, such as those employed by the authors, do not have this property. Also, if bird, or lineage, is used solely as a block factor, the values over time are exchangeable. But the standard genetic model for the temporal evolution of a quantitative trait uses Brownian motion, which implies serial correlation.

The main thrust of the project is to investigate whether the evidence in the data supports baseline exchangeability and/or serial correlation for lineages, and ultimately whether the authors’ claim of rapid differential evolution of louse body size associated with host size is supported.

The conclusions are all negative. Over the four-year period, no evidence is found that the mean louse body size has increased or decreased for large hosts. No evidence is found that the mean louse body size has increased or decreased for small hosts, and no evidence is found of a differential change in louse body size associated with host size. Evolution is not rapid; on this time scale it is undetectable.

Abstract

This chapter is a largely empirical investigation of various aspects of the central England daily temperature series from Jan 1 1772 to Dec 31 2019, i.e., 90 580 days over 248 years.

To a close approximation, the annual cycle has a mean that is a harmonic of degree two, and the same applies to the volatility, which is about 50% higher in winter than in summer. Skewness is slightly positive in summer, and slightly negative in winter. Kurtosis is highly variable, but mostly negative or short-tailed.
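A harmonic of degree two is a constant plus sine and cosine terms at one and two cycles per year. As a minimal sketch, assuming one value per calendar day over a single complete 365-day cycle, the coefficients can be obtained by direct projection, since the harmonic terms are then orthogonal. The function name and the synthetic coefficients are made up for illustration.

```python
import math

def harmonic_coefficients(daily_means, degree=2):
    """Project one full cycle of equally spaced values onto harmonics."""
    n = len(daily_means)
    coef = {"mean": sum(daily_means) / n}
    for k in range(1, degree + 1):
        angles = [2 * math.pi * k * d / n for d in range(n)]
        coef[f"cos{k}"] = 2 / n * sum(
            y * math.cos(a) for y, a in zip(daily_means, angles)
        )
        coef[f"sin{k}"] = 2 / n * sum(
            y * math.sin(a) for y, a in zip(daily_means, angles)
        )
    return coef

# Synthetic cycle with made-up coefficients, used only to check the code.
synthetic = [
    10 - 6 * math.cos(2 * math.pi * d / 365) + 1.5 * math.sin(4 * math.pi * d / 365)
    for d in range(365)
]
fit = harmonic_coefficients(synthetic)
```

The same projection applied to daily standard deviations, rather than daily means, would give a degree-two description of the volatility cycle.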

Graphs are used to show that the phase of the annual cycle has changed little over the centuries. For the past century, however, temperatures have increased noticeably and reasonably uniformly throughout the year. The volatility pattern has not changed perceptibly.

Some aspects of long-range dependence are studied by examining the variance of block averages for contiguous blocks of different lengths. It is found empirically that the variance of a block average decreases not in inverse proportion to the block length, as it would for independent values, but in inverse proportion to the square root of the block length.
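The block-average calculation is easy to sketch. The code below uses independent Gaussian noise as a stand-in, not the temperature series, so the variance falls roughly like one over the block length; for the temperature record the chapter reports the slower square-root rate. The function name is hypothetical.

```python
import random
import statistics

def block_average_variance(series, block_length):
    """Sample variance of the means of contiguous, non-overlapping blocks."""
    n_blocks = len(series) // block_length
    means = [
        statistics.fmean(series[i * block_length:(i + 1) * block_length])
        for i in range(n_blocks)
    ]
    return statistics.variance(means)

# Stand-in data: independent noise, NOT the temperature series.
rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(40_000)]
variances = {m: block_average_variance(noise, m) for m in (1, 4, 16, 64)}
```

Plotting the log variance against the log block length gives a slope near −1 for independent data; a slope near −1/2 is the signature described above.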

The variogram is also computed and studied at various lags from days to weeks to years. The pattern at short to moderate lags from one day to a few years is a good match with the α-stable covariance function with spectral density \(\exp (-|\omega |{ }^{1/2})\). No sign of an asymptote is found in the variogram at lags up to 100 years, which points either to long-range dependence or failure of stationarity.
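For reference, the empirical variogram at a given lag is half the mean squared difference between values that far apart. The sketch below applies it to a simulated random walk, chosen only because its variogram keeps growing with lag; it is not a model proposed in the book, and the function name is hypothetical.

```python
import random

def empirical_variogram(series, lag):
    """Half the mean squared difference between values `lag` steps apart."""
    diffs = [(series[t + lag] - series[t]) ** 2 for t in range(len(series) - lag)]
    return 0.5 * sum(diffs) / len(diffs)

# Stand-in series: a random walk, which has no asymptote in its variogram.
rng = random.Random(1)
walk = [0.0]
for _ in range(20_000):
    walk.append(walk[-1] + rng.gauss(0.0, 1.0))
gamma = {h: empirical_variogram(walk, h) for h in (1, 10, 100)}
```

A stationary series with short-range dependence would instead level off at the marginal variance as the lag increases.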

Abstract

This chapter continues the empirical analysis of the central England daily temperature series using Fourier techniques indexed by frequencies. The first step is to compute the Fourier transform and to plot the spectrum, which is the absolute Fourier coefficient plotted against frequency. After dropping the principal annual harmonics, the log spectrum at all but the very lowest frequencies is found to decrease linearly as a function of the square root of the frequency. In other words, the short and medium range behaviour is close to that of the α-stable process, i.e., the stationary process whose covariance function is proportional to the density of the α-stable distribution with α = 1∕2. This finding is in accord with the behaviour of block averages noted earlier: the variance of the sample average is inversely proportional to the square root of the block length.

Trajectories of the α-stable process are simulated and compared with trajectories of the Matérn process at varying degrees of smoothness.

Abstract

This project was prompted by the publication in 2011 of a paper by Q.D. Atkinson on the geographic patterns of linguistic diversity in Africa and elsewhere.

Like the genetic thesis for human migration and evolution, Atkinson’s ‘Out-of-Africa’ thesis for linguistic diversity holds that language evolved somewhere in Africa, and diffused from there to Asia, Europe and elsewhere as populations split and migrated. Since the genetic and linguistic diversity of a population is intrinsically related to its size, a small migrating subset carries less diversity than the population from which it originated. Accordingly, a subpopulation that splits and migrates carries less diversity than the descendants of the ancestral population that remains. Although tones and sounds are continuously gained and lost in all languages, the loss is supposedly higher for small migrating founder populations than for the ancestral population. In this way, the diversity of sounds becomes progressively reduced as the distance from the origin increases.

Is Atkinson’s thesis supported by the data? Before embarking on technical details of linguistic families and spatial correlation functions, a plot of diversity against distance, done separately for each major continent, gives cause for alarm. In essentially every continent other than Africa, the trend with distance is positive, not negative. This is an example of Simpson’s paradox for continuous data.
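The reversal described here is easy to reproduce with artificial numbers. In the sketch below, each of three hypothetical groups has a within-group slope of +1, yet the pooled slope is negative because the group centres fall as distance increases. The data are invented for illustration and are not Atkinson's.

```python
def ols_slope(points):
    """Least-squares slope of y on x for a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    return sxy / sxx

# Hypothetical (distance, diversity) pairs for three groups; not Atkinson's data.
groups = {
    g: [(10 * g + j, -20 * g + j) for j in range(5)]
    for g in range(3)
}
within = {g: ols_slope(pts) for g, pts in groups.items()}
pooled = ols_slope([p for pts in groups.values() for p in pts])
```

Here every entry of `within` is positive while `pooled` is negative, which is the pattern that a continent-by-continent plot exposes.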

To address statistical aspects of the question, it is necessary to take account not only of distance from the origin, but also of correlations associated with geographic distance and correlations associated with linguistic family. Atkinson concluded that language evolved from a point somewhere near the coast of equatorial Africa. And that conclusion can be reproduced if correlations are ignored. But when correlations are taken into account, the analogous procedure concludes with a confidence region that includes all of Africa.

As a follow-up project for her Masters thesis, Josephine Santoso compiled a more extensive data set of 1277 world languages. This is discussed in Sect. 8.7. Where Atkinson’s phoneme response emphasized tones over vowels and consonants, Santoso’s primary analysis counts the total for all three. When all three are counted equally, the conclusion using the same techniques is that language evolved from somewhere in Europe or Africa with no strong preference for one spot over another. But the west of Ireland has the highest likelihood! All-in-all, one would have to conclude that the data do not provide strong support for the Out-of-Africa thesis.

Abstract

This chapter discusses two environmental projects, one concerned with understanding the effects of atmospheric warming on plants, and one concerned with the effects of plant species on the rate of infection in bumblebees.

The first project is an experiment designed to study the effect of atmospheric warming on photosynthesis in broadleaf trees and in conifers. It is a classical field experiment conducted over three years at two geographic sites in Minnesota, each site consisting of several blocks, with several plots per block. Ten boreal species are grown in each plot. Treatment is assigned to certain plots selected at random. Treatment consists of underground electrical cables to warm the soil, plus heating lamps to warm the air locally. The response variables are of two types. First, there is a range of photosynthetic variables measured on one leaf from each of several trees per plot at several points throughout the growing season. Second, on the same occasions, soil moisture content is measured on each plot.

For photosynthetic measurements, each observational unit is a leaf or a tree on a specific date. There are many observational units per plot-date and each sample includes up to ten units per plot. For soil moisture content, each observational unit is a plot at a specific date. Thus, a statistical analysis of soil moisture content has many fewer observational units than an analysis of photosynthetic measurements.

Soil moisture content is governed largely by recent rainfall, so strong correlations are to be expected for moisture values recorded on the same date, and weaker serial correlations are also expected up to about two weeks.

Treatment increases evaporation, and is expected to have an effect on soil moisture content. Air temperature may also affect photosynthesis, but the availability of water is expected to dominate. It is anticipated that any such effect should be constant over years and over sites.

The effect of treatment on photosynthesis is not expected to be the same for each species, so it is best initially to examine the data species by species. Also, photosynthesis and soil water content are not expected to be independent. To the extent that changes in one cause changes in the other, it is clear that soil water content is the driver, and that the current value is most relevant. This rationale suggests first looking at how soil water content is affected by treatment, and then examining the conditional distribution given soil water content.

The bumblebee project studies whether the rate of infection in bumblebees depends on the foraging plants available. The bees were organized in microcolonies, each colony being housed in one tent for the two-week period. The foraging plants in one tent It is a classical design with five replicates or rounds, each round lasting two weeks. Nine tents were available for each round; they are labelled in three blocks of three. Each observational unit is one bee, but each experimental unit is one microcolony, or one tent in one round.

The discussion centers on observational units versus experimental units, the use of generalized linear mixed models, and the possibility of using beta-binomial models, or even generalized linear models with allowance for over-dispersion.
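As a reminder of what these alternatives imply for the variance, here is the standard comparison, written in notation chosen for this note rather than taken from the book: \(Y\) is the number of infected bees among \(m\) bees in one microcolony, \(\pi\) is the infection probability, \(\rho\) is the within-colony correlation, and \(\phi\) is a dispersion factor.

\[
\text{binomial: } \operatorname{var}(Y) = m\pi(1-\pi), \qquad
\text{beta-binomial: } \operatorname{var}(Y) = m\pi(1-\pi)\{1 + (m-1)\rho\}, \qquad
\text{over-dispersed GLM: } \operatorname{var}(Y) = \phi\, m\pi(1-\pi).
\]

Setting \(\rho = 0\) or \(\phi = 1\) recovers the binomial case; positive \(\rho\) inflates the variance by an amount that grows with colony size.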

Abstract

The project in this chapter is the analysis of the breeding record of fulmars at the Eynhallow colony in the Orkney islands. Fulmars have a relatively long adolescence, and commence breeding when they are around 3–7 years of age. Adults are monogamous, they form long-term pair bonds, and breeding pairs return to the same nest year after year. Breeding begins in May; a single egg is laid and incubated by both parents for about 50 days.

The record for each female bird at Eynhallow begins in the year when the first egg was observed, and continues until the last egg was laid at Eynhallow. Sequence lengths range from one to 40 years. The observation for each pair is a measure of breeding success, coded as 0–4 in each year.

From a statistical perspective, the structure of the design is rather complicated. On the one hand, the sequence for each bird is a time series. But, unlike a temperature series or an economic series, the series for each bird is finite. It commences with a non-zero initial value and terminates with a final non-zero value. Intermediate values may be zero or non-zero. There is a noticeable annual effect, which means that the outcomes for two birds in the same year are correlated.

Just as the weather in some years is more favourable than in others, so too, some birds are more successful breeders than others. The analysis demonstrates that the variation in success between birds is about four times the variation between years. Serial correlation is also noted.

Abstract

This chapter is a discussion of basic concepts in probability and applied statistics, beginning with the notion of probability and stochastic processes.

Experimental design matters are discussed, including the concept of a baseline, an experimental protocol, a population of observational units, samples and subsamples, all with illustrations taken from various projects. The role of baseline variables is discussed, together with various important distinctions, such as the distinction between functions and relationships. Randomization is a procedure specified in the protocol, and the treatment assignment is an immediate post-baseline outcome. The distinction between observational unit and experimental unit is a consequence of the randomization protocol.

Every biological population evolves in time; only the current population is accessible today. If we aim to make statements about values for future or past units, the stochastic model must be defined on the entire population, past, present and future. Thus, in essentially all biological applications, the population is infinite, while the accessible population is finite. A sample is an ordered subset of the population, and an accessible sample is an ordered subset of the current population. The implications of this reasoning are discussed in the context of stochastic models for clinical trials.

The chapter concludes with a discussion of the interpretation of variability for parameter estimates.

Abstract

This chapter discusses a number of principles of statistical modelling. Foremost is the principle of consistency of stochastic formulations under sub-sampling. It is this principle that allows the statistician to make statements about population values on the basis of values observed on a sample. This principle is ordinarily satisfied in the great majority of areas of application, but there are exceptions. The text discusses problems that arise from formulations that are not sampling consistent.

Adequacy of the model for the application is the second principle. Chapters 1–10 provide numerous examples of models that are adequate to varying degrees. Every model is proposed tentatively in the knowledge that it is inadequate to some degree. Depending on the application, some infelicities such as unaccommodated interaction or correlation may produce misleading conclusions; others such as non-normality or non-constancy of variance may be benign. It is necessary to understand the effect of various inadequacies on a range of conclusions.

The likelihood principle is included in third place because it is entirely subsidiary to the first two.

Finally, attitudes to statistics and applied mathematics are discussed, with particular reference to Richard Courant and George Box.

Abstract

Initial values are values of a stochastic process measured at baseline, usually prior to the determination of patient eligibility, and certainly prior to randomization. Essentially all longitudinal studies are of this type. On the one hand, initial values are simply values of the process that happen to be made at time zero. In that sense, they are on a par with values recorded subsequently. On the other hand, initial values are pre-baseline and thus available for use as covariates on an equal footing with other baseline variables. Typically, initial and subsequent values on the same physical unit are positively correlated. The text discusses a number of issues and options for handling baseline values in a randomized trial. For example, the regression phenomenon suggests that patient selection may reduce the variability at baseline relative to variability at subsequent times.

If the initial values are initially regarded as random variables with distribution as specified by the model, it is essential that the distribution conform with the randomization scheme. In particular, the joint distribution of initial values cannot depend on the treatment effect. Chapter 5 provides an illustration of a stochastic model that clashes with the randomization.

Abstract

To the uninitiated, a stochastic model may seem to emerge from nowhere with little explanation. This chapter attempts to address the information gap by arguing that exchangeability is the key concept that justifies essentially all statistical models.

In its simplest form, exchangeability is the statement that a sample of values taken in one order has the same distribution as the sample taken in a different order. Depending on the context, the statement may be reasonable or it may be unreasonable. If the claim is unreasonable, it must be possible to demonstrate that by pointing to some aspect of the baseline sample configuration that makes the permuted sample different from the original. The general idea is that every sample comes with a baseline covariate configuration, and exchangeability is a verifiable consequence of congruent sample configurations. Stationarity and isotropy are manifestations of exchangeability, albeit with different groups.
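In symbols, and using notation introduced here only for illustration, the simplest form of the statement for a sample \(Y_1, \ldots, Y_n\) reads

\[
(Y_1, \ldots, Y_n) \sim (Y_{\sigma(1)}, \ldots, Y_{\sigma(n)}) \quad \text{for every permutation } \sigma \text{ of } \{1, \ldots, n\}.
\]

When the sample carries a baseline covariate configuration, the natural restriction is to those permutations that leave the configuration unchanged.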

This chapter discusses the implications of various forms of exchangeability in regression models, in block designs, in time series, and so on. The overarching idea is that two samples having the same baseline covariate configuration must have the same response distribution. Various examples of models having this property are discussed.

Treatment is modelled as a group action on probability distributions. To each control distribution \(P_{\theta}\), the group element g associates a treatment distribution \(P_{g\theta}\). Thus, if g is the identity element, the treatment and control distributions are equal. Essentially all generalized linear models with a treatment effect are of this form, with the treatment group being the additive group of real numbers. For example, hazard multiplication is a group action on survival distributions.
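The group-action description can be written out as a short worked statement. This is a sketch in notation chosen here for illustration; the identity element \(e\) and the symbols \(S\), \(h\) and \(\tau\) are assumptions, not taken from the book. A group action requires

\[
P_{e\theta} = P_{\theta}, \qquad P_{g'(g\theta)} = P_{(g'g)\theta}.
\]

For hazard multiplication, a positive multiplier \(e^{\tau}\) acts on a survival distribution with survival function \(S\) and hazard \(h\) by

\[
h(t) \mapsto e^{\tau} h(t), \qquad S(t) \mapsto S(t)^{e^{\tau}},
\]

so \(\tau = 0\) leaves the distribution unchanged, and successive multipliers compose by adding their \(\tau\) values, which is the additive group of real numbers.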

Abstract

This chapter is concerned with Gaussian distributions, either real Gaussian on \({\mathbb R}^n\) or complex Gaussian on \({\mathbb C}^n\), and also with the associated Gaussian Hilbert space. The text covers the matrix form of inner products and orthogonal projections, the connection between orthogonality and independence, and Cochran’s theorem. Least squares estimation is presented as an orthogonal projection having a given image; prediction is presented as an orthogonal projection having a given kernel. Tukey’s one degree of freedom for non-additivity in linear models is discussed, first as an algorithm. Distributional claims are described and discussed.

Abstract

This chapter deals with stationary Gaussian processes in \({\mathbb R}^d\) specified by their covariance function. In the case of a real Gaussian process, K(x, x′) is a real-symmetric function of the separation vector x − x′; in the case of a complex-valued process, K(x, x′) is a Hermitian function of x − x′. The real Matérn process has a covariance function that is the Fourier transform of the rotationally-symmetric Student t distribution on \({\mathbb R}^d\); the Gaussian process is real stationary and isotropic. The complex Matérn process has a covariance function that is the Fourier transform of the re-centered Student t distribution on \({\mathbb R}^d\); the process is stationary but not isotropic.

Hermitian covariance products are considered as models for a complex-valued space-time process. The real part of a Hermitian product is the sum of the product of the real parts and the product of the imaginary parts. It is real-symmetric and positive definite, so it may be used as a space-time covariance function. The product of the real parts is doubly symmetric, while the product of the imaginary parts is doubly skew-symmetric, so their contributions are very distinct. Both products are symmetric. The product of the real parts is positive definite, and the sum is also positive definite.

Abstract

Almost all methods of parameter estimation in statistical models begin with the likelihood function. This chapter discusses the theory of maximum likelihood in regular finite-dimensional models, where the parameter space is a finite-dimensional smooth manifold. The Bartlett identities for log likelihood derivatives are derived, and the Bartlett adjustment for likelihood-ratio statistics is discussed. These ideas are illustrated by a range of examples from linear models and generalized linear models, including the estimation of the \(\mathrm{LD}_{50}\) or other dose quantile in a logistic model setting.

A sketch of the argument is given for generalized linear models, Gaussian variance-component models and mixture models, including sparse-signal detection problems.

Abstract

The accommodation of correlations associated with baseline relationships is a recurrent theme in this book. For the most part, this is done using Gaussian models with several additive variance components. The preferred method for variance-component estimation is maximum likelihood based on residuals. In practice, all linear-model specifications are made in the observation space \({\mathbb R}^n\). This chapter develops the theory needed for maximum-likelihood based on residuals. That includes both point estimation of variance components, and likelihood-ratio tests either for components in the mean model or for components in the variance model. There are very few examples in applied work where measure-theoretic issues intrude into computations, but likelihood-ratio testing in this setting is one such example. The text emphasizes the role of the kernel subspace in connection with residuals, and explains how to compute a likelihood ratio keeping the kernel fixed.

Abstract

The primary goal of response transformation is to induce additivity of effects; simplification is achieved by reducing the number of interactions or non-linearities needed. The secondary goal of transformation is to induce normality.

This chapter covers the theory and practice of response transformation in the context of linear Gaussian models, including those with several variance components. This is illustrated by a likelihood plot for two target models applied to the data in Chap. 2. The first target model handles team and saw.id as fixed effects contributing to the mean; the second handles them as additive block factors in the model for covariances. The two versions yield different profile log likelihood plots, but the maxima are almost coincident.
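A profile log likelihood for a power transformation can be sketched in a few lines. The code below assumes the Box-Cox family and a one-way Gaussian model with fixed group means, which is far simpler than either target model described above; the function names and the simulated data are hypothetical.

```python
import math
import random

def boxcox(y, lam):
    """Box-Cox power transformation of a positive value."""
    return math.log(y) if lam == 0 else (y ** lam - 1) / lam

def profile_loglik(groups, lam):
    """Profile log likelihood for lambda in a one-way Gaussian model."""
    transformed = [[boxcox(y, lam) for y in g] for g in groups]
    n = sum(len(g) for g in groups)
    rss = sum(
        sum((z - sum(g) / len(g)) ** 2 for z in g) for g in transformed
    )
    # Jacobian term that puts all values of lambda on a comparable scale.
    log_jacobian = (lam - 1) * sum(math.log(y) for g in groups for y in g)
    return -0.5 * n * math.log(rss / n) + log_jacobian

# Simulated positive responses, additive on the log scale by construction.
rng = random.Random(2)
groups = [
    [math.exp(mu + rng.gauss(0.0, 0.5)) for _ in range(50)]
    for mu in (0.0, 1.0, 2.0, 3.0)
]
curve = {lam: profile_loglik(groups, lam) for lam in (-0.5, 0.0, 0.5, 1.0)}
```

Evaluating `curve` on a finer grid and plotting it against the transformation index gives a likelihood plot of the kind referred to above; for these simulated data the maximum over the four grid points is at the log transformation.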

The last section considers response transformation in the quantile-matching setting. If quantile-matching is to be used, is it better to aim for a marginal Gaussian distribution or a marginal distribution in some other family such as Student t? The likelihood theory is developed to allow such comparisons to be made.

Abstract

This chapter offers advice on technical writing, and the inclusion of tables and graphs in reports.

The purpose of a table or graph is to advance the narrative by drawing attention to the most important patterns or features in the data such as the nature and direction of various effects. A graph is helpful for presentation only if it illustrates an important effect clearly. A graph of residuals may be helpful for model checking, and may be mentioned in presentation, but, unless it is explicitly requested in a homework exercise, it is seldom included as part of the report. In most factorial designs, whether balanced or otherwise, one-way and two-way tables of averages are often useful as a partial summary of conclusions. The relevance of the graph to the conclusions must be spelled out in the accompanying narrative.

Computer output: (i) How to adapt computer output for inclusion in the text; (ii) From the computer-generated analysis-of-variance table, list only the parts that are relevant to your analysis. It is your job as author, statistician and expert to judge what is relevant and what is not. (iii) The number of decimal digits for regression coefficients, standard errors, F-ratios, p-values; (iv) Labelling of parameters and factor levels in tables; (v) Every parameter has a physical interpretation. Do not pass up the opportunity to remind the reader what the physical interpretation of the logistic coefficient \(\hat \beta = -0.684\) is in the context of the problem. It means that the odds of disease in the treatment group are one half the odds in the control group.
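The arithmetic behind that interpretation is a single exponentiation, shown here as a quick check:

```python
import math

beta_hat = -0.684  # logistic coefficient quoted in the text
odds_ratio = math.exp(beta_hat)  # odds in treatment group relative to control
```

The result is about 0.50, which is where "one half the odds" comes from.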

Grammar and word usage: substitute versus replace, use versus usage; verb tense; verbs for statistical and computational activities.

Technical matters: (i) Response transformation must always be considered; (ii) Experimental units versus observational units; it is the number of experimental units that governs the degrees of freedom for treatment-effect contrasts; (iii) Beware the pitfalls of automated model-selection procedures in situations for which they are not designed, e.g. factorial models; (iv) How to report interactions; (v) p-values: best avoided, but if you must report one, be sure to state the null hypothesis being tested.

Abstract

This chapter offers an informal discussion in Q & A format of some of the concepts introduced in Chap. 11.

The discussion begins with the design protocol and the specification of the population. Does the population exist pre-baseline? Is it finite or infinite? Is the sample a fixed subset or a random subset? In the case of a clinical trial, which patients are included in the population? Why should tomorrow’s population be included? Is a sequentially-recruited sample fixed or random? What is the role of finite-population random sampling? What does the population of all plots look like in an agricultural field setting? Do agricultural field trials use random samples?

Why is baseline so important? What happens pre-baseline? What is the distinction between a covariate and a relationship? What is the distinction between a classification factor and a block factor? What is the purpose of a covariate? Is treatment a covariate? Where does exchangeability come in? Is exchangeability a matter of fact or is it a mathematical theorem? Is the randomization distribution always uniform? What is the role of counterfactuals?