2021 | Book

# Applying Quantitative Bias Analysis to Epidemiologic Data

Authors: Matthew P. Fox, Richard F. MacLehose, Timothy L. Lash

Publisher: Springer International Publishing

Book Series: Statistics for Biology and Health

This textbook and guide focuses on methodologies for bias analysis in epidemiology and public health, not only providing updates to the first edition but also further developing methods and adding new advanced methods.

As computational power available to analysts has improved and epidemiologic problems have become more advanced, missing data, Bayes, and empirical methods have become more commonly used. This new edition features updated examples throughout and adds coverage addressing:

- Measurement error pertaining to continuous and polytomous variables
- Methods surrounding person-time (rate) data
- Bias analysis using missing data, empirical (likelihood), and Bayes methods

A unique feature of this revision is its section on best practices for implementing, presenting, and interpreting bias analyses. Pedagogically, the text guides students and professionals through the planning stages of bias analysis, including the design of validation studies and the collection of validity data from other sources. Three chapters present methods for corrections to address selection bias, uncontrolled confounding, and measurement errors, and subsequent sections extend these methods to probabilistic bias analysis, missing data methods, likelihood-based approaches, Bayesian methods, and best practices.

Abstract

Epidemiologic investigations have a direct impact on all aspects of health. Studies of social, environmental, behavioral, medical, and molecular factors associated with the incidence of disease or disease outcomes lead to interventions aimed at preventing the disease or improving its outcomes. However, biases are commonplace in epidemiologic research. Even the most carefully designed and conscientiously conducted study will be susceptible to some sources of systematic error. Study participants may not answer survey questions correctly (information bias), there may be confounders that were uncontrolled by study design or left unmeasured (uncontrolled confounding), or participants may have been enrolled differentially or incompletely followed (selection bias). Every epidemiologic study has some amount of information bias, selection bias, and uncontrolled confounding. In interpreting epidemiologic data, then, scientists are often confronted with the question of how to incorporate quantitative consideration of these biases into results and inferences. The objective of this text is to provide the knowledge, skills, and philosophy needed to routinely implement quantitative bias analysis, and thereby to reduce the barriers to its routine implementation.

Abstract

Estimates of association from nonrandomized epidemiologic studies are susceptible to two types of error: random error and systematic error. Random error (sometimes thought of as sampling error) is often called chance, and decreases toward zero as the sample size increases. The amount of random error in an estimate of association is measured by its precision, which is the inverse of the variance. Systematic error, often called bias, does not necessarily decrease toward zero as the sample size increases. That is, while random error can be dealt with by collecting more data, systematic error cannot be overcome in the same manner. Just as precision is the reduction of random error, we define validity as the reduction of systematic error. Reducing systematic error can be accomplished before the study has been conducted, through design strategies, or after the study has been conducted, through regression techniques or through quantitative bias analysis techniques.
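The contrast between random and systematic error can be shown with a small simulation (a sketch of our own, not an example from the text; the true risk of 0.20 and the additive bias of 0.05 are arbitrary illustrative values):

```python
import random

random.seed(1)

def estimate_risk(n, true_risk=0.20, bias=0.05):
    """Simulate estimating a risk from n participants.

    Random error: sampling variability in the observed proportion,
    which shrinks as n grows. Systematic error: a fixed additive
    bias (e.g., over-ascertainment of cases) that does not shrink.
    """
    cases = sum(random.random() < true_risk + bias for _ in range(n))
    return cases / n

for n in (100, 10_000, 1_000_000):
    print(n, round(estimate_risk(n), 3))
```

As n grows the estimate converges, but toward 0.25 (truth plus bias) rather than the true 0.20: collecting more data narrows the interval around the wrong answer, which is why validity must be pursued by design or by bias analysis rather than by sample size.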

Abstract

All bias analyses adjust a conventional estimate of effect to account for bias introduced by systematic error. These quantitative modifications combine the data used for the conventional estimate of association (e.g., a risk difference or a rate ratio) with equations that adjust the conventional estimate for the expected impact of the systematic error. The equations comprise the bias model, which is the model that links the conventional estimate with the bias-adjusted estimate. These equations have parameters, called bias parameters, that ultimately determine the direction and magnitude of the bias-adjustment. For example:

Abstract

Selection bias arises when—in a study population—an estimate of disease occurrence, or an estimate of the effect of an exposure contrast on disease occurrence, differs from the estimate that would have been obtained in the study population’s source population because of the way the study population was selected, either by design or analytic choice. The study population is the roster of persons, and their observed person-time, that are included in the analysis of the data that yielded the estimate. The source population is the roster of persons, and their observed person-time, eligible to be included in the analysis of the data that yielded the estimate. The difference between the source population and the study population—in the roster of included persons, in the observed person-time, or both—is what accounts for the selection bias. Selection bias is a systematic error and sometimes called a threat to internal validity. Selection bias can thus arise when participants enroll in a study, which is called “differential baseline participation,” and it can also arise when participants withdraw from the study, which is called “differential loss-to-follow-up.” Selection bias can occur because of study design choices or analysis choices. We use the word “differential” to denote that the baseline participation proportions or proportions lost to follow-up must have certain dependencies to induce a bias. The motivation for bias analysis to address selection bias follows directly from its conceptualization. One wishes to adjust an estimate of disease frequency or an estimate of association, measured only among initial or ongoing participants, to account for the bias introduced by conditioning on participation, when participation is affected by selection forces.
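The logic of a selection-bias adjustment can be sketched in a few lines (an illustrative Python sketch under the simplifying assumption that a selection probability is known, or can be posited, for each cell of a 2 × 2 table; the function name and the example values are ours, not the chapter's):

```python
def selection_bias_adjusted_or(a, b, c, d, s):
    """Adjust an observed 2x2 odds ratio for selection bias.

    a, b, c, d: observed counts (exposed cases, unexposed cases,
    exposed noncases, unexposed noncases). s: selection probability
    for each cell, keyed "a".."d". Dividing each observed count by
    its selection probability reconstructs the source-population
    table, from which the bias-adjusted odds ratio is computed.
    """
    A, B = a / s["a"], b / s["b"]
    C, D = c / s["c"], d / s["d"]
    return (A * D) / (B * C)

# Observed OR is about 2.11; with differential selection
# probabilities the source-population OR is smaller (about 1.58):
or_adj = selection_bias_adjusted_or(
    100, 50, 900, 950, {"a": 0.8, "b": 0.6, "c": 0.7, "d": 0.7})
```

When all four selection probabilities are equal, the adjustment returns the observed odds ratio unchanged; only *differential* selection induces bias, matching the abstract's emphasis on the word "differential."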

Abstract

Confounding occurs when the disease experience in a reference population is not exchangeable with the counterfactual disease experience for which it is intended to substitute [1, 2]. For example, if one is interested in the effect of a dichotomous exposure on the disease experience in the exposed group, then an unbiased estimate of that effect requires that the reference (unexposed) population’s factual disease experience must be exchangeable with the disease experience the exposed group would have had, had they been unexposed (counter to the fact). A sufficient set of confounders explains this lack of exchangeability; the effect of the exposure of interest mixes with the effects of these confounding variables, which must also be ancestors of the exposure [3]. According to causal graph theory, confounding occurs when the exposure and disease share common causal ancestors [4]. Understanding and controlling confounding in epidemiologic research, either in the design or analysis of a study, is central to obtaining an unbiased estimate of the effect of an exposure on the risk of an outcome. As students in introductory epidemiologic methods courses are taught, it is imperative to control confounding to assess causation from observational data because it can make an association appear greater or smaller than the true underlying effect and can even reverse its apparent direction. Confounding can also make a null effect (i.e., no causal relation between the exposure and the disease) appear as either causal or preventive.
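For a single unmeasured dichotomous confounder, the bias can be expressed with a handful of bias parameters. A minimal sketch (this is the classic external-adjustment bias factor; the parameter values below are invented for illustration):

```python
def confounding_bias_factor(p1, p0, rr_cd):
    """Bias factor for an unmeasured binary confounder.

    p1, p0: prevalence of the confounder among the exposed and the
    unexposed. rr_cd: risk ratio relating confounder to disease.
    """
    return (p1 * (rr_cd - 1) + 1) / (p0 * (rr_cd - 1) + 1)

def confounding_adjusted_rr(rr_observed, p1, p0, rr_cd):
    """Divide the conventional risk ratio by the bias factor."""
    return rr_observed / confounding_bias_factor(p1, p0, rr_cd)

# Observed RR of 2.0; the confounder is more common among the
# exposed (50% vs 20%) and triples disease risk:
rr_adj = confounding_adjusted_rr(2.0, 0.5, 0.2, 3.0)  # ≈ 1.4
```

When p1 equals p0 the bias factor is 1 and no adjustment occurs, mirroring the definition above: confounding requires the confounder's distribution to differ between exposure groups.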

Abstract

The accurate measurement of exposure, disease occurrence, and relevant covariates is generally necessary to estimate causal relations between exposures and outcomes. However, in all epidemiologic research, there exists the opportunity for measurement errors. When the variables being measured are categorical, these errors are referred to as misclassification. Regardless of the source of misclassification, when analyses divide study participants into categories pertaining to that covariate (e.g., exposed vs. unexposed, diseased vs. undiseased), some respondents will be classified in the wrong category, which can bias results. The impact of misclassification errors in epidemiologic studies is rarely explicitly addressed. Often, analysts have relied on beliefs that certain types of misclassification can bias results in known directions. We address the shortcomings of these beliefs and first present an introduction to misclassification as well as simple methods to adjust aggregate data for specified amounts of misclassification. When an analysis is susceptible to more than one set of classification errors, such as misclassification of both exposure and disease, the bias-adjustments below can be performed sequentially, so long as the classification errors are independent. When the classification errors are dependent, matrix methods described at the end of this chapter, or other bias modeling methods, such as the Bayesian methods in Chapter 11, are required.
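The simplest such adjustment back-calculates the true 2 × 2 table from the observed one, using sensitivity and specificity as the bias parameters. A hedged sketch (assuming nondifferential exposure misclassification, i.e., the same sensitivity and specificity among cases and noncases; the counts and values are invented):

```python
def true_positives(observed_pos, total, se, sp):
    """Back-calculate the true number exposed from the observed
    count, given sensitivity (se) and specificity (sp).
    Requires se + sp > 1."""
    return (observed_pos - (1 - sp) * total) / (se + sp - 1)

def misclassification_adjusted_or(a, b, c, d, se, sp):
    """Bias-adjust an odds ratio for nondifferential exposure
    misclassification, applied to cases (a, b) and noncases (c, d)."""
    A = true_positives(a, a + b, se, sp)  # true exposed cases
    C = true_positives(c, c + d, se, sp)  # true exposed noncases
    B, D = (a + b) - A, (c + d) - C
    return (A * D) / (B * C)

# Observed OR = 2.25; adjusting for se = 0.9, sp = 0.8 moves the
# estimate away from the null, to about 3.33:
or_adj = misclassification_adjusted_or(60, 40, 40, 60, 0.9, 0.8)
```

With perfect classification (se = sp = 1) the function returns the crude odds ratio; a negative back-calculated cell count signals that the assumed sensitivity and specificity are incompatible with the observed data.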

Abstract

To this point, we have assigned only a single set of fixed values to the bias parameters of bias models (i.e., simple bias analysis, see Chapters 4, 5, and 6) or combinations of fixed values (i.e., multidimensional bias analysis, also in Chapters 4, 5, and 6). Simple bias analysis is an improvement over conventional analyses, which implicitly assume that all the bias parameters are fixed at values that imply that there is no bias. However, simple bias analysis is limited by its assumption that the values of the bias parameters are known without error, which is never a valid assumption [1]. Even when internal validation studies are conducted, the resulting bias parameters are measured with random error in a subsample, and often have their own sources of systematic error (see Chapter 3). Multidimensional bias analysis improves on simple bias analysis by examining the impact of more than one set of values for the bias parameters, but even this approach only examines the bias conferred by a limited set of values for the bias parameters. For any analysis, many other possible combinations of values are plausible, and a multidimensional analysis will not describe the impact of these possibilities. More important, multidimensional analysis gives no sense of which bias-adjusted estimate of association is the most likely under the assumed bias model, which can make interpretation of the results challenging. Probabilistic bias analysis extends the bias analyses presented in previous chapters by assuming the bias parameters follow a probability distribution. The distribution of the bias parameters, which represents uncertainty in their exact values, can then be incorporated in the bias analysis.

Abstract

Summary level probabilistic bias analysis uses aggregate data, such as a 2 × 2 table of cell counts cross-tabulating an exposure and an outcome, or, in disease surveillance, the number of observed cases divided by a census estimate of the number at risk. Summary level probabilistic bias analysis applies many of the same equations we introduced in Chapters 4, 5, and 6. The main difference between simple and probabilistic bias analysis is that for probabilistic bias analysis, instead of assigning a fixed value (or a set of fixed values) to the bias parameters, we specify probability distributions for the bias parameters. The distributions are meant to represent our uncertainty about the values assigned to the bias parameters. We then use Monte Carlo simulation methods to compute bias-adjusted estimates and simulation intervals that represent the uncertainty about the estimate deriving from the uncertainty about the values to assign to the bias parameters.
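The Monte Carlo step can be sketched as follows (a self-contained illustration; the uniform distributions, their ranges, and the counts are arbitrary choices of ours, standing in for the carefully specified distributions a real analysis requires):

```python
import random
import statistics

def adjusted_or(a, b, c, d, se, sp):
    """Misclassification bias-adjustment for one draw of (se, sp)."""
    A = (a - (1 - sp) * (a + b)) / (se + sp - 1)
    C = (c - (1 - sp) * (c + d)) / (se + sp - 1)
    return (A * ((c + d) - C)) / (((a + b) - A) * C)

def probabilistic_bias_analysis(a, b, c, d, n_iter=50_000, seed=42):
    """Draw se and sp from probability distributions, bias-adjust
    the OR for each draw, and summarize the draws with a median and
    a 95% simulation interval. Draws yielding impossible
    (non-positive) adjusted ORs are discarded."""
    rng = random.Random(seed)
    draws = []
    while len(draws) < n_iter:
        se = rng.uniform(0.85, 0.95)
        sp = rng.uniform(0.75, 0.95)
        or_adj = adjusted_or(a, b, c, d, se, sp)
        if or_adj > 0:
            draws.append(or_adj)
    draws.sort()
    return (draws[int(0.025 * n_iter)],
            statistics.median(draws),
            draws[int(0.975 * n_iter)])

lo, med, hi = probabilistic_bias_analysis(60, 40, 40, 60)
```

The interval (lo, hi) here reflects only uncertainty about the values assigned to the bias parameters, which is what the abstract describes; capturing conventional random error as well requires an additional resampling step.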

Abstract

The probabilistic bias analyses shown in the previous chapter were conducted using summarized data. As noted, a major limitation of the summarized data approach is that it is difficult to adjust for other measured confounders for which adjustment may have been made in the conventional analysis. However, probabilistic bias analysis can be conducted using record-level data (record-level bias-adjustment), which retains information on other covariates and allows for multiple adjustments to be made in the final analysis. The methods in this chapter follow the same logic and outline described in Chapter 8, with some changes for the record-level scenario, so we strongly suggest working through that chapter before this one, even if record-level correction is the intended approach.

Abstract

Earlier chapters in this text presented methods for simple bias-adjustments (Chapters 4, 5 and 6). These methods yielded bias-adjusted point estimates, but without any accompanying quantitative description of the uncertainty accompanying that estimate. We showed how the impact of uncertainty in the bias parameter could be described by assigning different values or sets of values to the bias parameters of the model. These were presented in a multidimensional or tabular bias analysis. Although this multidimensional bias analysis approach is useful to understand what value the adjusted effect estimate would take under various bias parameter values, it provides no single summary effect estimate and no sense of the relative likelihood of the various bias-adjusted estimates. Each estimate appears to be equally likely. Probabilistic bias analysis (Chapters 7, 8 and 9) uses simulation methods to model the uncertainty in the values assigned to the bias parameters, along with sampling error, and yields a point estimate of the central tendency and a sense of the relative frequency of different bias-adjusted estimates. These relative frequencies can be summarized in a simulation interval. In this chapter we present two general and related classes of methods to implement bias analysis: direct bias modeling and missing data methods. Both of these also yield point estimates and an accompanying uncertainty interval.

Abstract

Chapter 2 introduced Bayesian inference and alluded to the close relationship between probabilistic bias analysis and Bayesian methods. The probabilistic bias analysis methods that were discussed in Chapters 7–9 can be considered approximately Bayesian methods. That is, inferential results obtained from a probabilistic bias analysis that adjusts for a bias would typically be very similar, or even indistinguishable, from inferential results obtained from a corresponding Bayesian analysis [1, 2]. If results are typically so similar, it is natural to wonder when analysts might want to use formal Bayesian inference for quantitative bias analysis. In many cases the results from both approaches will be similar. In some settings, however, a formal Bayesian inferential approach may result in bias-adjusted estimates that are quite different from the probabilistic bias analysis approaches that we have discussed so far. Although there has not been a formal theoretical examination of exactly when the two approaches differ to an appreciable extent, there is some literature we can use to identify when this might occur. We highlight three common situations when one might want to use Bayesian inference.
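One ingredient of a formal Bayesian bias analysis is a genuine posterior distribution for a bias parameter. A minimal sketch (a conjugate Beta–binomial update for sensitivity from hypothetical validation data; the counts and the flat prior are invented for illustration):

```python
import random

def posterior_sensitivity_draws(tp, fn, n_draws=10_000, seed=7,
                                prior_a=1.0, prior_b=1.0):
    """Draws from the Beta posterior for sensitivity, given a
    validation study with tp true positives and fn false negatives
    among the truly exposed, under a Beta(prior_a, prior_b) prior:
    the posterior is Beta(prior_a + tp, prior_b + fn)."""
    rng = random.Random(seed)
    return [rng.betavariate(prior_a + tp, prior_b + fn)
            for _ in range(n_draws)]

# 45 of 50 truly exposed were classified exposed; the posterior
# mean for sensitivity is (1 + 45) / (2 + 50), about 0.885:
draws = posterior_sensitivity_draws(45, 5)
```

Feeding posterior draws like these, rather than ad hoc distributions, into the bias-adjustment step is one way the probabilistic and Bayesian approaches converge; they can diverge when the prior, the likelihood, and the bias model interact less simply.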

Abstract

Most nonrandomized epidemiologic studies, and even some randomized studies [1, 2], are susceptible to more than one threat to validity (i.e., multiple biases). Bias analysis applied to these studies requires a strategy to address each important threat to yield a reasonable estimate of the total bias affecting the study and the impact that it has on the magnitude and direction of the estimate of effect. The methods described in earlier chapters can be applied separately to adjust for one source of bias at a time; that is, a bias analysis would be conducted separately for misclassification, then for confounding, and then for selection bias. Alternatively, they can be applied serially, one after the other (with some important caveats on how this is done), to simultaneously quantify the biases and their associated uncertainties. Either type of adjustment, one at a time or serial, can be conducted using simple bias analysis techniques that do not produce simulation intervals or using probabilistic bias analysis techniques that do provide such intervals. In this chapter we will discuss serial adjustment using simple techniques and serial adjustment using probabilistic techniques.
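Serial adjustment can be sketched by chaining single-bias corrections, with the caveat that adjustments are generally applied in the reverse of the order in which the biases arose. An illustrative sketch (all values invented; here misclassification is assumed to have occurred after selection, so it is undone first):

```python
def serial_bias_adjusted_or(a, b, c, d, se, sp, s):
    """Serially bias-adjust a 2x2 odds ratio: first undo exposure
    misclassification (sensitivity se, specificity sp), then undo
    selection by dividing each reconstructed cell by its selection
    probability (s, keyed "a".."d")."""
    # Step 1: back-calculate the correctly classified table.
    A = (a - (1 - sp) * (a + b)) / (se + sp - 1)
    C = (c - (1 - sp) * (c + d)) / (se + sp - 1)
    B, D = (a + b) - A, (c + d) - C
    # Step 2: inverse-probability reweight to the source population.
    A, B = A / s["a"], B / s["b"]
    C, D = C / s["c"], D / s["d"]
    return (A * D) / (B * C)

or_adj = serial_bias_adjusted_or(
    60, 40, 40, 60, 0.9, 0.8,
    {"a": 0.9, "b": 0.6, "c": 0.7, "d": 0.7})
```

Each correction feeds its adjusted table into the next, which is the sense in which the adjustments are serial; the probabilistic version repeats the whole chain for every draw of the bias parameters.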

Abstract

Despite the extensive literature on quantitative bias analysis methods, and despite the fact that the foundational methods for most of the material in this textbook were worked out in the 1950s and 1960s, there is still limited guidance on best practices for performing bias analysis. As we have noted in earlier chapters, good practices for quantitative bias analysis follow good practices for study design and data analysis and therefore should not be new to the reader. For example, good research practices and good bias analysis practices both include: (a) development of a protocol to guide the work, (b) documentation of revisions to the protocol that are made once the work is underway, along with reasons for these revisions, (c) detailed description of the data used, (d) a complete description of all analytic methods used and their results, along with reasons for emphasizing particular results for presentation, and (e) discussion of underlying assumptions and limitations of the methods used. Good practices in presentation include (c)–(e) along with (f) description of possible explanations for the results, and (g) prudent and circumspect inference beyond the study, integrated with prior knowledge on the topic at hand.

Abstract

Throughout the text we have illustrated methods to present the results of bias analysis and explained the inferences that might derive from those results. This chapter will briefly describe the overarching considerations we recommend for presentation and inference, and the reader should refer to specific examples throughout the text for the detailed implementations of these principles.