
2008 | Book

Survival and Event History Analysis

A Process Point of View

Authors: Odd O. Aalen, Ørnulf Borgan, Håkon K. Gjessing

Publisher: Springer New York

Book series: Statistics for Biology and Health


About this book

The aim of this book is to bridge the gap between standard textbook models and a range of models where the dynamic structure of the data manifests itself fully. The common denominator of such models is stochastic processes. The authors show how counting processes, martingales, and stochastic integrals fit very nicely with censored data. Beginning with standard analyses such as Kaplan-Meier plots and Cox regression, the presentation progresses to the additive hazard model and recurrent event data. Stochastic processes are also used as natural models for individual frailty; they allow sensible interpretations of a number of surprising artifacts seen in population data.

The stochastic process framework is naturally connected to causality. The authors show how dynamic path analyses can incorporate many modern causality ideas in a framework that takes the time aspect seriously.

To make the material accessible to the reader, a large number of practical examples, mainly from medicine, are developed in detail. Stochastic processes are introduced in an intuitive and non-technical manner. The book is aimed at investigators who use event history methods and want a better understanding of the statistical concepts. It is suitable as a textbook for graduate courses in statistics and biostatistics.

Table of contents

Frontmatter
1. An introduction to survival and event history analysis
This book is about survival and event history analysis. This is a statistical methodology used in many different settings where one is interested in the occurrence of events. By events we mean occurrences in the lives of individuals that are of interest in scientific studies in medicine, demography, biology, sociology, econometrics, etc. Examples of such events are: death, myocardial infarction, falling in love, wedding, divorce, birth of a child, getting the first tooth, graduation from school, cancer diagnosis, falling asleep, and waking up. All of these may be subject to scientific interest where one tries to understand their cause or establish risk factors. In classical survival analysis one focuses on a single event for each individual, describing the occurrence of the event by means of survival curves and hazard rates and analyzing the dependence on covariates by means of regression models.
The purpose of this introductory chapter is twofold. Our main purpose is to introduce the reader to some basic concepts and ideas in survival and event history analysis. But we will also take the opportunity to indicate what lies ahead in the remaining chapters of the book. In Section 1.1 we first consider some aspects of classical survival analysis where the focus is on the time to a single event. Sometimes the event in question may occur more than once for an individual, or more than one type of event is of interest. In Section 1.2 such event history data are considered, and we discuss some methodological issues they give rise to, while in Section 1.3 we briefly discuss why survival analysis methods may also be useful for data that do not involve time. When events occur, a natural approach for a statistician would be to count them. In fact, counting processes, a special kind of stochastic process, play a major role in this book, and in Section 1.4 we provide a brief introduction to counting processes and their associated intensity processes and martingales. Finally, in Section 1.5, we give an overview of some modeling issues for event history data.
2. Stochastic processes in event history analysis
Event histories unfold in time. Therefore, one would expect that tools from the theory of stochastic processes would be of considerable use in event history analysis. This is indeed the case, and in the present chapter we will review some basic concepts and results for stochastic processes that will be used in later chapters of the book.
Event histories consist of discrete events occurring over time in a number of individuals. One can think of events as being counted as they happen. Therefore, as indicated in Section 1.4, counting processes constitute a natural framework for analyzing survival and event history data. We shall in this chapter develop this idea further, and in particular elaborate the fundamental martingale concept that makes counting processes such an elegant tool. In this book the focus is on models in continuous time. However, as some concepts and results for martingales and other stochastic processes are more easily understood in discrete time, we first, in Section 2.1, consider the time-discrete case. Then, in Section 2.2, we discuss how the concepts and results carry over to continuous time. To keep the presentation fairly simple, we restrict attention to univariate counting processes and martingales in this chapter. Extensions to the multivariate case are summarized in Appendix B.
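The discrete-time case described above can be illustrated numerically: when events are Bernoulli indicators with known probability, the compensator after n steps is simply n times that probability, and subtracting it from the counting process leaves a mean-zero martingale. A minimal sketch in Python (the function name and parameters are our own, not the book's notation):

```python
import random

def martingale_path(p=0.3, steps=50, seed=0):
    """Discrete-time counting process with Bernoulli(p) increments.
    Its compensator after n steps is n*p, so M_n = N_n - n*p is a
    martingale: the unpredictable 'noise' part, with expectation zero."""
    rng = random.Random(seed)
    N = 0
    M = []
    for n in range(1, steps + 1):
        N += 1 if rng.random() < p else 0
        M.append(N - n * p)  # martingale value after step n
    return M

# Averaging the terminal martingale value over many independent paths
# should give a number close to zero.
mean_M = sum(martingale_path(seed=s)[-1] for s in range(2000)) / 2000
```

The same decomposition into "signal" (compensator) and "noise" (martingale) carries over to continuous time, where the compensator becomes the cumulative intensity process.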
3. Nonparametric analysis of survival and event history data
In this chapter we study situations in which the data may be summarized by a counting process registering the occurrences of a specific event of interest or a few counting processes registering the occurrences of a few such events. One important example is the survival data situation, where one is interested in the time to a single event for each individual and the counting process is counting the number of occurrences of this event for a group of individuals. The event in question will depend on the study at hand; it may be the death of a laboratory animal, the relapse of a cancer patient, or the birth of a woman’s second child. In order to emphasize that events other than death are often of interest in the survival data situation, we will use the term event of interest to denote the event under study. However, as mentioned in Section 1.1, we will use the terms survival time and survival function also when the event of interest is something different from death.
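The central nonparametric tool for the survival data situation is the Kaplan-Meier estimator: at each observed event time the current survival estimate is multiplied by one minus the fraction of the risk set experiencing the event. A small self-contained sketch, assuming right-censored data given as (time, event indicator) pairs:

```python
def kaplan_meier(times, events):
    """Kaplan-Meier estimate of the survival function from right-censored
    data: events[i] is 1 if the event of interest occurred at times[i]
    and 0 if the observation was censored then."""
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        d = sum(e for (u, e) in data[i:] if u == t)   # events at time t
        m = sum(1 for (u, _) in data[i:] if u == t)   # all leaving the risk set at t
        if d > 0:
            surv *= 1.0 - d / n_at_risk               # Kaplan-Meier factor
            curve.append((t, surv))
        n_at_risk -= m
        i += m
    return curve

# Five subjects; those with event 0 are censored and only reduce the risk set.
km = kaplan_meier([2, 3, 3, 5, 8], [1, 1, 0, 1, 0])
```

Censored observations contribute no factor of their own; they only shrink the risk set for later event times, which is exactly how censoring fits into the counting process framework.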
4. Regression models
In the previous chapter we considered situations where the data may be summarized into one or a few counting processes registering the occurrences of an event of interest. Such situations occur when the population in question is grouped into a few subpopulations according to the value of one or two categorical covariates. However, usually there are more than two covariates of interest in a study, and some of them may be numeric. Then, as in almost all parts of statistics, grouping is no longer a useful option, and regression models are called for.
For ease of exposition, in the main body of the chapter we will assume that we are interested in only one type of event for each individual. However, most results carry over with only minor modifications to the situation where more than one type of event is of interest (e.g., deaths due to different causes), and this situation is considered briefly in Section 4.2.9.
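The workhorse regression model of this setting is Cox's proportional hazards model, fitted by maximizing the partial likelihood: each observed event contributes the event subject's relative risk divided by the sum of relative risks over everyone still at risk. A sketch for a single covariate, ignoring tied event times (names and notation are ours):

```python
import math

def cox_partial_loglik(beta, times, events, x):
    """Log partial likelihood of Cox's regression model with one
    covariate (no tied event times assumed): each observed event
    contributes beta*x_i minus the log of sum_j exp(beta*x_j) over
    all subjects j still at risk at that event time."""
    ll = 0.0
    n = len(times)
    for i in range(n):
        if events[i]:
            risk_sum = sum(math.exp(beta * x[j])
                           for j in range(n) if times[j] >= times[i])
            ll += beta * x[i] - math.log(risk_sum)
    return ll
```

At beta = 0 every subject has relative risk 1, so the partial log-likelihood reduces to minus the sum of the logarithms of the risk-set sizes, which is a convenient sanity check on an implementation.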
5. Parametric counting process models
In biostatistics it has become a tradition to use non- and semiparametric methods, like those considered in the previous two chapters, to analyze censored survival data, while parametric methods are more common in reliability studies of failure times of technical equipment. In our opinion biostatistics would gain from the use of a wider range of statistical methods, including parametric methods, than is the current practice. In this chapter, we discuss the basic modeling and inferential issues for parametric counting process models. More advanced models and methods are discussed in Chapters 6, 7, 10, and 11.
We focus on likelihood inference in this chapter. In Section 5.1 we derive the likelihood for parametric counting process models, review the basic properties of the maximum likelihood estimator, and give some simple examples of parametric inference. Parametric regression models are considered in Section 5.2, with a focus on the so-called Poisson regression model. In Section 5.3, we give an outline of the derivations of the large sample properties of the maximum likelihood estimator.
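The simplest parametric counting process model, a constant hazard, already shows the structure of the likelihood: the estimate is total events divided by total time at risk (the "occurrence/exposure rate"). A minimal sketch under that assumption:

```python
import math

def constant_hazard_mle(times, events):
    """Maximum likelihood estimate of a constant hazard alpha from
    right-censored data.  The counting-process likelihood is
    prod_i alpha^(d_i) * exp(-alpha * t_i), maximized by
    alpha_hat = (total events) / (total time at risk)."""
    d = sum(events)
    exposure = sum(times)
    alpha_hat = d / exposure
    loglik = d * math.log(alpha_hat) - alpha_hat * exposure
    return alpha_hat, loglik
```

The same occurrence/exposure structure reappears in the Poisson regression model, where the hazard is piecewise constant over cells defined by covariates and time intervals.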
6. Unobserved heterogeneity: The odd effects of frailty
Individuals differ. This is a basic observation of life and also of statistics. Some die old and some die young. Some are tall and some are small. In medicine one will find that a treatment that is useful for one person may be much less so for another person. And one person's risk factor may be less risky for another one. This variation is found everywhere in biology and in other fields, such as reliability studies of technical devices. Even among genetically inbred laboratory animals one observes considerable variation.
One aim of a statistical analysis may be precisely to understand the factors determining such variation. For instance, one might perform a regression analysis with some covariates. However, it is a general observation that such analyses always leave an unexplained remainder. There is some variation that cannot be explained by observable covariates, and sometimes this remaining variation may be large and important. Traditionally, this is considered an error variance, something that creates uncertainty but can otherwise be handled in the statistical analysis, and usually one would not worry too much about it. However, in some cases there are reasons to worry.
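One classic "odd effect" of frailty can be stated in closed form: if each individual's hazard is Z times a constant alpha, with Z gamma distributed with mean 1 and variance theta, the hazard observed at the population level declines over time, because high-frailty individuals are selected out early. A sketch of this standard gamma-frailty result (parameter names are ours):

```python
def marginal_hazard(t, alpha=1.0, theta=0.5):
    """Population-level hazard when each individual has constant hazard
    Z*alpha and the frailty Z is gamma distributed with mean 1 and
    variance theta: mu(t) = alpha / (1 + theta * alpha * t).
    The observed hazard declines in t even though no individual's
    hazard changes, a pure selection artifact."""
    return alpha / (1.0 + theta * alpha * t)
```

This is the kind of artifact the chapter warns about: a declining population hazard is often read as individuals becoming less at risk, when it may only reflect unobserved heterogeneity.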
7. Multivariate frailty models
Data in survival analysis are usually assumed to be univariate, with one, possibly censored, lifetime for each individual. All the standard methodology, including Kaplan-Meier plots and Cox analysis, is geared toward handling this situation. However, multivariate survival data also arise naturally in many contexts. Such data pose a problem for ordinary multivariate methods, which will have difficulty handling censored data.
There are two typical ways multivariate survival data can arise. One, which may be termed the recurrent events situation, is when several successive events of the same type are registered for each individual, for instance, repeated occurrences of ear infections. The other, which may be termed the clustered survival data situation, is when several units that may fail are collected in a cluster. Examples of the clustered survival data situation may be the possible failure of several dental fillings for an individual or the lifetimes of twins. The cluster structure may in fact be rather complex, including, for instance, related individuals in family groups. Sometimes one would assume common distributions for the individual units in a cluster; in other cases, like when considering the lifetimes of father and son, the distributions may be different.
8. Marginal and dynamic models for recurrent events and clustered survival data
We shall consider observation of clustered survival data or processes with recurrent events as defined in the introduction to Chapter 7. In the case of recurrent events we focus on the concept of dynamic models, which represent an attempt to understand in explicit terms how the past influences the present and the future. We may think of this as causal influences, but statistical dependence on the past may also be a reflection of heterogeneity. Instead of setting up a random effects, or frailty, model, one may alternatively condition with respect to past events to get a counting process model with a suitable intensity process. Frailty will induce dependence, such that, for example, the rate of a new event is increased if many events have been observed previously for this individual, since this would indicate a high frailty. The formulas (7.16) and (7.17) can be seen as expressions transforming the frailty structure into conditional representations given the past.
The existence of dynamic models follows from a general theorem for semimartingales, namely the Doob-Meyer decomposition, which states, essentially, that any semimartingale can be decomposed into a martingale and a compensator (Section 2.2.3). The martingale represents the “noise” or unpredictable changes, while the compensator represents the influence of the past history. A counting process is a submartingale, and hence a semimartingale, and the compensator is just the cumulative intensity process.
9. Causality
One of the exciting new developments in the field of statistics is a renewal of interest in the causality concept. This has led to several new approaches to defining and studying causality in statistical terms. Causality is based on a notion of the past influencing the present and the future. This has very natural links to the type of stochastic processes considered in this book, and it is therefore appropriate to incorporate material on causality.
We shall start with a discussion of causality concepts seen from a statistical point of view. We then continue with various models where causal thinking is applied, ranging from Granger-Schweder causality to counterfactual causality.
10. First passage time models: Understanding the shape of the hazard rate
In Chapter 6 we saw how the individual variability in hazard can be described in terms of a random frailty variable. For instance, frailty may enter the hazard in a multiplicative fashion, with the individual hazard h described as h(t) = Zα(t). Here, α is a “basic” rate and Z is a nonnegative random variable taking distinct values for each individual. This may be an appropriate way to account for missing (time-independent) covariates such as those genetic and other effects that may remain constant over time. A limitation of the standard frailty approach, however, is that Z is fixed at time zero. Once the level of frailty has been set for an individual, it is retained for the rest of the lifespan. Individuals may be endowed with a set of time-dependent (external) covariates encompassing changes in their environment, but conditional on the covariates the deviation of the individual hazard from the population baseline is completely determined. Thus, knowing the hazard ratio at one instant will completely determine the hazard ratio for the future. However, it is reasonable to believe that for each individual there will be a developing random process influencing the individual hazard and leading up to an event. This fact is usually ignored in the standard models, mostly because the process itself is usually poorly understood and may in fact not be observed at all. However, this does not imply that it should be ignored. A consideration of the underlying process, even in a speculative way, may improve our understanding of the hazard rate.
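One concrete first-passage model makes the point: if the underlying risk process is a Brownian motion with drift toward a barrier, the time to absorption follows an inverse Gaussian distribution, whose hazard rises from zero to a peak and then declines toward a constant. A sketch under that assumption (barrier distance c and drift nu are our own parameter names; unit variance assumed):

```python
import math

def std_normal_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def first_passage_hazard(t, c=1.0, nu=1.0):
    """Hazard rate of the first-passage time of a Brownian motion with
    drift nu (unit variance) to a barrier at distance c: the inverse
    Gaussian model.  Density and CDF use the standard closed forms."""
    dens = (c / math.sqrt(2.0 * math.pi * t ** 3)
            * math.exp(-(c - nu * t) ** 2 / (2.0 * t)))
    cdf = (std_normal_cdf((nu * t - c) / math.sqrt(t))
           + math.exp(2.0 * nu * c)
           * std_normal_cdf(-(nu * t + c) / math.sqrt(t)))
    return dens / (1.0 - cdf)
```

The rise-then-fall shape arises with no frailty distribution imposed at time zero: the randomness develops along the way, which is exactly the process point of view this chapter advocates.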
11. Diffusion and Lévy process models for dynamic frailty
In Chapter 10, we demonstrated how the concept of dynamically developing individual risk could be used to produce frailty-like effects on the population level. As soon as the risk process of an individual hits a barrier, that individual has had an event, leaving behind only those with a risk process not yet at the level of the barrier. The barrier hitting models can be seen as a way of relaxing the assumption of time-constant frailty found in the usual frailty models (Chapter 6). Another way of relaxing the constancy assumption is to allow dynamic covariates to capture a part of the variation experienced by an individual, as demonstrated in Chapter 8.
Backmatter
Metadata
Title
Survival and Event History Analysis
Authors
Odd O. Aalen
Ørnulf Borgan
Håkon K. Gjessing
Copyright year
2008
Publisher
Springer New York
Electronic ISBN
978-0-387-68560-1
Print ISBN
978-0-387-20287-7
DOI
https://doi.org/10.1007/978-0-387-68560-1