2000 | Book

Analysis of Multivariate Survival Data

Written by: Philip Hougaard

Publisher: Springer New York

Book series: Statistics for Biology and Health

About this book

Survival data, or more generally time-to-event data, occur in many areas, including medicine, biology, engineering, economics, and demography, but previously standard methods have required that all time variables be univariate and independent. This book extends the field by allowing for multivariate times. Applications where such data appear include survival of twins, survival of married couples and families, time to failure of the right and left kidney for diabetic patients, life history data with time to outbreak of disease, complications and death, recurrent episodes of diseases, and cross-over studies with time responses. As the field is rather new, the concepts and the possible types of data are described in detail, and basic aspects of how dependence can appear in such data are discussed. Four different approaches to the analysis of such data are presented. Multi-state models, in which a life history is described as the subject moving from state to state, are the most classical approach. The Markov models make up an important special case, but it is also described how easily more general models are set up and analyzed. Frailty models, which are random effects models for survival data, make up a second approach, extending from the simplest shared frailty models, which are considered in detail, to models with more complicated dependence structures over individuals or over time. Marginal modelling has become a popular approach to evaluate the effect of explanatory factors in the presence of dependence, but without having specified a statistical model for the dependence. Finally, the completely non-parametric approach to bivariate censored survival data is described. This book is aimed at investigators who need to analyze multivariate survival data, but due to its focus on the concepts and the modelling aspects, it is also useful for persons interested in such data, but

Table of contents

Frontmatter
1. Introduction
Abstract
The overall purpose of this book is to present four approaches to handle multivariate survival data, but before doing that we start by clarifying the concepts both for simple survival data and multivariate survival data.
Philip Hougaard
2. Univariate Survival Data
Abstract
This chapter gives a description of univariate survival data methods. The topic is also described in many other books. Thus, it is not absolutely necessary for persons experienced in survival analysis to read it, but it does contain notation and key results that will be needed later. Furthermore, some aspects are treated in more detail than elsewhere, in order to create a basis for understanding specific points in later chapters.
Philip Hougaard
3. Dependence structures
Abstract
The standard in ordinary multivariate analysis is to have a single parameter governing dependence. Such a parameter describes the degree of dependence. For many purposes, this might also be satisfactory for survival data, but for equally many cases it is useful to consider the dependence in more detail. In particular, with censored data, the assumed model for the dependence can have a major importance for the estimated degree of dependence, because the model for dependence for the period with data will extrapolate into the time period where all individuals are censored.
Philip Hougaard
4. Bivariate dependence measures
Abstract
The previous chapter has considered how dependence is generated. The next problem is to assess or quantify the dependence in a sensible way. In normal distribution models, we are familiar with using the ordinary product moment correlation (Pearson correlation) for measuring the dependence between the various coordinates. This quantity, however, measures only linear dependence. For survival data, the marginal distributions are not normal and the dependence structure is nonlinear, and therefore we need more general measures of dependence. In a parametric model, we may quote some parameter values as expressing dependence, but for more general discussion and for comparing different models, it makes more sense to have a measure not related to a particular model. That is, a measure that is defined on the bivariate distribution rather than tied to a specific parametrization of a model.
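As a rough illustration of why a purely linear measure is limiting, the sketch below compares the Pearson correlation with a rank-based measure before and after a log transformation of both times. Kendall's tau is used here only as one familiar example of a measure defined on the bivariate distribution; the abstract does not name a specific measure, and the toy data and function names are hypothetical. The sketch assumes uncensored, untied observations.

```python
import math

def pearson(x, y):
    """Ordinary product moment correlation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def kendall_tau(x, y):
    """Kendall's tau for uncensored pairs without ties (assumption)."""
    n = len(x)
    score = 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (x[i] - x[j]) * (y[i] - y[j])
            # +1 for a concordant pair, -1 for a discordant pair
            score += (prod > 0) - (prod < 0)
    return score / (n * (n - 1) / 2)

# Hypothetical paired lifetimes (illustrative values only).
t1 = [1.0, 2.0, 3.0, 4.0, 5.0]
t2 = [1.5, 1.0, 4.0, 3.0, 6.0]
log_t1 = [math.log(t) for t in t1]
log_t2 = [math.log(t) for t in t2]
```

Because tau depends only on the ordering of the observations, `kendall_tau(t1, t2)` and `kendall_tau(log_t1, log_t2)` agree, whereas the Pearson correlation changes when the time scale is transformed.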
Philip Hougaard
5. Probability aspects of multi-state models
Abstract
Multi-state models are the most commonly used models for describing the development for longitudinal data. A multi-state model is defined as a model for a stochastic process, which at any time point occupies one of a set of discrete states. In medicine, the states can describe conditions like healthy, diseased, diseased with complications, and dead. A change of state is called a transition. This then corresponds to outbreak of disease, occurrence of complications and death. The state structure specifies the states and which transitions from state to state are possible. It is possible to make a figure of the state structure. Some examples of state structures have already been shown in Figures 1.4–1.7. The full statistical model specifies the state structure and the form of the hazard function for each possible transition. This chapter is a description of how multi-state models can be used to model multivariate and multiple survival data. In principle, all kinds of such data can be formulated as multi-state models and this is often convenient for considering predictions. The approach is particularly well suited to event-related dependence. The approach does, however, have some shortcomings for recurrent events, because it considers states rather than events (see below). Furthermore, multi-state models consider all data as longitudinal, which makes them less useful for repeated measurements and requires the censoring pattern for parallel data to be homogeneous.
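The two ingredients named above, a state structure and a hazard for each possible transition, can be sketched as follows. This is a minimal illustration and not the book's own example: the three states, the constant hazard values, and the names `HAZARDS` and `simulate_path` are all assumptions, and constant hazards imply a time-homogeneous Markov model.

```python
import random

# Hypothetical state structure: for each state, the transitions that are
# possible and an assumed constant hazard for each of them.
HAZARDS = {
    "healthy": {"diseased": 0.10, "dead": 0.02},
    "diseased": {"dead": 0.20},
    "dead": {},  # absorbing state: no transitions out
}

def simulate_path(start, horizon, rng):
    """Simulate one life history as a list of (time, state) entries."""
    t, state = 0.0, start
    path = [(t, state)]
    while HAZARDS[state]:
        total = sum(HAZARDS[state].values())
        # Waiting time in the current state is exponential with the
        # total hazard out of that state.
        t += rng.expovariate(total)
        if t >= horizon:
            break  # follow-up ends: the history is censored here
        targets = list(HAZARDS[state])
        weights = [HAZARDS[state][s] for s in targets]
        # The next state is chosen in proportion to its hazard.
        state = rng.choices(targets, weights=weights)[0]
        path.append((t, state))
    return path
```

For example, `simulate_path("healthy", 50.0, random.Random(1))` returns one simulated history; every step in it follows an arrow of the assumed state structure.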
Philip Hougaard
6. Statistical inference for multi-state models
Abstract
Where the previous chapter has discussed choosing the state structure for multi-state models and the probabilistic consequences (like evaluation of transition probabilities) of assumptions like the Markov assumption, the present chapter goes more deeply into the statistical modeling and analysis of the transition hazards.
Philip Hougaard
7. Shared frailty models
Abstract
The shared frailty model is a specific kind of the common risks model described in Section 3.1.2. The frailty is the term that describes the common risks, acting as a factor on the hazard function. The approach makes sense both for parallel data and recurrent events data. In this chapter, only parallel data are considered. The results are presented in terms of individuals who share the same risk within a group. Recurrent events data will be considered separately in Chapter 9. The shared frailty model is relevant to lifetimes of several individuals, similar organs and repeated measurements. It is not generally relevant for the case of different events. It is a mixture model, because in most cases the common risks are assumed random. The mixture term is the frailty and for this the notation Y will be used. The model assumes that all time observations are independent given the values of the frailties. In other words, it is a conditional independence model. The value of Y is constant over time and common to the individuals in the group and thus is responsible for creating dependence. This is the reason for the word shared, although it would be more correct to call the models of this chapter constant shared frailty models. The interpretation of this model is that the between-groups variability (the random variation in Y) leads to different risks for the groups, which then show up as dependence within groups. The approach is a multivariate version of the mixture calculations of Sections 2.2.7 and 2.4.6.
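The conditional independence construction can be sketched in a few lines. The example assumes, purely for illustration, a gamma distributed frailty Y with mean 1 and variance `theta`, and a constant baseline hazard `lam`; neither choice is stated in the abstract, and the function names are hypothetical. Under these assumptions the joint survival function has the closed form (1 + theta * lam * (t1 + t2)) ** (-1 / theta), obtained by integrating out Y.

```python
import random

def simulate_pair(lam, theta, rng):
    """Draw two lifetimes that share one frailty value Y."""
    # Assumed gamma frailty: shape 1/theta, scale theta (mean 1, variance theta).
    y = rng.gammavariate(1.0 / theta, theta)
    # Given Y, the two lifetimes are independent with hazard Y * lam.
    return rng.expovariate(y * lam), rng.expovariate(y * lam)

def joint_survival(t1, t2, lam, theta):
    """P(T1 > t1, T2 > t2) after integrating out the gamma frailty."""
    return (1.0 + theta * lam * (t1 + t2)) ** (-1.0 / theta)
```

With `lam = 0.5` and `theta = 0.5`, the joint survival probability at (1, 1) exceeds the product of the two marginal survival probabilities, which is the within-group dependence that the random variation in Y creates.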
Philip Hougaard
8. Statistical inference for shared frailty models
Abstract
This chapter considers the statistical inference for the shared frailty models described in the previous chapter. A main part of this is estimation procedures. Estimation difficulties have previously limited the applicability of the shared frailty models. There have, however, been a number of suggestions on how to estimate the parameters. Reasons for the many choices are, of course, that some formulas are complicated and that iteration can be time consuming. One basic direction to take is to integrate out the random frailties, but this is not the only possibility. Alternatively, one can use estimation routines where the frailties are included as unobserved random variables, similar to BLUP (best linear unbiased predictor) methods for normal distribution models. For non-parametric hazard functions, there is one parameter per time point with observed events. This can be specifically included in the model, or one can attempt to remove it from the likelihood, an approach that is inspired by the successful way of doing so in the Cox model. Also for the Nelson-Aalen estimate, it is easy to handle the hazard contributions, because there is a separate equation for each term allowing for an explicit solution. It is, unfortunately, not quite as easy in a frailty model; iteration is necessary as all expressions are non-linear and related to each other.
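The remark about the Nelson-Aalen estimate can be made concrete: without frailty, the hazard contribution at each event time is simply the number of events divided by the number at risk, so no iteration is needed. The sketch below covers that frailty-free case only, for right-censored data; the function name and the input layout (a list of times and a list of 0/1 event indicators) are assumptions.

```python
def nelson_aalen(times, events):
    """Return [(event time, cumulative hazard)] for right-censored data.

    times  -- observed times
    events -- 1 if the time is an observed event, 0 if censored
    """
    data = sorted(zip(times, events))
    n = len(data)
    result, cum = [], 0.0
    i = 0
    while i < n:
        t = data[i][0]
        at_risk = n - i  # everyone with observed time >= t
        d = 0
        # Count the events among all observations tied at time t.
        while i < n and data[i][0] == t:
            d += data[i][1]
            i += 1
        if d > 0:
            cum += d / at_risk  # explicit solution for this term
            result.append((t, cum))
    return result
```

Each increment `d / at_risk` is solved on its own, which is the property that is lost once a shared frailty links the terms together.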
Philip Hougaard
9. Shared frailty models for recurrent events
Abstract
Recurrent events are in several ways more complicated and in other ways simpler to analyze than parallel data of several individuals, and this is why the shared frailty models for such data are described in a separate chapter. This chapter treats the probability model as well as the statistical inference.
Philip Hougaard
10. Multivariate frailty models
Abstract
The shared frailty model described in Chapter 7 is very useful for bivariate data with common risk dependence, but in many cases, we do need extensions. In particular, for truly multivariate data, that is, when there are three or more observations, we need more models with varying degrees of dependence. This general frailty approach can be used to create a random treatment by group interaction, or other models with several sources of variation. Secondly, combining subgroups with different degrees of dependence in a single model, for example, monozygotic and dizygotic twins, is difficult in a shared frailty model. Furthermore, this extension can be an improvement for the consideration of effects of covariates in frailty models.
Philip Hougaard
11. Instantaneous and short-term frailty models
Abstract
Dependence has previously been classified into three time frames: instantaneous, short-term, and long-term (Section 3.2). Almost all the standard models studied until now lead to long-term dependence. Important examples are the Markov multi-state models and the shared frailty models. Only a single standard model, the Marshall-Olkin model of Section 5.5.4, displays instantaneous dependence. No standard models show short-term dependence, even though we have argued that many common subject matter problems possibly or probably display short-term dependence. Therefore, it makes sense to develop models that can be used to discuss whether the dependence is of a short-term or long-term time frame. Multi-state models can easily be constructed to model short-term dependence by relaxing the Markov assumption, substituting a Markov extension model as shown in Chapter 5, but they have no interpretation, whereas a frailty model has an interpretation as a random effects model. In particular for frailty models, it is an advantage to have short-term dependence models for checking the fit of the ordinary shared frailty model. The aim of this chapter is to extend the frailty models to describe both instantaneous and short-term dependence.
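Instantaneous dependence can be illustrated with the usual common-shock construction of the Marshall-Olkin bivariate exponential model: each individual has its own exponential shock, and a third shock hits both at once. The rates used below and the function name are hypothetical; the sketch only shows that the two lifetimes coincide with positive probability, here lam12 / (lam1 + lam2 + lam12).

```python
import random

def marshall_olkin_pair(lam1, lam2, lam12, rng):
    """Draw (T1, T2) from a common-shock (Marshall-Olkin) construction."""
    x1 = rng.expovariate(lam1)    # shock affecting individual 1 only
    x2 = rng.expovariate(lam2)    # shock affecting individual 2 only
    x12 = rng.expovariate(lam12)  # shock affecting both simultaneously
    return min(x1, x12), min(x2, x12)

def tie_probability(lam1, lam2, lam12):
    """P(T1 == T2): the common shock arrives before both individual shocks."""
    return lam12 / (lam1 + lam2 + lam12)
```

Simulating many pairs with, say, rates 1.0, 1.0 and 0.5 gives a share of exactly simultaneous events close to `tie_probability(1.0, 1.0, 0.5)`, something a continuous long-term dependence model cannot produce.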
Philip Hougaard
12. Competing risks models
Abstract
The term competing risks refers to cause of death models. Competing risks have been introduced and illustrated as a multi-state model in Figure 1.7 and discussed previously, in Sections 1.10, 3.3.9, and Chapter 5; but such data present special challenges, and therefore are considered separately in this chapter. It differs from the rest of this book in that only one event is possible for each individual, and in that sense, the concept of dependence seems irrelevant. However, two problems stand out as particularly important for competing risks data, the possibility of classification error and, more important, the desire to study the effect of modifying the hazards for some causes of death. Therefore, dependence is the key problem for such data. We would like to discuss aspects where dependence is important, but it is impossible to estimate the degree of dependence. In fact, it is impossible even to give a specific interpretation of dependence. The problem is caused by competing risks not being a truly multivariate survival problem. This has been discussed for centuries, since Bernoulli considered what the eradication of smallpox would imply for the mean lifetime.
Philip Hougaard
13. Marginal and copula modeling
Abstract
Marginal modeling is a term used for an approach where the effect of explanatory factors is estimated based on considering the marginal distributions. The dependence is not the interesting aspect and is not considered in detail. Afterwards, the variability of the regression coefficient estimators is determined by a procedure that accounts for the dependence between the observations. For parallel data, there are in practice two versions of this general idea. The coordinate-wise (CW) approach considers each marginal separately and estimates the regression coefficients in each marginal. The covariance matrix of these estimates is estimated and used for combining the estimates from the coordinates by means of a weighted average. The estimated covariance matrix is further used to evaluate the variance of the combined estimate. The second version of the approach is the independence working model (IWM) approach, where the estimate is found under the (incorrect) assumption of independence between the coordinates. This yields directly the final estimate of the regression coefficients. The uncertainty of the regression coefficient estimate is evaluated by means of an estimator that accounts for the dependence between the coordinates. This is typically done by a "sandwich estimator" (see below). The independence working model approach is closely related to the so-called generalized estimating equations. For recurrent events, there is an approach similar in concept to the IWM approach.
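The combination step of the coordinate-wise approach can be sketched for two coordinates and a single regression coefficient. The abstract says only that a weighted average is used; the minimum-variance weights below (proportional to the row sums of the inverse covariance matrix) are an assumed, commonly used choice, and the function name and inputs are hypothetical.

```python
def combine_two(b1, b2, v1, v2, c):
    """Combine two coordinate-wise estimates of one coefficient.

    b1, b2 -- estimates from the two marginals
    v1, v2 -- their estimated variances
    c      -- their estimated covariance
    Assumes v1 + v2 - 2*c > 0, i.e. the two estimates are not identical.
    """
    denom = v1 + v2 - 2.0 * c
    # Minimum-variance weights (assumed choice); they sum to one.
    w1 = (v2 - c) / denom
    w2 = (v1 - c) / denom
    estimate = w1 * b1 + w2 * b2
    # Variance of the weighted average under the same covariance matrix.
    variance = (v1 * v2 - c * c) / denom
    return estimate, variance, (w1, w2)
```

For instance, `combine_two(0.5, 0.7, 0.04, 0.09, 0.01)` puts more weight on the first, more precise estimate, and the same covariance matrix supplies the variance of the combined estimate.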
Philip Hougaard
14. Multivariate non-parametric estimates
Abstract
It is desirable to have a completely non-parametric estimate of a multivariate survival distribution. This has the advantage of not requiring distributional assumptions, but it does, of course, require some structural assumptions, namely, independence of groups and identical multivariate distributions for the groups. It could serve as a baseline for evaluating the fit of more specific models. In principle, it allows for assessment of the dependence, but it may be inefficient compared to a fully parametric or semi-parametric estimate.
Philip Hougaard
15. Summary
Abstract
This book has several aims. Of course, on the general level it is to present the multivariate survival data and their analysis. In more detail, it is first to guide the statistician into selecting the most appropriate model for the actual subject matter problem and the actual multivariate survival data set. Second, the aim is to present the available approaches so that they can be used when they are relevant. To judge the relevance, it is important to include a description of the advantages and disadvantages of each single approach. Finally, a more subtle aim is to put all the approaches in a common frame and also describe the similarities between the various data types and approaches.
Philip Hougaard
Backmatter
Metadata
Title
Analysis of Multivariate Survival Data
Written by
Philip Hougaard
Copyright year
2000
Publisher
Springer New York
Electronic ISBN
978-1-4612-1304-8
Print ISBN
978-1-4612-7087-4
DOI
https://doi.org/10.1007/978-1-4612-1304-8