Model-based confidence bands for survival functions
Introduction
This paper investigates a new approach to constructing simultaneous confidence bands (SCBs) for survival functions from right censored data. In medical reports it is typical to display the estimate of a survival curve along with pointwise confidence intervals (PCIs), drawn as two curves: one connects the upper endpoints and the other connects the lower endpoints of interval estimates obtained at several points. PCIs do not provide confidence statements for the entire survival curve, whereas SCBs do and hence are more meaningful to report. Recent advances offer the prospect of producing new SCBs for survival curves that provide correct coverage and are more informative than existing ones based on the Kaplan–Meier (KM) estimator. We implement such a new procedure for the one-sample setting.
In the analysis of censored time-to-event data, SCBs have been developed for the cumulative hazard, survival, and quantile functions, some of them based on empirical likelihood (Gillespie and Fisher, 1979, Hall and Wellner, 1980, Efron, 1981, Nair, 1984, Akritas, 1986, Csörgő and Horváth, 1986, Bie et al., 1987, Hollander and Pena, 1989, Lin et al., 1994, Ziegler, 1995, Gulati and Padgett, 1996, Li et al., 1996, Hollander et al., 1997, Li and van Keilegom, 2002, McKeague and Zhao, 2006); or for the difference/ratio of two survival functions (Parzen et al., 1997, Zhang and Klein, 2001, McKeague and Zhao, 2002). In the one-sample setting of the random censorship model (see, for example, Kalbfleisch and Prentice, 2002), the data are n independent and identically distributed copies of (Z, δ), where Z = min(T, C), T is the failure time of interest, C is an independent censoring variable, both assumed continuous, and δ = I(T ≤ C) is the censoring indicator. Significantly, however, all existing methods for the one-sample setting employ the KM estimator to develop SCBs for the survival function S(t). Improved SCBs can be obtained through an alternative approach, which has not yet been investigated or implemented, and which is our focus here.
The alternative approach that we implement recognizes that one or more good-fitting models for m(t), the conditional expectation of the censoring indicator δ given Z=t, may be utilized for improved SCB construction for S(t). Indeed, choices such as the logistic, the probit, the complementary log–log, and the generalized proportional hazards (GPH, henceforth), among others, may be explored carefully for zeroing in on an apt model for m(t); see, for example, Cox and Snell (1989) or Collett (2002). Semiparametric random censorship models (SRCMs) exploit this facility to replace each observed δ with a model-based estimate of its conditional expectation given Z, so that the censoring indicator figures in subsequent analysis only through its “surrogate”, the estimated m(t). Thus, SRCMs derive their rationale from their ability to gainfully utilize parametric ideas within the random censorship environment. In fact, when the model for m(t) is correctly chosen, the resulting SRCM-based estimator of S(t) is asymptotically more efficient than the KM estimator (Dikta, 1998), and so we may expect this efficiency to be reflected in improved SRCM-based SCBs for S(t). Unlike the random censorship model, the SRCM approach is also flexible enough to accommodate missing censoring indicators (Subramanian, 2004) with no additional effort; see also our concluding discussion section.
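The idea of replacing each censoring indicator by a model-based surrogate can be illustrated with a short sketch. It is only an illustration under stated assumptions: it uses a product-limit form in which every observed time carries a weight, so that weights equal to the censoring indicators reproduce the KM estimator, while weights equal to fitted values of m at the observed times give a model-based variant. The function name `product_limit`, the assumption of distinct observation times, and the fitted values used below are hypothetical and are not taken from the paper.

```python
import numpy as np

def product_limit(z, w):
    """Product-limit survival estimate at the sorted observation times.

    z : observed times min(T, C), assumed distinct (illustrative assumption)
    w : weight in [0, 1] attached to each time
        w = censoring indicators  -> Kaplan-Meier estimate
        w = fitted m at each time -> model-based (SRCM-type) variant
    """
    z = np.asarray(z, dtype=float)
    w = np.asarray(w, dtype=float)
    order = np.argsort(z)
    z_sorted, w_sorted = z[order], w[order]
    n = len(z_sorted)
    at_risk = n - np.arange(n)              # n, n-1, ..., 1
    surv = np.cumprod(1.0 - w_sorted / at_risk)
    return z_sorted, surv

# Toy data (hypothetical): three observations, the earliest one censored.
z_obs = [2.0, 1.0, 3.0]
delta = [1, 0, 1]
m_fit = [0.6, 0.6, 0.6]   # hypothetical fitted values of m at the observed times

times, km = product_limit(z_obs, delta)    # Kaplan-Meier
_, model_based = product_limit(z_obs, m_fit)
```

With the indicators as weights, a censored observation contributes a factor of one; with fitted values as weights, every observation contributes a factor below one, which is how the censoring indicator enters only through the estimated m(t).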
The main issue in constructing one-sample SCBs for S(t) lies in the specification of critical values. It is well known that the scaled KM process converges weakly to a time-transformed Brownian bridge (Hall and Wellner, 1980, Akritas, 1986, Fleming and Harrington, 2005), so the percentiles of its supremum can be obtained from tables calculated for this purpose. This is not possible with the SRCM approach, however, because the limiting distribution of the normalized SRCM-specific cumulative hazard estimator does not have independent increments. One possible way to surmount this problem would be to use a representation for the limit of the normalized SRCM-specific cumulative hazard process to produce an approximation whose distribution can be generated by simulation. We instead propose and implement a novel two-stage resampling scheme that is specifically tailored to SRCMs and that we show produces asymptotically correct critical values for the supremum statistic, leading to improved SCBs for S(t).
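How a resampling scheme turns into a critical value for a supremum statistic can be sketched as follows. This is a simplified illustration, not the paper's construction: it assumes the estimate and its resampled versions have already been evaluated on a common time grid, uses an unweighted supremum scaled by the square root of n, and keeps the band inside [0, 1] by simple clipping. The function names `sup_critical_value` and `equal_width_band` are hypothetical.

```python
import numpy as np

def sup_critical_value(estimate, boot_estimates, n, alpha=0.05):
    """(1 - alpha) quantile of sqrt(n) * sup_t |S*_b(t) - S_hat(t)| over resamples.

    estimate       : array of shape (G,), the estimate on a time grid
    boot_estimates : array of shape (B, G), one resampled estimate per row
    """
    estimate = np.asarray(estimate, dtype=float)
    boot = np.asarray(boot_estimates, dtype=float)
    sup_stats = np.sqrt(n) * np.max(np.abs(boot - estimate), axis=1)
    return np.quantile(sup_stats, 1.0 - alpha)

def equal_width_band(estimate, crit, n):
    """Band estimate +/- crit / sqrt(n), clipped to [0, 1] (a simplification)."""
    estimate = np.asarray(estimate, dtype=float)
    half_width = crit / np.sqrt(n)
    lower = np.clip(estimate - half_width, 0.0, 1.0)
    upper = np.clip(estimate + half_width, 0.0, 1.0)
    return lower, upper

# Toy numbers (hypothetical): a two-point grid and three resampled curves.
s_hat = np.array([0.9, 0.5])
s_boot = np.array([[0.9, 0.5],
                   [0.8, 0.5],
                   [0.9, 0.7]])
crit = sup_critical_value(s_hat, s_boot, n=4, alpha=0.5)
lower, upper = equal_width_band(s_hat, crit, n=4)
```

The band is simultaneous because a single critical value, calibrated on the supremum over the whole grid, is applied at every time point.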
Efron's (1981) censored-data bootstrap, which obtains resamples by drawing at random and with replacement from the observed pairs (Z_i, δ_i), i = 1, ..., n, yields asymptotically correct SCBs for S(t) (Akritas, 1986, Horváth and Yandell, 1987). This bootstrap is equivalent to drawing samples from the KM estimators of the failure time and censoring distributions (Akritas, 1986). Sun et al. (2001) used the same approach to derive SCBs for quantile functions. This approach, however, would be unsatisfactory for obtaining SRCM-based SCBs, which call for a resampling mechanism that takes into account the information available in the form of a model for m(t). Our two-stage resampling plan combines the classical bootstrap with model-based regeneration of censoring indicators. Lu and Tsiatis (2001), Tsiatis et al. (2002), Dikta et al. (2006), and Subramanian (2009, 2011) have employed regeneration of binary responses in multiple imputations estimation and model checking.
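A minimal sketch of a two-stage resample of this kind is given below. It is one reading of the description above, under assumptions that are not confirmed by this excerpt: the first stage draws the observed times with replacement, and the second stage regenerates each censoring indicator as a Bernoulli draw with success probability given by a fitted model for m. The function names `two_stage_resample` and `logistic_m`, and the parameter values in `logistic_m`, are hypothetical.

```python
import numpy as np

def two_stage_resample(z, m_hat, rng):
    """Return one resample (z_star, delta_star).

    Stage 1: classical bootstrap of the observed times.
    Stage 2: regenerate indicators from the fitted model, delta* ~ Bernoulli(m_hat(z*)).
    """
    z = np.asarray(z, dtype=float)
    n = len(z)
    z_star = rng.choice(z, size=n, replace=True)
    p_star = np.clip(m_hat(z_star), 0.0, 1.0)
    delta_star = (rng.random(n) < p_star).astype(int)
    return z_star, delta_star

def logistic_m(t, theta=(0.5, 0.8)):
    """Hypothetical fitted logistic model for m(t); theta is an assumed value."""
    a, b = theta
    return 1.0 / (1.0 + np.exp(-(a + b * np.asarray(t, dtype=float))))

rng = np.random.default_rng(0)
z_obs = np.array([0.3, 0.7, 1.1, 1.6, 2.4])
z_star, delta_star = two_stage_resample(z_obs, logistic_m, rng)
```

In contrast to resampling the pairs directly, the regenerated indicators depend on the data only through the fitted model, which is what lets the resampling mechanism use the model information.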
Let be such that , where H denotes the distribution function of Z. The range-respecting SCBs that we propose are based on the asymptotic validity of our bootstrap, which means proving that for almost all samples the suprema over of bootstrap processes, from which critical values will be calibrated, have the same limit distribution as the basic ones they are intended to approximate. The method of proof involves first deriving functional central limit theorems for the bootstrap versions of basic SRCM-specific estimators and later invoking Gill and Johansen's (1990) functional delta method to prove via a series of compactly differentiable mappings the desired asymptotic validity. For different choices of a weight function we are then able to obtain the approximating critical values for constructing the SCBs for S(t) on any interval , see Section 2.3.
The paper is organized as follows. Section 2 deals with SRCM-based SCBs for the survival function. We give a brief review of the semiparametric approach to survival function estimation for right censored data in Section 2.1. The proposed two-stage resampling procedure and its asymptotic validity are presented in Section 2.2. The new SCBs are introduced in Section 2.3. Section 3 deals with numerical studies. Simulation results with proper specifications are reported in Section 3.1, while the results of misspecification studies are detailed in Section 3.2. The proposed SCBs are illustrated using data from medical studies in Section 3.3. A concluding discussion is given in Section 4. Proofs of theorems are placed in the Appendix.
Methodology
We write to denote the empirical estimator of H(t). We denote the cumulative hazard function of T by . Also, will indicate the transpose of a vector , and .
Numerical studies
Our simulation studies cover two cases: one in which the model for m(t) is specified correctly, and one that examines performance under misspecification. We compare the proposed bands with competing bands through measures such as the empirical coverage probability (ECP), the estimated average enclosed area (EAEA), and the estimated average width (EAW). The ECP is the proportion of SCBs that include S(t) for all . On the interval , the EAEA and EAW are defined
Concluding discussion
The rationale underlying any model-based procedure is that parametric specifications, when employed correctly, invariably lead to more efficient estimation and inference. For the one-sample censored data setting, the model-based approach to constructing SCBs for survival curves that is introduced and implemented in this paper is shown to provide a viable and better alternative to the existing approaches based on the KM estimator. This is evidenced by our numerical studies, which indicate that the new
Acknowledgments
The authors thank two reviewers for a number of comments and suggestions, which led to improvements in the paper.
References (39)
On semiparametric random censorship models. Journal of Statistical Planning and Inference (1998)
Simultaneous confidence bands for ratios of survival functions via empirical likelihood. Statistics & Probability Letters (2002)
Width-scaled confidence bands for survival functions. Statistics & Probability Letters (2006)
The multiple imputations based Kaplan–Meier estimator. Statistics & Probability Letters (2009)
Multiple imputations and the missing censoring indicator model. Journal of Multivariate Analysis (2011)
Inverse censoring weighted median regression. Statistical Methodology (2009)
Bootstrapping the Kaplan–Meier estimator. Journal of the American Statistical Association (1986)
Statistical Models based on Counting Processes (1993)
Confidence intervals and confidence bands for the cumulative hazard rate function and their small sample properties. Scandinavian Journal of Statistics (1987)
Convergence of Probability Measures (1968)
Analysis of Binary Data
Confidence bands from censored samples. Canadian Journal of Statistics
Modelling Binary Data
Bootstrap approximations in model checks for binary data. Journal of the American Statistical Association
Censored data and the bootstrap. Journal of the American Statistical Association
Counting Processes and Survival Analysis
Simple nonparametric tests for a known standard survival based on censored data. Communications in Statistics: Theory and Methods
A survey of product-integration with a view toward application in survival analysis. Annals of Statistics
Confidence band for the Kaplan–Meier survival curve estimate. Annals of Statistics
Cited by (11)
Bootstrap likelihood ratio confidence bands for survival functions under random censorship and its semiparametric extension. Journal of Multivariate Analysis (2016)
Semi-parametric survival function estimators deduced from an identifying Volterra type integral equation. Journal of Multivariate Analysis (2016)
Two-sample location-scale estimation from semiparametric random censorship models. Journal of Multivariate Analysis (2014)
Asymptotically efficient estimation under semi-parametric random censorship models. Journal of Multivariate Analysis (2014)
Investigating non-inferiority or equivalence in time-to-event data under non-proportional hazards. Lifetime Data Analysis (2023)
1 Both are first authors who contributed equally to the work.