Model-based confidence bands for survival functions

https://doi.org/10.1016/j.jspi.2013.01.012

Abstract

This paper focuses on a novel method of developing one-sample confidence bands for survival functions from right censored data. The approach is model-based, relying on a parametric model for the conditional expectation of the censoring indicator given the observed minimum, and derives its strength from easy access to a good-fitting model among a plethora of choices available for binary response data. The substantive methodological contribution is in exploiting a semiparametric estimator of the survival function to produce improved simultaneous confidence bands. To obtain critical values for computing the confidence bands, a two-stage bootstrap approach that combines the classical bootstrap with the more recent model-based regeneration of censoring indicators is proposed and a justification of its asymptotic validity is also provided. Several different confidence bands are studied using the proposed approach. Numerical studies, including robustness of the proposed bands to misspecification, are carried out to check efficacy. The method is illustrated using two lung cancer data sets.

Introduction

This paper investigates a new approach to constructing simultaneous confidence bands (SCBs) for survival functions from right censored data. Medical reports typically display the estimate of a survival curve along with pointwise confidence intervals (PCIs): two curves, one connecting the upper endpoints and the other the lower endpoints of interval estimates obtained at several points. PCIs do not provide confidence statements for the entire survival curve, whereas SCBs do and hence are more meaningful to report. Recent advances offer the prospect of producing new SCBs for survival curves which provide correct coverage and are more informative than existing ones based on the Kaplan–Meier (KM) estimator. We implement such a new procedure for the one-sample setting.

In the analysis of censored time-to-event data, SCBs have been developed for the cumulative hazard, survival, and quantile functions, some of them based on empirical likelihood (Gillespie and Fisher, 1979; Hall and Wellner, 1980; Efron, 1981; Nair, 1984; Akritas, 1986; Csörgő and Horváth, 1986; Bie et al., 1987; Hollander and Pena, 1989; Lin et al., 1994; Ziegler, 1995; Gulati and Padgett, 1996; Li et al., 1996; Hollander et al., 1997; Li and van Keilegom, 2002; McKeague and Zhao, 2006), and for the difference or ratio of two survival functions (Parzen et al., 1997; Zhang and Klein, 2001; McKeague and Zhao, 2002). In the one-sample setting of the random censorship model (see, for example, Kalbfleisch and Prentice, 2002), the data are n independent and identically distributed copies of (Z,δ), where Z=min(T,C), T is the failure time of interest, C is an independent censoring variable, both assumed continuous, and δ=I(T≤C) is the censoring indicator. Significantly, however, all existing methods for the one-sample setting employ the KM estimator to develop SCBs for the survival function S(t)=P(T>t). Improved SCBs can be obtained through an alternative approach, which has not yet been investigated and implemented, and which will be our focus here.

The alternative approach that we implement recognizes that one or more good-fitting models for m(t), the conditional expectation of δ given Z=t, may be utilized for improved SCB construction for S(t). Indeed, choices such as the logistic, the probit, the complementary log–log, and the generalized proportional hazards (GPH, henceforth), among others, may be explored carefully for zeroing in on an apt model for m(t); see, for example, Cox and Snell (1989) or Collett (2002). Semiparametric random censorship models (SRCMs) exploit this facility to replace each observed δ with a model-based estimate of its conditional expectation given Z, so that the censoring indicator δ figures in subsequent analysis only through its “surrogate”, the estimated m(t). Thus, SRCMs derive their rationale from their ability to gainfully utilize parametric ideas within the random censorship environment. In fact, when the model for m(t) is correctly chosen, the resulting SRCM-based estimator of S(t) is asymptotically more efficient than the KM estimator (Dikta, 1998), and so we may expect this efficiency to be reflected in improved SRCM-based SCBs for S(t). Unlike the standard random censorship model, the SRCM approach is also flexible enough to accommodate missing censoring indicators (Subramanian, 2004) with no additional effort; see also our concluding discussion section.
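To make the "surrogate" idea concrete, the following minimal Python sketch computes a product-limit estimate in which each δ is replaced by a weight. It assumes one common product-limit form of the SRCM estimator, with factors 1 − m̂(Z_(i))/(n − i + 1) and no ties; the function names, the toy data, and the logistic coefficients are hypothetical and are not taken from the paper. No model fitting is shown: the model for m(t) is treated as already fitted.

```python
import math

def product_limit(z, weights, t):
    """Product-limit estimate of S(t) with each delta_i replaced by weights[i].

    Assumes no ties among the z values. With weights[i] = delta_i (0 or 1)
    this is the Kaplan-Meier estimate; with weights[i] = m_hat(z[i]) it is
    an SRCM-type estimate (assumed form, see lead-in).
    """
    n = len(z)
    order = sorted(range(n), key=lambda i: z[i])
    surv = 1.0
    for rank, i in enumerate(order, start=1):  # rank of z[i] among the sorted Z
        if z[i] > t:
            break
        surv *= 1.0 - weights[i] / (n - rank + 1)
    return surv

def m_logistic(t, a=0.5, b=-0.3):
    """Hypothetical logistic model for m(t) = P(delta = 1 | Z = t).

    The coefficients a and b are made up for illustration only.
    """
    return 1.0 / (1.0 + math.exp(-(a + b * t)))

# Toy data (illustrative, not from the paper).
z = [0.4, 1.1, 1.7, 2.3, 3.0]
delta = [1, 0, 1, 1, 0]

km_at_2 = product_limit(z, delta, t=2.0)
srcm_at_2 = product_limit(z, [m_logistic(zi) for zi in z], t=2.0)
```

Passing the raw indicators as weights recovers the KM estimate, which shows that the two estimators differ only in how the censoring information enters the product.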

The main issue in constructing one-sample SCBs for S(t) lies in the specification of critical values. It is well known that the scaled KM process converges weakly to a time-transformed Brownian bridge (Hall and Wellner, 1980; Akritas, 1986; Fleming and Harrington, 2005), so the percentiles of its supremum can be obtained from tables calculated for this purpose. This is not possible with the SRCM approach, however, because the limiting distribution of the normalized SRCM-specific cumulative hazard estimator does not have independent increments. One way to surmount this problem would be to utilize a representation for the limit of the normalized SRCM-specific cumulative hazard process, to produce an approximation whose distribution one may generate using simulation. We instead propose and implement a novel two-stage resampling scheme that is specifically tailored to SRCMs and which we show produces asymptotically correct critical values for the supremum statistic, leading to improved SCBs for S(t).

Efron's (1981) censored-data bootstrap, which obtains resamples by drawing at random and with replacement from {(Zi,δi), i=1,…,n}, yields asymptotically correct SCBs for S(t) (Akritas, 1986; Horváth and Yandell, 1987). This bootstrap is equivalent to drawing samples from the KM estimators of the failure time and censoring distributions (Akritas, 1986). Sun et al. (2001) used the same approach to derive SCBs for quantile functions. This approach, however, would be unsatisfactory for obtaining SRCM-based SCBs, which call for a resampling mechanism that takes into account the information available in the form of a model for m(t). Our two-stage resampling plan combines the classical bootstrap with model-based regeneration of censoring indicators. Lu and Tsiatis (2001), Tsiatis et al. (2002), Dikta et al. (2006), and Subramanian (2009, 2011) have employed regeneration of binary responses in multiple-imputation estimation and model checking.
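The contrast between the two resampling schemes can be sketched in a few lines of Python. This is one plausible reading of the description above, not the paper's exact algorithm: stage one is assumed to draw the Z values with replacement, and stage two is assumed to regenerate each indicator as a Bernoulli draw with success probability m̂(Z*). The function names and the `m_hat` callable (a fitted model for m(t)) are hypothetical; refitting the model on each resample and computing the supremum statistic are not shown.

```python
import random

def efron_resample(z, delta, rng):
    """Efron-style bootstrap: draw (Z_i, delta_i) pairs with replacement."""
    n = len(z)
    idx = [rng.randrange(n) for _ in range(n)]
    return [z[i] for i in idx], [delta[i] for i in idx]

def two_stage_resample(z, m_hat, rng):
    """Two-stage resample (assumed form, see lead-in).

    Stage 1: classical bootstrap of the observed minima Z.
    Stage 2: regenerate each censoring indicator from the model,
             delta* ~ Bernoulli(m_hat(Z*)).
    """
    n = len(z)
    z_star = [z[rng.randrange(n)] for _ in range(n)]
    delta_star = [1 if rng.random() < m_hat(zi) else 0 for zi in z_star]
    return z_star, delta_star

# Toy data and a made-up constant model, for illustration only.
rng = random.Random(0)
z = [0.4, 1.1, 1.7, 2.3, 3.0]
delta = [1, 0, 1, 1, 0]
z_pairs, d_pairs = efron_resample(z, delta, rng)
z_star, d_star = two_stage_resample(z, lambda t: 0.6, rng)
```

In the pair bootstrap the indicators travel with their Z values, whereas in the two-stage scheme the observed indicators are discarded and only the model for m(t) determines the regenerated ones.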

Let τ be such that H(τ)>0, where H denotes the distribution function of Z. The range-respecting SCBs that we propose are based on the asymptotic validity of our bootstrap, which means proving that for almost all samples the suprema over [0,τ] of bootstrap processes, from which critical values will be calibrated, have the same limit distribution as the basic ones they are intended to approximate. The method of proof involves first deriving functional central limit theorems for the bootstrap versions of basic SRCM-specific estimators and later invoking Gill and Johansen's (1990) functional delta method to prove via a series of compactly differentiable mappings the desired asymptotic validity. For different choices of a weight function we are then able to obtain the approximating critical values for constructing the SCBs for S(t) on any interval [t1,t2]⊂[0,τ]; see Section 2.3.
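As a rough illustration of how bootstrap suprema translate into a band, the Python sketch below takes a list of bootstrap supremum statistics, uses their empirical (1 − α) quantile as the critical value, and forms the simplest possible band. It assumes an unweighted statistic of the form sup √n |S* − Ŝ| and an untransformed, equal-width band clipped to [0,1]; the weighted, range-respecting bands of Section 2.3 are not reproduced here, and all names are hypothetical.

```python
import math

def critical_value(sup_stats, alpha=0.05):
    """Empirical (1 - alpha) quantile of the bootstrap supremum statistics."""
    s = sorted(sup_stats)
    k = math.ceil((1 - alpha) * len(s)) - 1  # zero-based index of the order statistic
    return s[max(0, min(k, len(s) - 1))]

def equal_width_band(s_hat, c, n):
    """Band S_hat(t) -/+ c / sqrt(n) on a grid of t values, clipped to [0, 1].

    s_hat: survival estimates on the grid; c: critical value; n: sample size.
    This is the plain unweighted band, an assumption made for illustration.
    """
    half = c / math.sqrt(n)
    return [(max(0.0, s - half), min(1.0, s + half)) for s in s_hat]

# Made-up supremum statistics standing in for B bootstrap replications.
sup_stats = [0.9, 1.3, 0.7, 1.1, 1.6, 0.8, 1.0, 1.2]
c = critical_value(sup_stats, alpha=0.25)
band = equal_width_band([0.95, 0.80, 0.55], c, n=100)
```

Clipping keeps the band inside [0,1] but is only a crude substitute for a transformation-based construction.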

The paper is organized as follows. Section 2 deals with SRCM-based SCBs for the survival function. We give a brief review of the semiparametric approach to survival function estimation for right censored data in Section 2.1. The proposed two-stage resampling procedure and its asymptotic validity are presented in Section 2.2. The new SCBs are introduced in Section 2.3. Section 3 deals with numerical studies. Simulation results with proper specifications are reported in Section 3.1, while the results of misspecification studies are detailed in Section 3.2. The proposed SCBs are illustrated using data from medical studies in Section 3.3. A concluding discussion is given in Section 4. Proofs of theorems are placed in the Appendix.

Methodology

We write H^(t) to denote the empirical estimator of H(t). We denote the cumulative hazard function of T by Λ(t). Also, a^T will indicate the transpose of a vector a, and a^⊗2 = a a^T.

Numerical studies

Our simulation studies focus on two cases: one in which the model for m(t) is specified correctly, and one that examines performance in the face of misspecification. We perform comparisons with competing bands through measures such as the empirical coverage probability (ECP), the estimated average enclosed area (EAEA), and the estimated average width (EAW). The ECP is the proportion of SCBs that include S(t) for all t∈[t1,t2]. On the interval [zm1,zm2], the EAEA and EAW are defined

Concluding discussion

The rationale underlying any model-based procedure is that parametric specifications, when employed correctly, invariably lead to more efficient estimation and inference. For the one-sample censored data setting, a model-based approach to constructing SCBs for survival curves, introduced and implemented in this paper, is shown to provide a viable and better alternative to the existing approaches based on the KM estimator. This is evidenced by our numerical studies, which indicate that the new

Acknowledgments

The authors thank two reviewers for a number of comments and suggestions, which led to improvements in the paper.

References (39)

  • D.R. Cox et al., Analysis of Binary Data (1989)
  • S. Csörgő et al., Confidence bands from censored samples, Canadian Journal of Statistics (1986)
  • D. Collett, Modelling Binary Data (2002)
  • G. Dikta et al., Bootstrap approximations in model checks for binary data, Journal of the American Statistical Association (2006)
  • B. Efron, Censored data and the bootstrap, Journal of the American Statistical Association (1981)
  • T. Fleming et al., Counting Processes and Survival Analysis (2005)
  • C. Gatsonis et al., Simple nonparametric tests for a known standard survival based on censored data, Communications in Statistics - Theory and Methods (1985)
  • R.D. Gill et al., A survey of product-integration with a view toward application in survival analysis, Annals of Statistics (1990)
  • M.J. Gillespie et al., Confidence band for the Kaplan–Meier survival curve estimate, Annals of Statistics (1979)

    Both are first authors who contributed equally to the work.
