Abstract
In disease registries, bivariate survival data are typically collected under interval sampling. This refers to the situation in which entry into a registry occurs at the time of the first failure event (e.g., HIV infection) within a calendar time window. For all cases in the registry, the time of the initiating event (e.g., birth) is retrospectively identified, and the second failure event (e.g., death) is subsequently observed during follow-up. In this paper we discuss how interval sampling introduces bias into the data. Because the sampling design requires the first event to occur within a specific time interval, the first failure time is doubly truncated, and the second failure time is possibly informatively right censored. We consider the semi-stationary condition, under which disease progression is independent of when the initiating event occurs. Under this condition, we adopt copula models to assess the association between the bivariate survival times under interval sampling. We first obtain bias-corrected estimators of the marginal survival functions and then estimate the association parameter of the copula model by a two-stage procedure. In the second part of the work, covariates are incorporated into the survival distributions via proportional hazards models. Inference for the association measure in the copula model is established, where the association is allowed to depend on covariates. Asymptotic properties of the proposed estimators are established, and finite sample performance is evaluated by simulation studies. The method is applied to a community-based AIDS study in Rakai to investigate the dependence between age at infection and residual lifetime, with and without adjustment for HIV subtype.
1 Introduction
In disease surveillance systems or registries, it is common to collect data on subjects with a certain failure event, such as diagnosis of disease, occurring within a calendar time interval and then to obtain additional information retrospectively and/or prospectively. This type of sampling is referred to as interval sampling [1], and we consider bivariate survival data with interval sampling in this paper. One example is the AIDS blood transfusion data collected by the Centers for Disease Control, which come from a registry database, a common source of medical data [2]. Individuals who were diagnosed with AIDS during the course of the registry, July 1st, 1982, to June 30th, 1989, were recruited into the database and followed to study the progression of disease. In this example, the time of the initiating event, HIV infection, was retrospectively identified, and the bivariate survival times of interest are the lag time from infection to AIDS diagnosis and the survival time after AIDS [2]. Generally speaking, under the interval sampling scheme, subjects experiencing the first failure event within a calendar time interval are identified as cases and entered into a registry [1]. For all the cases, the time of the initiating event is retrospectively confirmed and the occurrence of the second failure event is subsequently observed during follow-up. Therefore, there is clearly a sampling bias due to the selection process: subjects whose first failure events occur before or after the course of the registry are unobserved and unaccounted for. Any estimation or inference procedure that ignores this fact could yield biased results. In the literature, sampling bias issues in disease surveillance data related to AIDS have been extensively studied; see the article of Brookmeyer [3] for an overview of research papers in the field. Methods were developed to handle, for example, various types of truncation, censoring and length-biased sampling.
This paper is partly motivated by a Rakai HIV study of the relationship between age at infection and survival time of treatment-naive HIV-infected individuals. This study, in the rural Rakai district of southwestern Uganda, has conducted annual surveillance since November 1994 in an open cohort of individuals aged 15–49 years [4]. Interest is focused on a cohort of HIV seroconverters, who were initially HIV negative, seroconverted between 1995 and 2003 and were followed until they died or were censored by out-migration or the end of follow-up. With a wide range of age at infection, the risk of death may be positively or negatively associated with increasing age at infection. The scientific goal is to explore how HIV progression differs by age at infection and HIV subtype. Since antiretroviral treatment (ART) became available in 2004 in the Rakai Health Sciences Program, the follow-up time and survival analysis were truncated on Dec 31st, 2003, to assess the survival of ART-naive HIV-infected individuals [4]. In this study, the initiating, first and second failure events correspond, respectively, to birth, incidence of HIV infection and death. The bivariate survival times are the age at HIV infection and the residual lifetime. Figure 1 illustrates how the interval sampling design arises in the Rakai HIV study.
As shown in Figure 1, the sampling population consists of individuals who became HIV infected between 1995 and 2003. Under interval sampling, the age at HIV infection was observed subject to double truncation, that is, left truncation on Jan 1st, 1995, and right truncation on Dec 31st, 2003. The residual lifetime was possibly dependently right censored. A previous study [4] suggested that survival time decreased significantly with older age at infection. That conclusion was based on comparing Kaplan–Meier survival curves among different groups of age at infection by the log-rank test and on the hazard ratio of death associated with age at infection estimated by the Cox proportional hazards model. However, that analysis ignored the fact that the data were collected under interval sampling, which may introduce substantial bias. Therefore, our first research problem is to correctly examine the association between age at infection and residual lifetime by removing the bias from the interval sampling. Second, the HIV subtype was found to have an impact on the progression of HIV infection [5]. Thus, we further study the effect of HIV subtype on the association between age at infection and residual lifetime and evaluate the association adjusting for the HIV subtype.
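The selection effect described above can be illustrated with a small simulation. The following is only a sketch under stated assumptions, not the estimator proposed in this paper: the distribution of the initiating-event time T (a triangular density increasing over calendar time), the uniform distribution of Y, the window `[T0, T1]` and all function names are hypothetical choices made for illustration. The correction step weights each sampled subject by the inverse of its selection probability, with the distribution of T treated as known.

```python
import random

LOW, HIGH = -100.0, 10.0   # hypothetical support of the initiating-event time T
T0, T1 = 0.0, 10.0         # hypothetical calendar window for the first failure event

def cdf_T(t):
    """CDF of T: triangular density increasing on [LOW, HIGH] (mode at HIGH)."""
    u = min(max((t - LOW) / (HIGH - LOW), 0.0), 1.0)
    return u * u

def simulate(n_pop=200_000, seed=1):
    """Return Y for the whole population and for the interval-sampled cases."""
    rng = random.Random(seed)
    population, sampled = [], []
    for _ in range(n_pop):
        t = rng.triangular(LOW, HIGH, HIGH)  # initiating event, e.g. birth
        y = rng.uniform(15.0, 49.0)          # time to first failure, independent of T
        population.append(y)
        if T0 <= t + y <= T1:                # first failure falls inside the window
            sampled.append(y)
    return population, sampled

def mean(xs):
    return sum(xs) / len(xs)

def ipw_mean(ys):
    """Mean of Y weighted by 1 / P(T0 <= T + y <= T1 | Y = y), T's law assumed known."""
    w = [1.0 / (cdf_T(T1 - y) - cdf_T(T0 - y)) for y in ys]
    return sum(wi * yi for wi, yi in zip(w, ys)) / sum(w)

population, sampled = simulate()
pop_mean = mean(population)          # target quantity
naive_mean = mean(sampled)           # biased downward by the double truncation
corrected_mean = ipw_mean(sampled)   # close to pop_mean after reweighting
```

Because initiating events are more frequent at later calendar times in this toy setting, subjects with short Y are over-represented among the sampled cases, so the naive mean falls below the population mean while the reweighted mean does not.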
In the statistical literature, bivariate and multivariate survival data have been extensively studied. Various statistical methods have been developed to nonparametrically analyze bivariate survival data with right censoring [6, 7, 8, 9]. When the association of bivariate survival times is of interest, the semiparametric copula model has become an increasingly popular tool for modeling the dependence. In particular, copula-based survival models have been proposed by Shih and Louis [10] for bivariate data with both failure times subject to right censoring, by Wang [11] for bivariate survival data under dependent censoring and by Lakhal-Chaieb et al. [12] for bivariate serial gap times. These papers consider bivariate survival data in which both failure times are subject to right censoring. They differ from the bivariate survival data with interval sampling of interest here, for which the first failure time is subject to double truncation and the second failure time is subject to right censoring. Nevertheless, the copula family includes many useful bivariate survival models and enjoys flexibility in modeling [10]. An appealing feature is that it allows separate modeling and estimation of the margins and the dependency parameter. Estimation and inference can be carried out by a two-stage procedure. At the first stage, the marginal survival function of each failure time is consistently estimated. At the second stage, the association parameter is estimated by maximizing a pseudo-likelihood with the marginal survival functions replaced by their consistent estimators. The idea of two-stage estimation for the copula model has been used by Genest et al. [13] for complete data, Shih and Louis [10] for right-censored data and Wang and Ding [14] for bivariate current status data. The properties of the resulting association estimators depend on regularity conditions on the imposed copula model and on the plugged-in estimators of the marginal survival functions. In these works, the covariate effect is often modeled through the marginal distributions, such as the marginal Cox regression model, assuming the association parameter in the copula model is constant.
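To make the two-stage idea concrete, the following sketch applies it to complete data, that is, with no truncation or censoring, which is the setting of Genest et al. [13] rather than the interval sampling setting of this paper. The Clayton family, the sample size, the search interval and all function names are assumptions chosen for illustration. Stage 1 replaces each margin by a rank-based estimate of its survival function, and stage 2 maximizes the Clayton pseudo-log-likelihood over the association parameter.

```python
import math
import random

def sample_clayton(n, alpha, rng):
    """Draw n pairs (u, v) from a Clayton copula by conditional inversion."""
    pairs = []
    for _ in range(n):
        u = 1.0 - rng.random()  # uniform on (0, 1]
        w = 1.0 - rng.random()
        v = ((w ** (-alpha / (1.0 + alpha)) - 1.0) * u ** (-alpha) + 1.0) ** (-1.0 / alpha)
        pairs.append((u, v))
    return pairs

def survival_pseudo_obs(xs):
    """Stage 1: rank-based marginal survival estimates, rescaled by n + 1."""
    n = len(xs)
    order = sorted(range(n), key=xs.__getitem__)
    s = [0.0] * n
    for rank, i in enumerate(order, start=1):
        s[i] = 1.0 - rank / (n + 1)
    return s

def clayton_log_density(u, v, alpha):
    """Log of the Clayton copula density for alpha > 0."""
    return (math.log1p(alpha)
            - (1.0 + alpha) * (math.log(u) + math.log(v))
            - (2.0 + 1.0 / alpha) * math.log(u ** (-alpha) + v ** (-alpha) - 1.0))

def golden_max(f, lo, hi, tol=1e-5):
    """Golden-section search for the maximizer of a unimodal function on [lo, hi]."""
    g = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c, d = b - g * (b - a), a + g * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc > fd:
            b, d, fd = d, c, fc
            c = b - g * (b - a)
            fc = f(c)
        else:
            a, c, fc = c, d, fd
            d = a + g * (b - a)
            fd = f(d)
    return (a + b) / 2.0

rng = random.Random(2015)
true_alpha = 2.0
pairs = sample_clayton(2000, true_alpha, rng)
y = [-math.log(u) for u, _ in pairs]  # unit exponential margins with survival u
z = [-math.log(v) for _, v in pairs]
s_y, s_z = survival_pseudo_obs(y), survival_pseudo_obs(z)

def pseudo_loglik(alpha):
    """Stage 2 objective: copula log-likelihood with estimated margins plugged in."""
    return sum(clayton_log_density(u, v, alpha) for u, v in zip(s_y, s_z))

alpha_hat = golden_max(pseudo_loglik, 0.01, 20.0)
```

With interval-sampled data, the rank-based margins in stage 1 would have to be replaced by estimators that account for truncation and censoring; the sketch only shows how the two stages fit together.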
In this paper, we consider the semi-stationary copula model for bivariate survival data with interval sampling, and the association parameter is estimated through a two-stage procedure based on a pseudo-conditional likelihood. One challenge is how to correctly estimate the marginal distributions using interval sampling data, which is crucial to ensure unbiased estimation of the association between the bivariate survival times. Under the stationary condition, the model was studied by Zhu and Wang [1], who showed that interval sampling does not induce bias on each univariate failure time. This paper relaxes the stationarity assumption to semi-stationarity, which is a much less restrictive condition, and investigates the data structure in which the first failure time is doubly truncated and the second failure time is dependently right censored. Moreover, motivated by the Rakai HIV study, where HIV progression is likely to depend on the HIV subtype, it is of interest to quantify the covariate effect on the association. Therefore, we focus on two scenarios. First, in the absence of covariates, we propose bias-corrected estimators for the marginal survival functions and estimate the association parameter of the copula model by a two-stage procedure. Then we model the marginal distributions in a more flexible manner by Cox proportional hazards models with covariates incorporated. The association is modeled parametrically to include covariates, and we study the covariate-adjusted association. A novelty of the second model lies in the explicit linkage of the covariate effect to the association measure. Our approaches are not restricted to a particular copula but include the Clayton, positive stable and Frank copula models. The rest of the paper is organized as follows. In Section 2, the interval sampling design is discussed in more detail, and the copula model for bivariate survival data as well as the semi-stationary model assumption are introduced. In Section 3, the marginal survival distribution of each failure time and the association parameter in the copula are studied under the semi-stationary condition, without consideration of covariates. In Section 4, we incorporate covariates into the copula model and evaluate the association. Asymptotic properties of the proposed estimators are established. Finite sample performance is examined by simulation studies in Section 5. In Section 6, for illustration, the proposed method is applied to the Rakai HIV study. Finally, concluding remarks and discussion are included in Section 7. Technical details and proofs are provided in supplementary materials.
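For the three copula families named above, the association parameter maps to Kendall's τ in closed or near-closed form. The sketch below assumes the standard textbook parametrizations (Clayton with α > 0, positive stable with α ≥ 1, Frank with real α ≠ 0), which may differ from the parametrization used in later sections; the function names are hypothetical. For the Frank copula the first Debye function is evaluated by Simpson's rule.

```python
import math

def kendall_tau_clayton(alpha):
    """Clayton copula, alpha > 0: tau = alpha / (alpha + 2)."""
    return alpha / (alpha + 2.0)

def kendall_tau_positive_stable(alpha):
    """Positive stable (Gumbel-Hougaard) copula, alpha >= 1: tau = 1 - 1/alpha."""
    return 1.0 - 1.0 / alpha

def debye1(a, n=200):
    """First Debye function D1(a) = (1/a) * integral_0^a t / (exp(t) - 1) dt.

    Simpson's rule with n (even) subintervals; a may be negative."""
    def g(t):
        return 1.0 if abs(t) < 1e-12 else t / math.expm1(t)
    h = a / n
    total = g(0.0) + g(a)
    for k in range(1, n):
        total += (4.0 if k % 2 else 2.0) * g(k * h)
    return total * h / 3.0 / a

def kendall_tau_frank(alpha):
    """Frank copula, alpha != 0: tau = 1 - (4/alpha) * (1 - D1(alpha))."""
    if alpha == 0.0:
        return 0.0  # independence limit
    return 1.0 - 4.0 / alpha * (1.0 - debye1(alpha))

taus = {
    "clayton": [kendall_tau_clayton(a) for a in (0.5, 4.0 / 3.0, 3.0)],
    "positive_stable": [kendall_tau_positive_stable(a) for a in (1.25, 5.0 / 3.0, 2.5)],
    "frank": [kendall_tau_frank(a) for a in (2.0, -1.0, -2.0)],
}
```

Under these parametrizations, Clayton and positive stable copulas describe only non-negative association, whereas the Frank copula covers both signs, with τ an odd function of α.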
2 The semi-stationary copula model
In this section, we describe the data structure for bivariate survival data with interval sampling and some fundamental concepts of the copula model, together with the semi-stationary model assumption. Statistical methods and inference are developed for a target population of individuals experiencing the first failure event of interest. To begin, we define random variables for the target population. Let T denote the calendar time of the initiating event, Y denote the time from the initiating event to the first failure event and Z denote the time from the first event to the second event. The failure times Y and Z are possibly correlated and their dependent relationship is of primary interest. Under the interval sampling, the sampling population is made up of subjects whose first failure events occur within a calendar time interval
Assume that the initiating event T occurs over the calendar time with a rate function
Suppose bivariate failure times
where
Assume that
Assume that bivariate failure times
The model is considered to be semi-stationary if
Assume that
where
3 The copula model without covariates
Under the semi-stationary condition when (S) is satisfied, we fit a copula model for bivariate survival data with interval sampling, without considering covariates in this section. The data structure is studied and bias-corrected consistent estimators for marginal survival functions are proposed. In some situations, there is sufficient information on the distribution of the initiating event time T to determine a well-fitted parametric form. In such cases, it is desirable to make use of this information and incorporate it into the analysis. Therefore, we assume a parametric density function
Under Assumption (S), using eq. (1), the conditional likelihood function of observed
in which the distribution of Y becomes a nuisance parameter and is eliminated by the conditioning procedure. The conditional maximum likelihood estimator
is the Fisher information matrix of
We then explore the probability structure of the bivariate data to estimate marginal survival functions
where
where
For the second failure time Z, we discuss different situations of censoring on Z according to the real data application. In the Rakai HIV study, following the previous work [4], we consider that censoring occurs either due to out-migration or the end of the study. It is noted that there may be dependent censoring caused by informative dropout in some HIV studies. Developing a method to handle this dependent censoring is an interesting problem and will be explored in future work. Let
where
Next, let
where
Then a weighted estimating equation for
We then present the estimation procedure for the association parameter
An interesting feature in the decomposition of likelihood is that the marginal likelihood function does not involve the parameter of interest
From the previous discussion, two margins
The estimator of the association parameter
Now we consider the general case when
Theorem 1. As
and
The details of the proof are given in supplementary materials, where we also show
An estimator for bivariate survival function could be obtained by plugging in consistent estimators for unknown quantities in
Theorem 2. As
It is often interesting to report Kendall’s
For the semi-stationary copula model, we rely on a parametric specification of
4 The copula model with covariates
In this section, we propose to model the dependence of the copula association measure
where
Similar to the discussion in Section 3 where there is no covariate, bivariate survival data
Let
To estimate
Denote its solution by
5 Simulation studies
The first set of simulations is carried out to assess the performance of the proposed estimation and inference procedures for the copula model without covariates under a moderate sample size. Specifically, we examine finite sample properties of the proposed estimators for marginal survival functions, association parameter and joint survival function. A set of data
The proposed estimators for the marginal survival functions of Y and Z are evaluated in Figure 2, where bivariate failure times
Clayton copula
1.0 | 0.3 | 8.4 | 0.50 | 1.1 | 14.8 | 13.3 | 95.3 | −0.3 | 3.2 | 94.6 | 0.3 | 4.9 |
 | 0.1 | 7.2 | 1.33 | 2.6 | 24.1 | 18.2 | 95.5 | −0.2 | 3.0 | 95.7 | 1.0 | 4.7 |
 | 0.1 | 7.6 | 3.00 | 8.1 | 44.8 | 42.2 | 95.7 | −0.2 | 3.1 | 93.5 | 1.3 | 3.6 |
4.0 | 0.4 | 16.2 | 0.50 | 2.7 | 38.3 | 36.8 | 95.3 | −0.2 | 4.2 | 95.4 | 2.7 | 11.3 |
 | 1.0 | 17.5 | 1.33 | 13.4 | 65.9 | 64.2 | 95.9 | −0.2 | 4.3 | 95.6 | 5.0 | 10.8 |
 | 1.7 | 16.4 | 3.00 | 19.5 | 97.1 | 94.7 | 96.4 | −0.6 | 4.4 | 94.2 | 5.4 | 8.3 |
Positive stable copula
1.0 | 0.1 | 7.2 | 1.25 | 1.5 | 6.2 | 5.0 | 93.1 | 0.2 | 3.2 | 95.1 | 0.7 | 3.9 |
 | 0.6 | 7.9 | 1.67 | 1.3 | 10.1 | 8.3 | 93.5 | −0.2 | 3.1 | 94.6 | 0.6 | 3.4 |
 | 0.2 | 7.6 | 2.50 | 0.6 | 16.3 | 14.2 | 94.3 | −0.3 | 3.2 | 94.8 | 0.3 | 2.6 |
4.0 | 1.3 | 15.8 | 1.25 | 1.8 | 8.5 | 6.7 | 94.3 | 0.2 | 4.6 | 95.1 | 0.8 | 5.6 |
 | 1.8 | 17.3 | 1.67 | 0.7 | 15.5 | 14.2 | 94.4 | −0.1 | 4.4 | 95.7 | 0.4 | 5.4 |
 | 1.4 | 16.5 | 2.50 | 1.3 | 26.2 | 23.7 | 95.6 | −0.3 | 4.5 | 96.1 | 0.7 | 4.4 |
Frank copula
1.0 | 0.7 | 7.9 | 2.00 | 1.0 | 41.8 | 40.6 | 95.4 | 0.1 | 3.1 | 94.5 | 0.1 | 4.1 |
 | 0.2 | 7.5 | −1.00 | 3.2 | 42.2 | 40.5 | 94.8 | 0.1 | 3.2 | 95.2 | 0.2 | 4.6 |
 | 0.1 | 7.8 | −2.00 | 0.4 | 42.1 | 40.6 | 95.8 | −0.1 | 3.1 | 94.2 | 0.2 | 4.3 |
4.0 | 1.0 | 17.5 | 2.00 | 3.1 | 82.7 | 81.0 | 96.6 | −0.3 | 4.7 | 96.3 | 0.4 | 8.7 |
 | 1.7 | 21.7 | −1.00 | 1.9 | 85.2 | 83.9 | 96.3 | −0.2 | 4.7 | 95.7 | 1.3 | 9.7 |
 | 1.4 | 22.0 | −2.00 | 6.5 | 92.7 | 91.1 | 96.5 | −0.2 | 4.5 | 96.1 | 1.4 | 8.6 |
The second set of simulations is conducted to examine the finite sample performance of parameters in the copula model with covariates. The data-generating procedure generally follows that in the first set of simulations. The differences are, first of all, the association measure
where
A | |||||||||||
Clayton copula
0.8 | 10.8 | 1.7 | 8.8 | 0.8 | 11.7 | 0 | 0.20 | −0.1 | 3.5 | 3.6 | 95.6 |
 | | | | | | 1 | 0.22 | −0.4 | 3.5 | 3.8 | 95.6 |
0.9 | 10.2 | 0.4 | 8.8 | 1.1 | 11.8 | 0 | 0.40 | −0.6 | 3.4 | 3.9 | 95.5 |
 | | | | | | 1 | 0.43 | −0.8 | 3.3 | 3.6 | 95.7 |
0.4 | 10.6 | 1.2 | 8.7 | 1.6 | 11.5 | 0 | 0.60 | −1.0 | 2.7 | 2.9 | 96.1 |
 | | | | | | 1 | 0.63 | −0.9 | 2.5 | 2.8 | 96.2 |
Positive stable copula
−0.4 | 10.9 | 0.6 | 8.8 | 0.4 | 11.5 | 0 | 0.20 | 0.4 | 3.9 | 4.1 | 93.6 |
 | | | | | | 1 | 0.30 | 0.1 | 3.7 | 4.0 | 93.5 |
0.6 | 11.1 | 0.1 | 8.8 | 0.1 | 11.7 | 0 | 0.40 | 0.5 | 3.6 | 3.7 | 94.3 |
 | | | | | | 1 | 0.47 | 0.2 | 3.3 | 3.2 | 94.5 |
−0.3 | 10.9 | −0.4 | 8.9 | −1.1 | 11.8 | 0 | 0.60 | −0.2 | 3.0 | 3.2 | 94.7 |
 | | | | | | 1 | 0.65 | −0.3 | 2.7 | 2.8 | 95.1 |
Frank copula
0.8 | 10.5 | 0.2 | 8.8 | 0.5 | 12.0 | 0 | 0.20 | 0.9 | 4.4 | 4.7 | 96.0 |
 | | | | | | 1 | 0.23 | 0.2 | 4.0 | 4.4 | 96.2 |
0.5 | 10.7 | −0.9 | 8.7 | 0.9 | 11.8 | 0 | −0.10 | 0.3 | 6.5 | 6.8 | 95.5 |
 | | | | | | 1 | −0.09 | −0.2 | 4.1 | 4.5 | 95.8 |
0.3 | 10.3 | 1.0 | 8.9 | −0.6 | 12.2 | 0 | −0.20 | −0.5 | 4.2 | 4.5 | 95.7 |
 | | | | | | 1 | −0.21 | 1.2 | 4.2 | 4.4 | 95.8 |
6 Application to the Rakai HIV study
6.1 Overall association
The HIV seroconversion data from the Rakai HIV study provide an example of bivariate survival data with interval sampling. In this study, 837 subjects were ascertained with a documented date of HIV seroconversion between 1995 and 2003 and were followed until death or the end of 2003. Among them, 120 died and the others were censored by out-migration or the end of follow-up. Information on date of birth, date of death, sex, place of residence and HIV subtype is available. The bivariate survival times of interest are age at HIV infection and residual lifetime. Exclusion of subjects who were infected before 1995 or after 2003 gives rise to the selection bias of interval sampling. For the purposes of illustration, we apply the proposed semi-stationary copula model methods to analyze the Rakai HIV seroconversion data, address the statistical issues of interval sampling and study the association between age at HIV infection and residual lifetime among HIV seroconverters. The data and analysis method allow one to model the HIV epidemic for treatment-naive individuals, which would help provide guidance on the initiation of ART.
In the analysis we assume the semi-stationary condition holds, that is, the progression of HIV is independent of the birth time of the study cohort. Denote the birth time of HIV seroconverters by T with distribution function
Next, the marginal survival functions
A previous analysis [4] showed that survival time decreased significantly with older age at infection, based on a Cox proportional hazards model of residual lifetime conditional on age at infection. However, the appropriateness of that Cox model is questionable, since it did not take into account the fact that the data were collected under interval sampling. As discussed, due to the interval sampling, age at infection is doubly truncated and residual lifetime is observed subject to dependent right censoring. Therefore, selection bias needs to be corrected when analyzing the data and studying the association between age at infection and residual lifetime. We consider the copula model without covariates proposed in Section 3, where the dependency structure is fitted by the Frank copula, and assess the overall association. To estimate the standard error of the association estimator, we adopt a nonparametric bootstrap method by sampling 837 subjects with replacement from the dataset. The resampling procedure is repeated 500 times. The confidence interval is constructed based on asymptotic normality, with the standard error computed from the bootstrap resamples. The association parameter
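The bootstrap procedure described in this paragraph (resample subjects with replacement, recompute the estimator, take the standard deviation of the replicates and form a normal-approximation interval) can be sketched as follows. The data set and the estimator here are stand-ins: `correlation` on simulated pairs replaces the copula association estimator, and the names `bootstrap_se` and `data` are hypothetical.

```python
import math
import random
import statistics

def bootstrap_se(data, estimator, n_boot=500, seed=0):
    """Nonparametric bootstrap: resample subjects with replacement n_boot times."""
    rng = random.Random(seed)
    n = len(data)
    reps = [estimator(rng.choices(data, k=n)) for _ in range(n_boot)]
    return statistics.stdev(reps), reps

def correlation(pairs):
    """Sample Pearson correlation; a stand-in for the association estimator."""
    xs, ys = zip(*pairs)
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical toy data: one (x, y) pair per subject.
rng = random.Random(42)
data = []
for _ in range(200):
    x = rng.gauss(0.0, 1.0)
    data.append((x, x + rng.gauss(0.0, 1.0)))

point = correlation(data)
se, reps = bootstrap_se(data, correlation, n_boot=500)
ci = (point - 1.96 * se, point + 1.96 * se)  # 95% interval from asymptotic normality
```

Resampling whole subjects keeps each pair of failure times together, so the dependence between them is preserved in every bootstrap replicate.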
6.2 Subtype-stratified association and subtype-adjusted association
Studies suggest that the progression of HIV infection is affected by the HIV subtype [5]. HIV subtypes differ in biological characteristics that may affect pathogenicity, such as viral fitness and plasma viral loads. These differences may theoretically influence virus infectivity and transmissibility. We investigate this issue by analyzing the Rakai HIV seroconversion data with information on HIV subtype. Among the 837 HIV seroconverters, the HIV subtype could be identified for 413 individuals because their blood serum samples had sufficient HIV RNA for reverse transcriptase polymerase chain reaction amplification. Subtypes were classified as A (15.4%), C (0.5%), D (58.3%), AD recombinants (20.2%) and multiple infections (5.6%). Earlier analysis of the Rakai data suggests that subtype D, AD recombinants and multiple infections have similar disease progression rates, and there is only one individual with subtype C infection in this data set, so for analysis purposes we compare subtype A with the combined non-A virus subtypes [4].
First of all, we stratify by the HIV subtype to assess the association between age at infection and residual lifetime. The estimated marginal survival functions of age at infection and residual lifetime by HIV subtype are shown in Figure 5. The figure shows that the survival curves of age at infection for subtype A, non-A subtypes and unknown subtype are almost the same, but the survival probability of residual lifetime is substantially lower for non-A and unknown subtypes than for subtype A. In fact, there are only 2 deaths among the 64 subtype A infections, compared with 45 deaths among the 349 non-A subtype infections. This is consistent with the result from the previous study in Uganda [5] that subtype A has a slower disease progression rate and is thought to be less pathogenic than other subtypes. The association measure
Next, we examine the relationship between age at infection and residual lifetime adjusting for the HIV subtype by the copula model with covariates in Section 4. The dependence structure in the Frank copula is allowed to depend on the HIV subtype through
7 Concluding remarks
This paper considers statistical issues in bivariate survival data with interval sampling, which arise commonly in disease registries or surveillance systems where data are collected conditional on the first failure event occurring within a specific time interval. In this paper, the semi-stationary condition is assumed for statistical modeling and inference. It is important to note that the semi-stationary assumption could be violated when, for instance, improved diagnostic strategies over time lead to earlier detection, or an effective treatment becomes available and is given to the diseased individuals during the process of observation. The situation when
The data structure considered in this paper assumes that the first failure time Y is observed exactly. However, as pointed out by one reviewer, there are situations in some HIV cohorts where Y is interval censored. The interval censoring problem of Y is worthy of further investigation and may be handled by extending the prior work on bivariate survival distribution estimation for interval-censored outcomes [18]. For convenience of discussion, the proportional hazards model is used to model the relationship between each failure time and covariates. In fact, any regression model for survival data, such as the semiparametric transformation model or the accelerated failure time model, can be used. The proposed copula model framework is very general and can be modified to accommodate other regression models for the marginal distributions. Moreover, it is noted that the same covariates may not be relevant as causal factors to the two failure events of interest, or to the association between them. Nevertheless, in developing the method, we consider a general case by including the same set of covariates in modeling the marginal distributions and the association, which allows one to test the significance of the covariate effects and estimate them. In real data applications, one can always start with a general model that includes all factors of interest as covariates and keep only the significant ones in the final model.
In the simulations and data application, certain specific copula models are used to characterize the dependence structure of bivariate survival data, given their popularity, modeling flexibility and computational convenience. The simulations show that, when the true copula is known, the estimation procedure provides good results, and we anticipate that the procedure will also perform well for other copulas. In fact, any copula model could be considered, which raises closely related issues of how a wrong choice of the copula model would affect the estimated association measure and how to choose an appropriate copula model. Since different copula models may lead to different dependence properties of the bivariate survival function, the problem of model selection among copulas needs to be addressed in future work. Potentially, goodness-of-fit procedures for the copulas could be developed for bivariate survival data with interval sampling. In the absence of covariates, we suggest comparing the copula model fit with some nonparametric estimates of the association, such as the cross-ratio function or Kendall’s
The research is motivated by and applied to the Rakai HIV seroconversion data to evaluate the association between age at HIV infection and residual lifetime among treatment-naive HIV seroconverters, and to study how the association varies by HIV subtype and changes after controlling for HIV subtype. Another scientifically interesting problem for further research is to examine the effect of ART on HIV progression. In the Rakai Health Sciences Program, ART became available in 2004, and this time-dependent treatment variable would further complicate the analysis.
Acknowledgments
This work was supported in part by the Cancer Center Support Grant from the National Cancer Institute awarded to the Harold C. Simmons Cancer Center at the University of Texas Southwestern Medical Center. The authors thank the editor, the associate editor and two reviewers for their constructive comments that have greatly improved the initial version of this paper. We also thank the Rakai Health Sciences Program at Johns Hopkins Bloomberg School of Public Health for providing the data.
Appendix A: Asymptotic properties of $\hat{\alpha}(\hat{\theta})$ in the copula model without covariates
Proof of Theorem 1
Assume that the standard regularity conditions for the maximum likelihood estimator hold and the functions
In the following, we study the asymptotic properties of
If
As previously discussed, if
where
To develop the asymptotic results of the second term in eq. (4), the additional variation created by estimating
which converges weakly to a normal distribution with mean 0 and variance
Combining the preceding results of eqs (5) and (6), we get
Also the corresponding distributions of those two terms are asymptotically orthogonal to each other, since
Equations (7) and (8) imply that
Appendix B: Asymptotic properties of $\hat{S}_{Y,Z}(Y, Z)$ in the copula model without covariates
Proof of Theorem 2
Consider an estimate of
First of all, we show the consistency of
Next, we illustrate the asymptotic distribution of
By Theorem 1,
which is a sum of n i.i.d. random variables. Applying the counting process asymptotic techniques to
where
Moreover, we have
which means eqs (10) and (11) are asymptotically orthogonal. Therefore, eqs (10), (11) and (12) imply that as
Appendix C: Estimations and asymptotic properties in the copula model with covariates
We provide detailed estimation procedures for
where
The first error terms in the above expressions have been discussed, and the second error terms are generated by the use of
Theorem 3. As
Proof of Theorem 3
Observe that
Following the lines in Qi et al. [20], the first terms in eqs (13) and (14) can be proved to have the expressions
which converge weakly to multivariate normal distributions with mean zero and variance-covariance matrices
where for a
which converge weakly to multivariate normal distributions with mean zero and variance-covariance matrices
These imply that eqs (13) and (14) converge weakly to multivariate normal distributions with mean zero and variance-covariance matrices
From the proportional hazards models, marginal survival functions
and
where
and
where
References
1. Zhu H, Wang M-C. Analyzing bivariate survival data with interval sampling and application to cancer epidemiology. Biometrika 2012;99:345–61. doi:10.1093/biomet/ass009
2. Bilker WB, Wang M-C. A semiparametric extension of the Mann-Whitney test for randomly truncated data. Biometrics 1996;52:10–20. doi:10.2307/2533140
3. Brookmeyer R. AIDS, epidemics, and statistics. Biometrics 1996;52:781–96. doi:10.2307/2533042
4. Lutalo T, Gray RH, Wawer M, Sewankambo N, Serwadda D, Laeyendecker O, et al. Survival of HIV-infected treatment-naive individuals with documented dates of seroconversion in Rakai, Uganda. AIDS 2007;21:S15–9. doi:10.1097/01.aids.0000299406.44775.de
5. Kaleebu P, Ross A, Morgan D, Yirrel D, Oram J, Rutebemberwa A, et al. Relationship between HIV-1 ENV subtypes A and D and disease progression in a rural Ugandan cohort. AIDS 2001;15:293–9. doi:10.1097/00002030-200102160-00001
6. Huang Y, Louis TA. Nonparametric estimation of the joint distribution of survival time and mark variable. Biometrika 1998;85:785–96. doi:10.1093/biomet/85.4.785
7. Lin D-Y, Sun W, Ying Z. Nonparametric estimation of gap time distributions for serial events with censored data. Biometrika 1999;86:59–70. doi:10.1093/biomet/86.1.59
8. Schaubel DE, Cai J. Nonparametric estimation of gap time survival functions for ordered multivariate failure time data. Stat Med 2004;23:1885–900. doi:10.1002/sim.1777
9. Visser M. Nonparametric estimation on the bivariate survival function with application to vertically transmitted AIDS. Biometrika 1996;83:507–18. doi:10.1093/biomet/83.3.507
10. Shih JH, Louis TA. Inferences on the association parameters in copula models for bivariate survival data. Biometrics 1995;51:1384–99. doi:10.2307/2533269
11. Wang W. Estimating the association parameter for copula models under dependent censoring. J R Stat Soc B 2003;65:257–73. doi:10.1111/1467-9868.00385
12. Lakhal-Chaieb L, Cook R, Lin X. Inverse probability of censoring weighted estimates of Kendall’s τ for gap time analyses. Biometrics 2010;66:1145–52. doi:10.1111/j.1541-0420.2010.01404.x
13. Genest C, Ghoudi K, Rivest L-P. A semiparametric estimation procedure of dependence parameters in multivariate families of distributions. Biometrika 1995;82:543–52. doi:10.1093/biomet/82.3.543
14. Wang W, Ding AA. On assessing the association for bivariate current status data. Biometrika 2000;87:879–93. doi:10.1093/biomet/87.4.879
15. Aalen OO. Weak convergence of stochastic integrals related to counting process. Z Wahrsch Ver Geb 1977;38:261–77. doi:10.1007/BF00533158
16. Zeng D. Estimating marginal survival function by adjusting for dependent censoring using many covariates. Ann Stat 2004;32:1533–55. doi:10.1214/009053604000000508
17. Shen P-S. Nonparametric analysis of doubly truncated data. Ann Inst Stat Math 2008;62:835–53. doi:10.1007/s10463-008-0192-2
18. Betensky RA, Finkelstein DM. A non-parametric maximum likelihood estimator for bivariate interval censored data. Stat Med 1999;18:3089–100. doi:10.1002/(SICI)1097-0258(19991130)18:22<3089::AID-SIM191>3.0.CO;2-0
19. Van der Vaart AW. Asymptotic statistics. Cambridge: Cambridge University Press, 1998.
20. Qi L, Wang CY, Prentice RL. Weighted estimator for proportional hazards regression with missing covariates. J Am Stat Assoc 2005;100:1250–63. doi:10.1198/016214505000000295
© 2015 Walter de Gruyter GmbH, Berlin/Boston